
Best-effort topology mgr policy doesn't give best-effort CPU NUMA alignment #106270

Closed as not planned
@JanScheurich

Description


What happened?

Use case:

A DPDK application uses VLAN trunking on SR-IOV NICs and requires dedicated SR-IOV NICs. For cost reasons there is only one SR-IOV NIC per server, but to exploit the CPU resources optimally, the application needs to run one single-NUMA pod per CPU socket. For these pods, CPUs and hugepages must be allocated from the same NUMA node, while the SR-IOV device may, if necessary, be allocated from the NIC on the remote NUMA node.

Problem description:

K8s bare-metal node with the following CPU topology:
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
The single SR-IOV NIC is on NUMA 0.

Kubelet is configured with
• CPU manager policy "static"
• Topology manager policy "best-effort"
• reserved_cpus: 0,1,40,41
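
For reference, here is a sketch of the same three settings expressed against the KubeletConfiguration API (k8s.io/kubelet/config/v1beta1); in the kubelet config file these are the cpuManagerPolicy, topologyManagerPolicy, and reservedSystemCPUs fields. The values are taken from the list above, nothing else is implied.

```go
package main

import (
	"fmt"

	kubeletconfig "k8s.io/kubelet/config/v1beta1"
)

func main() {
	// The three settings from the list above, as KubeletConfiguration fields.
	cfg := kubeletconfig.KubeletConfiguration{
		CPUManagerPolicy:      "static",      // exclusive CPU pinning for Guaranteed QoS pods
		TopologyManagerPolicy: "best-effort", // admit pods even without a preferred NUMA alignment
		ReservedSystemCPUs:    "0,1,40,41",   // CPUs withheld from the exclusive/shared pools
	}
	fmt.Println(cfg.CPUManagerPolicy, cfg.TopologyManagerPolicy, cfg.ReservedSystemCPUs)
}
```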

The application creates two Guaranteed QoS DPDK pods requesting 32 CPUs each. The remaining 6 CPUs per NUMA node are meant to be used by best-effort and burstable QoS pods.

The expected behavior with best-effort policy is that the CPU manager provides both Guaranteed QoS pods with CPUs from a single NUMA node each, even if the device manager cannot provide each pod with a local SR-IOV VF.

Unfortunately, this is not what happens: the CPU manager assigns CPUs 2-32,42-72 on NUMA node 0 to the first pod, and the remaining CPUs 34-38,74-78 on NUMA node 0 plus CPUs 3-25,43-65 on NUMA node 1 to the second pod, thus breaking the DPDK application, which requires single-NUMA CPU allocation.

What did you expect to happen?

The expected behavior with best-effort policy is that the CPU manager provides both Guaranteed QoS pods with CPUs from a single NUMA node each, even if the device manager cannot provide each pod with a local SR-IOV VF.

How can we reproduce it (as minimally and precisely as possible)?

See above. Create two Guaranteed QoS pods, each with an integer CPU request and a request for an SR-IOV device from a pool that exists on only one NUMA node, sized so that the two pods cannot fit on the same NUMA node, but a single pod does not fully occupy the NUMA node hosting the SR-IOV NIC either. A sketch of such a pod follows.
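
A minimal sketch of one of the two reproduction pods, built with the Kubernetes Go API types. The SR-IOV resource name, image, and memory/hugepage sizes are placeholders; substitute whatever your SR-IOV device plugin and application actually use.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// dpdkPod builds one of the two Guaranteed QoS pods: requests == limits for
// every resource, an integer CPU count, hugepages, and one SR-IOV VF.
func dpdkPod(name string) *corev1.Pod {
	res := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("32"),
		corev1.ResourceMemory: resource.MustParse("8Gi"),
		// Placeholder names/sizes below; adjust to your device plugin and app.
		corev1.ResourceName("hugepages-1Gi"):             resource.MustParse("8Gi"),
		corev1.ResourceName("intel.com/sriov_netdevice"): resource.MustParse("1"),
	}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:      "dpdk",
				Image:     "dpdk-app:latest", // placeholder image
				Resources: corev1.ResourceRequirements{Requests: res, Limits: res},
			}},
		},
	}
}

func main() {
	for _, name := range []string{"dpdk-pod-1", "dpdk-pod-2"} {
		fmt.Println("would create", dpdkPod(name).Name)
	}
}
```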

Anything else we need to know?

Analysis:

The problem is that, for the second pod (which should land on NUMA node 1), the CPU manager offers the topology hints [10 (preferred), 11 (not preferred)]; the affinity bit strings enumerate the NUMA nodes right to left. The device manager's hint is [01 (preferred)]. The topology manager unconditionally merges these into a best hint of 01 (not preferred). It does so by iterating over the cross-product of all provider hints and taking the bitwise AND of the affinity masks; for non-zero results, the preferred status is set to true if and only if all combined provider hints were preferred. In our case the only non-zero affinity mask is 11 & 01 = 01, and it is not preferred.
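
For illustration, here is a simplified, self-contained sketch of that merge step. It is not the kubelet code itself, only the cross-product/bitwise-AND logic described above; running it on the hints from this example yields exactly one non-empty combination, 01 (not preferred).

```go
package main

import "fmt"

// hint is a NUMA affinity bit mask (bit 0 = NUMA node 0) plus a preferred flag.
type hint struct {
	affinity  uint
	preferred bool
}

// merge takes one hint list per provider and returns every non-empty
// combination obtained by picking one hint per provider and ANDing the masks.
func merge(providers [][]hint) []hint {
	// Start from "all NUMA nodes, preferred" on a two-node system.
	merged := []hint{{affinity: 0b11, preferred: true}}
	for _, hints := range providers {
		var next []hint
		for _, m := range merged {
			for _, h := range hints {
				combined := m.affinity & h.affinity
				if combined == 0 {
					continue // no common NUMA node, drop this combination
				}
				next = append(next, hint{
					affinity:  combined,
					preferred: m.preferred && h.preferred, // preferred only if all inputs were
				})
			}
		}
		merged = next
	}
	return merged
}

func main() {
	cpuManager := []hint{{0b10, true}, {0b11, false}} // hints offered for the second pod
	deviceManager := []hint{{0b01, true}}             // the only SR-IOV NIC is on NUMA node 0
	for _, h := range merge([][]hint{cpuManager, deviceManager}) {
		fmt.Printf("affinity=%02b preferred=%v\n", h.affinity, h.preferred)
	}
	// Prints a single line: affinity=01 preferred=false. The "best hint" is
	// therefore 01 (not preferred), which the CPU manager never offered.
}
```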

With the "single-numa-node" or "restricted" topology manager policies, the topology manager would immediately reject pod admission. With the "best-effort" policy it admits the pod and returns the computed "best hint" 01 (not preferred) to the CPU and device managers for their resource allocations. Hence the CPU manager starts allocating CPUs from NUMA node 0 and, since there are not enough, fills up the rest from NUMA node 1. Note that the "best hint" 01 is not even among the options supplied by the CPU manager in the first place.

Proposal:

If there is no preferred best hint, the topology manager with the "best-effort" policy should return to each provider one preferred hint drawn from that provider's own original hint list. For the device manager that would be 01; for the CPU manager it would be 10. That way each resource owner could do its best to guarantee NUMA locality among its own resources (see the sketch below).
We will provide a corresponding PR to open the discussion on how to improve the best-effort behavior of the topology manager.
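
A rough illustration of the proposed behavior (not the actual PR; names and structure are invented for the sketch): when the merged best hint is not preferred, the best-effort policy hands each provider one of its own preferred hints instead, so each provider can keep its resources on a single NUMA node.

```go
package main

import "fmt"

// hint is a NUMA affinity bit mask (bit 0 = NUMA node 0) plus a preferred flag.
type hint struct {
	affinity  uint
	preferred bool
}

// perProviderHints decides what each hint provider should allocate against.
// bestHint is the merged result; providerHints are the original hint lists,
// keyed by a provider name (names here are just labels for the example).
func perProviderHints(bestHint hint, providerHints map[string][]hint) map[string]hint {
	result := make(map[string]hint)
	for name, hints := range providerHints {
		if bestHint.preferred {
			// Normal case: every provider allocates against the merged best hint.
			result[name] = bestHint
			continue
		}
		// Proposed best-effort fallback: use the provider's own first preferred
		// hint, so each resource type at least stays on one NUMA node.
		chosen := bestHint
		for _, h := range hints {
			if h.preferred {
				chosen = h
				break
			}
		}
		result[name] = chosen
	}
	return result
}

func main() {
	best := hint{affinity: 0b01, preferred: false} // merged result from the analysis above
	providers := map[string][]hint{
		"cpumanager":    {{0b10, true}, {0b11, false}},
		"devicemanager": {{0b01, true}},
	}
	for name, h := range perProviderHints(best, providers) {
		fmt.Printf("%s -> affinity=%02b preferred=%v\n", name, h.affinity, h.preferred)
	}
	// Output (order may vary): cpumanager -> 10, devicemanager -> 01, i.e. CPUs
	// from NUMA node 1 only and the SR-IOV VF from NUMA node 0.
}
```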

Kubernetes version

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"841a4f4f3d4528aa284171074e00503faea18496", GitTreeState:"clean", BuildDate:"2021-08-31T06:52:44Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

none

OS version

# On Linux:
$ cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"

$ uname -a
Linux control-plane-n108-mast-n027 5.3.18-24.75.3.22886.0.PTF.1187468-default #1 SMP Thu Sep 9 23:24:48 UTC 2021 (37ce29d) x86_64 x86_64 x86_64 GNU/Linux

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)


Labels

kind/feature, lifecycle/rotten, needs-triage, sig/node
