node: resourcemanager: API to check exclusive assignment availability #128728
base: master
Conversation
/sig node
/hold
reviewable, but I'd like to add more tests
pkg/kubelet/kubelet.go
}
if utilfeature.DefaultFeatureGate.Enabled(features.MemoryManager) {
if kl.containerManager.GetNodeConfig().ExperimentalMemoryManagerPolicy == "static" {
cc @tallclair For historical reasons, memory manager policy names are CamelCase (e.g. "Static"), so you may have a hidden bug here. My PR wants to remove this class of possible bugs.
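For context, a tiny self-contained sketch of the pitfall being described; the "Static"/"static" spellings come from the comment above, everything else is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// The memory manager policy name is spelled "Static" (CamelCase),
	// while the CPU manager policy name is "static" (lowercase).
	configured := "Static"

	// A literal lowercase comparison never matches, so the guarded branch
	// silently becomes dead code:
	fmt.Println(configured == "static") // false

	// A case-insensitive comparison (or, better, a shared exported constant)
	// avoids this class of bug:
	fmt.Println(strings.EqualFold(configured, "static")) // true
}
```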
if cm.cpuManager.GetAllocatableCPUs().IsEmpty() {
// Only do so when the cpumanager is not exclusively allocating CPUs, as it will do its own updating of the cpuset when
// CPUs are exclusively allocated.
if !cm.cpuManager.CanAllocateExclusively(v1.ResourceCPU) {
/cc @haircommander
this is cleaner, LGTM
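For readers following along, a minimal sketch of what such a query could look like behind the interface; the type and policy names below are assumptions for illustration, not the PR's actual implementation:

```go
package main

import "fmt"

// Illustrative stand-ins; the real kubelet types and constants may differ.
type ResourceName string

const ResourceCPU ResourceName = "cpu"

type cpuManager struct {
	policyName string // e.g. "none" or "static"
}

// CanAllocateExclusively reports whether this manager can hand out exclusive
// assignments for the given resource. For the CPU manager, that only happens
// when the static policy is active.
func (m *cpuManager) CanAllocateExclusively(res ResourceName) bool {
	return res == ResourceCPU && m.policyName == "static"
}

func main() {
	m := &cpuManager{policyName: "none"}
	// With the "none" policy, the caller in the snippet above would go ahead
	// and update the shared cpuset itself.
	fmt.Println(m.CanAllocateExclusively(ResourceCPU)) // false
}
```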
@@ -2840,17 +2840,13 @@ func isPodResizeInProgress(pod *v1.Pod, podStatus *kubecontainer.PodStatus) bool
// Returns true if the resize can proceed.
func (kl *Kubelet) canResizePod(pod *v1.Pod) (bool, v1.PodResizeStatus) {
if v1qos.GetPodQOS(pod) == v1.PodQOSGuaranteed && !utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScalingExclusiveCPUs) {
if utilfeature.DefaultFeatureGate.Enabled(features.CPUManager) {
Removed the feature gate checks, because since 1.32 both managers are GA.
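To illustrate, a minimal sketch of how a guard like the one in canResizePod could look once the feature-gate checks are gone, assuming the exclusive-assignment query this PR adds; names and signatures are illustrative, not the PR's code:

```go
package main

import "fmt"

// Hypothetical stand-ins for the kubelet types.
type resourceName string

const (
	resourceCPU    resourceName = "cpu"
	resourceMemory resourceName = "memory"
)

type containerManager interface {
	CanAllocateExclusively(resourceName) bool
}

// canResizeGuaranteedPod sketches the post-change shape of the check: no
// CPUManager/MemoryManager feature gates (both GA since 1.32), just a query
// of whether either manager hands out exclusive assignments.
func canResizeGuaranteedPod(cm containerManager) bool {
	if cm.CanAllocateExclusively(resourceCPU) || cm.CanAllocateExclusively(resourceMemory) {
		return false // exclusive assignments in play: in-place resize is rejected
	}
	return true
}

type fakeCM map[resourceName]bool

func (f fakeCM) CanAllocateExclusively(r resourceName) bool { return f[r] }

func main() {
	fmt.Println(canResizeGuaranteedPod(fakeCM{resourceCPU: true})) // false
	fmt.Println(canResizeGuaranteedPod(fakeCM{}))                  // true
}
```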
Do you think we need to consider https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#cpu-policy-static--options, some of which are alpha and some beta?
Force-pushed from ab81fd7 to bf3a165
Perhaps the VPA use case can actually be handled by the same API being added in #127525, plus a similar API for memory management; if so, the point of this PR will be moot. It depends on what the VPA code actually needs to know.
I am open to either approach, but we should separate the new cgroupManager API from #127525 if we want to get it in as a bug fix (it seems there's also a feature gate addition in there).
/triage accepted
Force-pushed from bf3a165 to 38482fd
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: ffromani
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Overall question: have we thought about the cost/impact of this API from a resource point of view? It depends on how often it will be called, but it would be nice to have a figure and to take it into account when dimensioning the API.
No, I didn't run benchmarks or profiling (do we do that in general, as a habit? only in special cases?). However, most of the complexity comes from the need to traverse the layers. The actual check itself is either extremely cheap (a hardcoded constant!) or very cheap (a mutex plus a hashtable lookup).
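If someone wants a rough figure, a micro-benchmark of the described worst case is easy to sketch; all names here are illustrative, not the PR's code, and it runs with `go test -bench=.`:

```go
// exclusivecheck_test.go: a minimal benchmark sketch for the
// "mutex plus hashtable lookup" worst case described above.
package exclusivecheck

import (
	"sync"
	"testing"
)

type manager struct {
	mu        sync.Mutex
	exclusive map[string]bool
}

// CanAllocateExclusively takes the lock and does a single map lookup.
func (m *manager) CanAllocateExclusively(res string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.exclusive[res]
}

func BenchmarkCanAllocateExclusively(b *testing.B) {
	m := &manager{exclusive: map[string]bool{"cpu": true}}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if !m.CanAllocateExclusively("cpu") {
			b.Fatal("unexpected result")
		}
	}
}
```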
I think it is worth the effort to give it a try.
/test pull-kubernetes-node-kubelet-podresize
uhm, not sure where I can get a pod object from here, though.
Nope, in the node container manager flows there are no meaningful pod objects to be retrieved. So we need to keep it like that for the time being.
Force-pushed from 8cf6d09 to d31331c
/test pull-kubernetes-node-kubelet-podresize
Force-pushed from d31331c to 0b79dc8
/retest-required
/test pull-kubernetes-unit
The cache test failure is unlikely to be (read: is not) this PR's fault.
Force-pushed from 0b79dc8 to 4b6aabc
There are now at least two pieces of code that want to know whether the resource managers (CPU, Memory) support exclusive assignment:
- VPA, for its internal needs
- the node container manager, to fix cgroup params

Currently, code needing to know whether exclusive assignment is available has mostly two options:
- infer it from the available APIs. This works and somewhat keeps the responsibilities respected, but it's clumsy
- peek into the global config, which introduces unnecessary coupling and possibly bugs (e.g. the cpumanager policy is called "static", while the memorymanager policy is called "Static")

So it seems time to have an official API that code outside the resource/container manager can query safely, concisely and in a supported way.

Signed-off-by: Francesco Romani <[email protected]>
Force-pushed from 4b6aabc to 3b54618
/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/test pull-kubernetes-node-kubelet-serial-topology-manager
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
There are now at least two pieces of code that want to know whether the resource managers (CPU, Memory) support exclusive assignment:
- VPA, for its internal needs
- the node container manager, to fix cgroup params

Currently, code needing to know whether exclusive assignment is available has mostly two options:
- infer it from the available APIs. This works and somewhat keeps the responsibilities respected, but it's clumsy
- peek into the global config, which introduces unnecessary coupling and possibly bugs (e.g. the cpumanager policy is called "static", while the memorymanager policy is called "Static")

So it seems time to have an official API that code outside the resource/container manager can query safely, concisely and in a supported way.
Which issue(s) this PR fixes:
Fixes #129531
Special notes for your reviewer:
Prompted by #128727 et al.
Does this PR introduce a user-facing change?