node: resourcemanager: API to check exclusive assignment availability #128728


Open · wants to merge 1 commit into master from resmrgr-introspection

Conversation

@ffromani (Contributor) commented Nov 10, 2024:

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

There are now at least two pieces of code which want to know whether the resource managers (CPU, memory) support exclusive assignment:

  • VPA, for its internal needs
  • node container manager, to fix cgroup params

Currently, code that needs to know whether exclusive assignment is available has essentially two options:

  • infer it from the available APIs. This works and somewhat keeps responsibilities where they belong, but it's clumsy
  • peek into the global config, which introduces unnecessary coupling and possible bugs (e.g. the cpumanager policy is called "static", while the memorymanager policy is called "Static")

So it seems time to add an official API that code outside the resource/container manager can query safely, concisely and in a supported way.
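For illustration, a minimal sketch of the shape such an API could take. The interface name and placement are assumptions, not the PR's actual code; only the `CanAllocateExclusively` method name is taken from the diffs discussed below.

```go
package cm

import (
	v1 "k8s.io/api/core/v1"
)

// ExclusiveAllocator (hypothetical name) would be implemented by the CPU and
// memory managers, letting callers outside the container manager ask whether
// exclusive assignment is in effect instead of inferring it from other APIs
// or peeking at the global config.
type ExclusiveAllocator interface {
	// CanAllocateExclusively reports whether the manager's configured policy
	// performs exclusive assignment of the given resource (e.g. v1.ResourceCPU).
	CanAllocateExclusively(resource v1.ResourceName) bool
}
```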

Which issue(s) this PR fixes:

Fixes #129531

Special notes for your reviewer:

Prompted by #128727 et al.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot added labels on Nov 10, 2024: release-note-none, size/L, kind/cleanup, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority
@ffromani (Contributor, Author):

/sig node

@k8s-ci-robot added labels sig/node and area/kubelet, and removed do-not-merge/needs-sig, on Nov 10, 2024
@ffromani (Contributor, Author):

/hold

reviewable, but I'd like to add more tests

@k8s-ci-robot added the do-not-merge/hold label on Nov 10, 2024
```go
	}
	if utilfeature.DefaultFeatureGate.Enabled(features.MemoryManager) {
		if kl.containerManager.GetNodeConfig().ExperimentalMemoryManagerPolicy == "static" {
```
@ffromani (Contributor, Author):

cc @tallclair: for historical reasons, memory manager policy names are CamelCase (e.g. "Static"), so you may have a hidden bug here. My PR aims to remove this class of possible bugs.
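To make the trap concrete, a minimal illustration (hypothetical snippet, reusing the config accessor from the diff above):

```go
// The memory manager policy name is CamelCase, so a lowercase comparison
// silently never matches.
policy := kl.containerManager.GetNodeConfig().ExperimentalMemoryManagerPolicy
if policy == "static" { // bug: never true, the policy is spelled "Static"
	// unreachable
}
if policy == "Static" { // correct spelling
	// the memory manager may assign memory exclusively
}
```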

A Contributor replied:

Thanks @ffromani; until the API approach is finished, PR #130559 tries to fix the hidden bug you described.

```diff
-	if cm.cpuManager.GetAllocatableCPUs().IsEmpty() {
+	// Only do so when the cpumanager is not exclusively allocating CPUs, as it will do its own updating of the cpuset when
+	// CPUs are exclusively allocated.
+	if !cm.cpuManager.CanAllocateExclusively(v1.ResourceCPU) {
```
A Contributor commented:

this is cleaner, LGTM

```diff
@@ -2840,17 +2840,13 @@ func isPodResizeInProgress(pod *v1.Pod, podStatus *kubecontainer.PodStatus) bool
 // Returns true if the resize can proceed.
 func (kl *Kubelet) canResizePod(pod *v1.Pod) (bool, v1.PodResizeStatus) {
 	if v1qos.GetPodQOS(pod) == v1.PodQOSGuaranteed && !utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScalingExclusiveCPUs) {
-		if utilfeature.DefaultFeatureGate.Enabled(features.CPUManager) {
```
@ffromani (Contributor, Author):

removed the feature gate checks because, since 1.32, both managers are GA
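As an editor's illustration of the comment above (not the PR's actual diff), the shape of the change is:

```go
// Before: guarded by a feature gate that is always true now that the
// CPU manager is GA (since 1.32).
if utilfeature.DefaultFeatureGate.Enabled(features.CPUManager) {
	// ... CPU-manager-specific handling ...
}

// After: the gate check is dropped and the handling runs unconditionally;
// only the resize-specific InPlacePodVerticalScalingExclusiveCPUs gate
// still matters to the caller.
// ... CPU-manager-specific handling ...
```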

A Contributor replied:

Do you think we need to consider https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#cpu-policy-static--options, given that some of those options are alpha and some beta?

@ffromani force-pushed the resmrgr-introspection branch 2 times, most recently from ab81fd7 to bf3a165, on November 10, 2024 at 14:51
@ffromani (Contributor, Author):

perhaps the VPA use case can actually be handled by the same API being added in #127525, plus a similar API for memory management; if so, the point of this PR will be moot. It depends on what the VPA code actually needs to know.

@haircommander (Contributor):

> perhaps the VPA use case can actually be handled by the same API being added in #127525, plus a similar API for memory management; if so, the point of this PR will be moot. It depends on what the VPA code actually needs to know.

I am open to either approach, but we should separate the new cgroupManager API from #127525 if we want to get it in as a bug fix (it seems there's also a feature gate addition in there).

@k8s-ci-robot added the needs-rebase label on Nov 12, 2024
@bart0sh (Contributor) commented Nov 12, 2024:

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Nov 12, 2024
@ffromani force-pushed the resmrgr-introspection branch from bf3a165 to 38482fd on January 8, 2025 at 17:36
@k8s-ci-robot removed the needs-rebase label on Jan 8, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ffromani
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@esotsal (Contributor) commented Mar 12, 2025:

Overall question: have we thought about the cost/impact of this API from a resource point of view? It depends on how often it will be called, etc., but it would be nice to have a figure and to take it into account when dimensioning the API.

@k8s-ci-robot added the needs-rebase label on Mar 12, 2025
@ffromani (Contributor, Author):

> Overall question: have we thought about the cost/impact of this API from a resource point of view? It depends on how often it will be called, etc., but it would be nice to have a figure and to take it into account when dimensioning the API.

No, I didn't run benchmarks or profiling (do we in general, as a habit? do we in special cases?). However, most of the complexity comes from the need to traverse the layers. The actual check is either extremely cheap (a hardcoded constant!) or very cheap (a mutex plus a hashtable lookup).
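To illustrate why (hypothetical types, not the PR's code): a policy that never allocates exclusively can answer with a constant, and a stateful manager needs at most a lock and a map lookup.

```go
package cm

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// nonePolicy sketches the "extremely cheap" case: the answer is a
// hardcoded constant.
type nonePolicy struct{}

func (p nonePolicy) CanAllocateExclusively(resource v1.ResourceName) bool {
	return false
}

// staticManager sketches the "very cheap" case: a mutex plus a
// hashtable lookup.
type staticManager struct {
	mu        sync.Mutex
	exclusive map[v1.ResourceName]bool
}

func (m *staticManager) CanAllocateExclusively(resource v1.ResourceName) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.exclusive[resource]
}
```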

@esotsal (Contributor) commented Mar 12, 2025:

while writing tests, I was checking the API again and I'm inclined to change it a bit. The CanAllocateExclusively API should also take a pod object as its argument. The reason is: some managers only act on Guaranteed QoS pods, and they have this check internally. Failing to take the pod QoS into account in the API moves the burden onto the calling code, which is another form of implementation-detail leaking and unnecessary coupling, which we are trying to avoid.

it is not so easy, though. We need to call this functionality also in node_container_manager (one of the two key use cases), where we don't have a pod object handy.

I think it is worth the effort to give it a try.

@esotsal (Contributor) commented Mar 12, 2025:

/test pull-kubernetes-node-kubelet-podresize
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
/test pull-kubernetes-node-kubelet-serial-podresize

@ffromani (Contributor, Author):

> while writing tests, I was checking the API again and I'm inclined to change it a bit. The CanAllocateExclusively API should also take a pod object as its argument. The reason is: some managers only act on Guaranteed QoS pods, and they have this check internally. Failing to take the pod QoS into account in the API moves the burden onto the calling code, which is another form of implementation-detail leaking and unnecessary coupling, which we are trying to avoid.
>
> it is not so easy, though. We need to call this functionality also in node_container_manager (one of the two key use cases), where we don't have a pod object handy.
>
> I think it is worth the effort to give it a try.

uhm, not sure where I can get a pod object from here though.

@ffromani (Contributor, Author):

> while writing tests, I was checking the API again and I'm inclined to change it a bit. The CanAllocateExclusively API should also take a pod object as its argument. The reason is: some managers only act on Guaranteed QoS pods, and they have this check internally. Failing to take the pod QoS into account in the API moves the burden onto the calling code, which is another form of implementation-detail leaking and unnecessary coupling, which we are trying to avoid.
>
> it is not so easy, though. We need to call this functionality also in node_container_manager (one of the two key use cases), where we don't have a pod object handy.
>
> I think it is worth the effort to give it a try.
>
> uhm, not sure where I can get a pod object from here though.

nope, in the node container manager flows there are no meaningful pod objects to be retrieved. So we need to keep it like that for the time being.
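For the record, an editor's sketch of the pod-aware variant that was considered; the method name and receiver are hypothetical, and `v1qos.GetPodQOS` is the helper already visible in the canResizePod diff above.

```go
import (
	v1 "k8s.io/api/core/v1"
	v1qos "k8s.io/kubernetes/pkg/apis/core/v1/helper/qos"
)

// Hypothetical pod-aware wrapper: folding the QoS check into the API so that
// callers don't need to know that exclusive assignment only applies to
// Guaranteed QoS pods. It cannot serve the node container manager flows,
// which have no pod object in scope.
func (m *staticManager) CanAllocateExclusivelyForPod(pod *v1.Pod, resource v1.ResourceName) bool {
	if v1qos.GetPodQOS(pod) != v1.PodQOSGuaranteed {
		return false
	}
	return m.CanAllocateExclusively(resource)
}
```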

@ffromani force-pushed the resmrgr-introspection branch from 8cf6d09 to d31331c on March 12, 2025 at 07:58
@ffromani (Contributor, Author):

/test pull-kubernetes-node-kubelet-podresize
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
/test pull-kubernetes-node-kubelet-serial-podresize

@k8s-ci-robot removed the needs-rebase label on Mar 12, 2025
@ffromani force-pushed the resmrgr-introspection branch from d31331c to 0b79dc8 on March 12, 2025 at 08:27
@ffromani (Contributor, Author):

/retest-required

@ffromani (Contributor, Author):

/test pull-kubernetes-unit

the cache test failure is unlikely to be (read: is not) this PR's fault

@k8s-ci-robot added the needs-rebase label on Mar 18, 2025
@ffromani force-pushed the resmrgr-introspection branch from 0b79dc8 to 4b6aabc on March 18, 2025 at 09:45
@k8s-ci-robot removed the needs-rebase label on Mar 18, 2025
There are now at least two pieces of code which want to know
whether the resource managers (CPU, memory) support exclusive assignment:
- VPA, for its internal needs
- node container manager, to fix cgroup params

Currently, code that needs to know whether exclusive assignment
is available has essentially two options:
- infer it from the available APIs. This works and somewhat keeps
  responsibilities where they belong, but it's clumsy
- peek into the global config, which introduces unnecessary coupling
  and possible bugs (e.g. the cpumanager policy is called "static",
  while the memorymanager policy is called "Static")

So it seems time to add an official API that code
outside the resource/container manager can query safely, concisely
and in a supported way.

Signed-off-by: Francesco Romani <[email protected]>
@ffromani force-pushed the resmrgr-introspection branch from 4b6aabc to 3b54618 on March 18, 2025 at 10:17
@ffromani (Contributor, Author):

/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2

@ffromani (Contributor, Author):

/test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e

@ffromani (Contributor, Author):

/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2

@ffromani (Contributor, Author):

/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-hugepages
/test pull-kubernetes-node-kubelet-serial-memory-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager

@ffromani (Contributor, Author):

/test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2

@k8s-ci-robot (Contributor):

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-node-kubelet-serial-containerd-kubetest2 | 3b54618 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2 |
| pull-kubernetes-node-kubelet-serial-memory-manager | 3b54618 | link | false | /test pull-kubernetes-node-kubelet-serial-memory-manager |
| pull-kubernetes-node-kubelet-serial-containerd | 3b54618 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ffromani (Contributor, Author):

/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2

@k8s-ci-robot added the needs-rebase label on Mar 22, 2025
@k8s-ci-robot (Contributor):

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-triage-robot:

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 24, 2025
Labels
  • area/kubelet
  • cncf-cla: yes (the PR's author has signed the CNCF CLA)
  • kind/cleanup (related to cleaning up code, process, or technical debt)
  • lifecycle/stale (remained open with no activity and has become stale)
  • needs-rebase (cannot be merged because of merge conflicts with HEAD)
  • priority/important-longterm (important over the long term, but may not be staffed and/or may need multiple releases to complete)
  • release-note-none (doesn't merit a release note)
  • sig/node (relevant to SIG Node)
  • size/L (changes 100-499 lines, ignoring generated files)
  • triage/accepted (ready to be actively worked on)
Development

Successfully merging this pull request may close these issues.

[FG:InPlacePodVerticalScaling] avoid checking the configuration of resource managers to learn their expected behavior
6 participants