-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf optimization: GetPodQOS() returns persisted value of PodStatus.QOSClass, if set. #119665
Perf optimization: GetPodQOS() returns persisted value of PodStatus.QOSClass, if set. #119665
Conversation
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Sat Jul 29 10:13:20 UTC 2023. |
/test pull-kubernetes-e2e-gce-cos-alpha-features |
/triage accepted |
pkg/apis/core/helper/qos/qos.go
Outdated
if pod.Status.QOSClass != "" { | ||
return pod.Status.QOSClass | ||
} | ||
return GetPodQOS(pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe note that this is much more expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even rename to GetPodQOSExpensive() or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or name this GetPodQOS()
and move the old, expensive one to getPodQOSExpensive()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion, fewer changes! I'll rename the current expensive one as ComputePodQOS() or DeterminePodQOS(), I'll update & retest over the weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thockin My last change has addressed this. PTAL, ty
… PodStatus.QOSClass from GetPodQOS.
54c1181
to
4063ca4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @vinaykul!
see one nit below
Co-authored-by: Itamar Holder <[email protected]>
265a085
to
40b604e
Compare
@vinaykul: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
LGTM label has been added. Git tree hash: e13504b92a8276f4ac5ed20efde13c77f414a14f
|
@thockin PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thockin, vinaykul The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Invoking GetPodQOS is much more heavyweight op compared to reading PodStatus.QOSClass, a value that does not change after initial assignment because QoS class of pods are immutable. Running in-place pod resize e2e tests currently has GetPodQOS showing up in only 0.1% in the samples of kubelet flamegraph with 100 other pods on the node, each pod with 3 Burstable QoS class containers (real world cost may be higher or lower depending on nature of workloads) The intent is to eliminate that 0.1% and keep it that way by (re)setting this precedent in light of KEP 2527 allowing to use status as reliable, persistent source of truth.
Which issue(s) this PR fixes:
Fixes #
partially fixes 109547
Special notes for your reviewer:
In the flamegraph attached below, please see the sampling percentage of SetPodResizeStatus for this e2e pod resize test scenario. If this small optimization trial balloon PR stands as a valid use of KEP 2527, we may also have the case for relying on PodStatus.Resize and PodStatus...AllocatedResources as the reliable and persistent source of truth instead of checkpoint file - a good optimization imho (with special handling for stand-alone kubelet in case we want to support in-place resize for that case)
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Kubelet flamegraph data: