-
Notifications
You must be signed in to change notification settings - Fork 40.9k
Fix pod and container level swap metrics for CRI #132065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Welcome @yuanwang04! |
Hi @yuanwang04. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cc |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it also need a addCRIPodSwapUsage
method here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iholder101 I don't see the Swap stats are being populated from the reference to CRI's ListPodSandboxStats function. Do you remember if this is a limitation of the CRI that Swap related data have to be retrieved from the cadvisor / cgroup directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there! Sorry for the late reply.
In #118865, I was adding a comment regarding that:
Special notes for your reviewer:
AFAICT, cadvisor is being used to fetch the metric data. If it is not available through cadvisor, it's possible to fallback into gathering CRI data. Unfortunately, it doesn't seem currently possible to do so (here) since memory usage is not part of the CRI-API. In follow-up work, I think it should be added as well, as I don't see a reason to why CRIs wouldn't be able to provide this information.
While this was true back then, a swapUsage
struct is now part of the CRI-API.
Therefore:
Does it also need a addCRIPodSwapUsage method here?
@ajaysundark, I think so, yes.
Do you remember if this is a limitation of the CRI that Swap related data have to be retrieved from the cadvisor / cgroup directly?
Prior to my work, this wasn't part of the CRI-API. Now it is. I'm not aware of any further limitation.
@haircommander can you please confirm that CRIs are implementing the swapUsage
API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/cri-o/cri-o/blob/155deb668c7b54a0bed35f2e091d4f3067c93208/internal/lib/stats/stats_server_linux.go#L328-L333 looks like CRI-O does, but to return those you'd need to turn on the CRI stats feature, so it doesn't OOTB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haircommander Could you help me understand the status of the CRI stats provider. This feature still seems to be alpha: https://kubernetes.io/docs/reference/instrumentation/cri-pod-container-metrics/?
containerd doesn't seem to support swapUsage in stats yet. I raised a bug to follow-up on this: containerd/containerd#12026.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRI stats provider is used by default in containerd right now. there are two different halves of it. the legacy "partial" CRI stats (where the remainder are populated by cadvisor) vs the new (2371) full CRI stats. it's possible swap stats are only populated in the "full" case
@haircommander @kannon92 Could you help to review this PR? Thanks! I think we might also want to backport this fix to at least 1.33 so Swap can be monitored appropriately, cc @ajaysundark |
For the record, originally, I wasn't sure if swap should be considered as "CPU and memory" stats. If so, I think this is the right direction 👍 |
I tend to think this should be possible, if we treat this as a bug fix. |
/lgtm |
LGTM label has been added. Git tree hash: 205b8efb109b34df96a8735c9728881234977da7
|
Need a release note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: SergeyKanzhelev, yuanwang04 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/release-note-edit
|
/retest Error is from resource_quota test, seems irrelevant from this change
|
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes the missing pod-level and container-level swap metrics from the CRI provider.
Swap metrics are introduced in #118865. For the CRI provider, the swap data is only scraped from the
listPodStatsPartiallyFromCRI
function. However, the resource metrics are scraped from theListPodCPUAndMemoryStats
function. This PR added the similar code changes to theListPodCPUAndMemoryStats
function so that the swap metrics can be reported correctly.Also added the swap related metrics to the e2e test to assert these metrics exist.
Which issue(s) this PR fixes:
Fixes #131915
Special notes for your reviewer:
Also added the swap metrics to the e2e test suite
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: