Add swap to stats to Summary API and Prometheus endpoints (/stats/summary and /metrics/resource) #118865

Merged

iholder101
Contributor

@iholder101 iholder101 commented Jun 26, 2023

What type of PR is this?

/kind feature
/sig node
/area kubelet

What this PR does / why we need it:

The Summary API, which is reachable through the /stats/summary endpoint, allows consuming metrics and statistics gathered by the kubelet at the node, volume, pod, and container levels.

This PR adds current swap usage to the summary API as part of the requirements to graduate swap to Beta1.

In addition, it adds the node_swap_usage_bytes, pod_swap_usage_bytes, and container_swap_usage_bytes metrics, which are reachable from /metrics/resource, an endpoint that is also used as a Prometheus endpoint.

Which issue(s) this PR fixes:

Fixes #119424
Fixes #119425

Special notes for your reviewer:

AFAICT, cadvisor is used to fetch the metric data. If the data is not available through cadvisor, it's possible to fall back to gathering CRI data. Unfortunately, that doesn't currently seem possible (here), since memory usage is not part of the CRI-API. I think it should be added in follow-up work, as I don't see a reason why CRIs wouldn't be able to provide this information.

Does this PR introduce a user-facing change?

Add swap stats to the Summary API and Prometheus endpoints (/stats/summary and /metrics/resource).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#beta-1

Sample output

To test this, I created a pod named test-pod in an environment with the NodeSwap feature gate enabled and UnlimitedSwap configured.

The /metrics/resource endpoint looks as follows:

> kubectl get --raw "/api/v1/nodes/<NODE-NAME>/proxy/metrics/resource"
# HELP container_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the container in core-seconds
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container="c1",namespace="default",pod="test-pod"} 128.476535 1687950863878
container_cpu_usage_seconds_total{container="coredns",namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 0.915609 1687950855483
# HELP container_memory_working_set_bytes [ALPHA] Current working set of the container in bytes
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container="c1",namespace="default",pod="test-pod"} 3.92847388672e+11 1687950863878
container_memory_working_set_bytes{container="coredns",namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 1.5310848e+07 1687950855483
# HELP container_start_time_seconds [ALPHA] Start time of the container since unix epoch in seconds
# TYPE container_start_time_seconds gauge
container_start_time_seconds{container="c1",namespace="default",pod="test-pod"} 1.6879504889662333e+09 1687950488966
container_start_time_seconds{container="coredns",namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 1.6879504537882898e+09 1687950453788
# HELP container_swap_usage_bytes [ALPHA] Current container amount of swap usage in bytes
# TYPE container_swap_usage_bytes gauge
container_swap_usage_bytes{container="c1",namespace="default",pod="test-pod"} 3.4400333824e+10 1687950863878
container_swap_usage_bytes{container="coredns",namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 0 1687950855483
# HELP node_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the node in core-seconds
# TYPE node_cpu_usage_seconds_total counter
node_cpu_usage_seconds_total 45770.147028 1687950863599
# HELP node_memory_working_set_bytes [ALPHA] Current working set of the node in bytes
# TYPE node_memory_working_set_bytes gauge
node_memory_working_set_bytes 3.95755573248e+11 1687950863599
# HELP node_swap_usage_bytes [ALPHA] Current node swap usage in bytes
# TYPE node_swap_usage_bytes gauge
node_swap_usage_bytes 1.8446743709127774e+19 1687950863599
# HELP pod_cpu_usage_seconds_total [ALPHA] Cumulative cpu time consumed by the pod in core-seconds
# TYPE pod_cpu_usage_seconds_total counter
pod_cpu_usage_seconds_total{namespace="default",pod="test-pod"} 123.291472 1687950858784
pod_cpu_usage_seconds_total{namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 5.101812 1687950863144
# HELP pod_memory_working_set_bytes [ALPHA] Current working set of the pod in bytes
# TYPE pod_memory_working_set_bytes gauge
pod_memory_working_set_bytes{namespace="default",pod="test-pod"} 3.92474558464e+11 1687950858784
pod_memory_working_set_bytes{namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 1.5499264e+07 1687950863144
# HELP pod_swap_usage_bytes [ALPHA] Current pod amount of swap usage in bytes
# TYPE pod_swap_usage_bytes gauge
pod_swap_usage_bytes{namespace="default",pod="test-pod"} 3.4379333632e+10 1687950858784
pod_swap_usage_bytes{namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 0 1687950863144
# HELP scrape_error [ALPHA] 1 if there was an error while getting container metrics, 0 otherwise
# TYPE scrape_error gauge
scrape_error 0
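
For readers who want to consume this output programmatically, here is a minimal sketch that pulls the three new swap metrics out of text like the sample above. It is not part of the PR: the names `parse_swap_samples`, `SAMPLE_RE`, `LABEL_RE`, and `EXAMPLE` are made up for illustration, and the regular expressions assume simple label values without escaped quotes or braces, as in the sample.

```python
import re

# Matches a sample line: name{label="v",...} value [timestamp]
# Assumption: label values contain no '}' or escaped quotes (true for the sample above).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_swap_samples(text):
    """Return (metric name, labels dict, float value) for each *_swap_usage_bytes sample."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").endswith("_swap_usage_bytes"):
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# A few lines copied from the sample output above.
EXAMPLE = """\
# HELP container_swap_usage_bytes [ALPHA] Current container amount of swap usage in bytes
# TYPE container_swap_usage_bytes gauge
container_swap_usage_bytes{container="c1",namespace="default",pod="test-pod"} 3.4400333824e+10 1687950863878
container_swap_usage_bytes{container="coredns",namespace="kube-system",pod="coredns-8f5847b64-t9gmr"} 0 1687950855483
node_swap_usage_bytes 1.8446743709127774e+19 1687950863599
pod_swap_usage_bytes{namespace="default",pod="test-pod"} 3.4379333632e+10 1687950858784
"""

if __name__ == "__main__":
    for name, labels, value in parse_swap_samples(EXAMPLE):
        print(name, labels, value)
```

In practice a Prometheus client library's text parser would likely be a better fit than hand-written regular expressions; the sketch only shows the shape of the data.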

The /stats/summary endpoint looks as follows (some parts were omitted and replaced by ... for simplicity):

> kubectl get --raw "/api/v1/nodes/localhost/proxy/stats/summary"
{
 "node": {
  "nodeName": "localhost",
  "systemContainers": [
   {
    "name": "kubelet",
    "startTime": "2023-06-28T11:07:22Z",
    "cpu": {
     ...
    },
    "memory": {
     "time": "2023-06-28T11:14:20Z",
     "usageBytes": 5760372736,
     "workingSetBytes": 435998720,
     "rssBytes": 364462080,
     "pageFaults": 178945553,
     "majorPageFaults": 183852
    },
    "swap": {
     "time": "2023-06-28T11:14:20Z",
     "swapUsageBytes": 532729856
    }
   },
   {
    "name": "pods",
    "startTime": "2023-06-25T13:15:58Z",
    "cpu": {
     ...
    },
    "memory": {
     "time": "2023-06-28T11:14:23Z",
     "availableBytes": 11100356608,
     "usageBytes": 392888811520,
     "workingSetBytes": 392850468864,
     "rssBytes": 391990575104,
     "pageFaults": 938848963,
     "majorPageFaults": 7139
    },
    "swap": {
     "time": "2023-06-28T11:14:23Z",
     "swapUsageBytes": 34400399360
    }
   }
  ],
  "startTime": "2023-06-25T13:13:08Z",
  "cpu": {
   "time": "2023-06-28T11:14:23Z",
   "usageNanoCores": 10005001255,
   "usageCoreNanoSeconds": 45770147028000
  },
  "memory": {
   "time": "2023-06-28T11:14:23Z",
   "availableBytes": 8195252224,
   "usageBytes": 400979005440,
   "workingSetBytes": 395755573248,
   "rssBytes": 392447672320,
   "pageFaults": 2055388993,
   "majorPageFaults": 263184
  },
  "network": {
    ...
  },
  "fs": {
   ...
  },
  "runtime": {
   ...
  },
  "rlimit": {
   ...
  },
  "swap": {
   "time": "2023-06-28T11:14:23Z",
   "swapAvailableBytes": 407531442176,
   "swapUsageBytes": 18446743709127774208
  }
 },
 "pods": [
  {
   "podRef": {
    "name": "test-pod",
    "namespace": "default",
    "uid": "85cfce3e-0299-4622-8137-4f414e2f3bd1"
   },
   "startTime": "2023-06-28T11:08:08Z",
   "containers": [
    {
     "name": "c1",
     "startTime": "2023-06-28T11:08:08Z",
     "cpu": {
      ...
     },
     "memory": {
      "time": "2023-06-28T11:14:23Z",
      "usageBytes": 392847446016,
      "workingSetBytes": 392847388672,
      "rssBytes": 391990722560,
      "pageFaults": 241860819,
      "majorPageFaults": 114
     },
     "rootfs": {
      ...
     },
     "logs": {
      ...
     },
     "swap": {
      "time": "2023-06-28T11:14:23Z",
      "swapUsageBytes": 34400333824
     }
    }
   ],
   "cpu": {
    ...
   },
   "memory": {
    "time": "2023-06-28T11:14:18Z",
    "usageBytes": 392474583040,
    "workingSetBytes": 392474558464,
    "rssBytes": 391627411456,
    "pageFaults": 241768635,
    "majorPageFaults": 114
   },
   "volume": [
    ...
   ],
   "ephemeral-storage": {
    ...
   },
   "process_stats": {
    "process_count": 0
   },
   "swap": {
    "time": "2023-06-28T11:14:18Z",
    "swapUsageBytes": 34379333632
   }
  },
  {
   "podRef": {
    "name": "coredns-8f5847b64-t9gmr",
    "namespace": "kube-system",
    "uid": "d0d8b989-ac33-4688-8b33-81613acd640c"
   },
   "startTime": "2023-06-28T11:07:33Z",
   "containers": [
    {
     "name": "coredns",
     "startTime": "2023-06-28T11:07:33Z",
     "cpu": {
      ...
     },
     "memory": {
      "time": "2023-06-28T11:14:15Z",
      "availableBytes": 162947072,
      "usageBytes": 15998976,
      "workingSetBytes": 15310848,
      "rssBytes": 13619200,
      "pageFaults": 6318,
      "majorPageFaults": 9
     },
     "rootfs": {
      ...
     },
     "logs": {
      ...
     },
     "swap": {
      "time": "2023-06-28T11:14:15Z",
      "swapUsageBytes": 0
     }
    }
   ],
   "cpu": {
    ...
   },
   "memory": {
    "time": "2023-06-28T11:14:23Z",
    "availableBytes": 162758656,
    "usageBytes": 16191488,
    "workingSetBytes": 15499264,
    "rssBytes": 13729792,
    "pageFaults": 8491,
    "majorPageFaults": 9
   },
   "volume": [
    ...
   ],
   "ephemeral-storage": {
    ...
   },
   "process_stats": {
    "process_count": 0
   },
   "swap": {
    "time": "2023-06-28T11:14:23Z",
    "swapUsageBytes": 0
   }
  }
 ]
}
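
As an illustration of how the new "swap" sections could be read from this response, here is a small sketch that collects swapUsageBytes per pod and per container. It is not part of the PR: the helper names `swap_usage_by_pod` and `swap_usage_by_container` are hypothetical, and `EXAMPLE` is a trimmed copy of the output above that keeps only the fields the helpers touch.

```python
import json


def swap_usage_by_pod(summary):
    """Map "namespace/pod" to the pod-level swapUsageBytes, skipping pods without swap stats."""
    usage = {}
    for pod in summary.get("pods", []):
        swap = pod.get("swap")
        if swap is None or "swapUsageBytes" not in swap:
            continue
        ref = pod["podRef"]
        usage[f'{ref["namespace"]}/{ref["name"]}'] = swap["swapUsageBytes"]
    return usage


def swap_usage_by_container(summary):
    """Map "namespace/pod/container" to the container-level swapUsageBytes."""
    usage = {}
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        for container in pod.get("containers", []):
            swap = container.get("swap")
            if swap is None or "swapUsageBytes" not in swap:
                continue
            key = f'{ref["namespace"]}/{ref["name"]}/{container["name"]}'
            usage[key] = swap["swapUsageBytes"]
    return usage


# Trimmed copy of the sample response above (only the fields used here).
EXAMPLE = json.loads("""
{
 "node": {
  "nodeName": "localhost",
  "swap": {
   "time": "2023-06-28T11:14:23Z",
   "swapAvailableBytes": 407531442176,
   "swapUsageBytes": 18446743709127774208
  }
 },
 "pods": [
  {
   "podRef": {"name": "test-pod", "namespace": "default"},
   "containers": [
    {"name": "c1", "swap": {"time": "2023-06-28T11:14:23Z", "swapUsageBytes": 34400333824}}
   ],
   "swap": {"time": "2023-06-28T11:14:18Z", "swapUsageBytes": 34379333632}
  },
  {
   "podRef": {"name": "coredns-8f5847b64-t9gmr", "namespace": "kube-system"},
   "containers": [
    {"name": "coredns", "swap": {"time": "2023-06-28T11:14:15Z", "swapUsageBytes": 0}}
   ],
   "swap": {"time": "2023-06-28T11:14:23Z", "swapUsageBytes": 0}
  }
 ]
}
""")

if __name__ == "__main__":
    print(swap_usage_by_pod(EXAMPLE))
    print(swap_usage_by_container(EXAMPLE))
```

The helpers treat a missing "swap" section as "no data" rather than zero, since a pod or container may simply not report swap stats.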

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, release-note, size/S, kind/feature, sig/node, area/kubelet, cncf-cla: yes, needs-triage, and needs-ok-to-test labels Jun 26, 2023
@k8s-ci-robot
Contributor

Hi @iholder101. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jun 26, 2023
@iholder101 iholder101 marked this pull request as ready for review June 26, 2023 10:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2023
@iholder101
Contributor Author

/cc @harche @pacoxu

@pacoxu
Member

pacoxu commented Jun 26, 2023

Should we add it to /metrics/resource/ as well, or only there?
As https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/ points out, metrics-server now uses it:

Metrics resource endpoint /metrics/resource in version v0.6.0+ or
Summary API endpoint /stats/summary in older versions

@iholder101
Contributor Author

> Should we add it to /metrics/resource/ as well, or only there? As https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/ points out, metrics-server now uses it:
>
> Metrics resource endpoint /metrics/resource in version v0.6.0+ or
> Summary API endpoint /stats/summary in older versions

Please correct me if I'm wrong, but IIUC /metrics/resource/ would be populated with the same logic, therefore it's already supported.

You can see that in server.go, these metrics are populated by s.resourceAnalyzer, which is of the ResourceAnalyzer interface type. This interface embeds SummaryProvider, whose Get() and GetCPUAndMemoryStats() methods eventually use cadvisorInfoToContainerCPUAndMemoryStats(), the function in which I've implemented swap support.

Am I missing something here?

@pacoxu
Member

pacoxu commented Jun 26, 2023

If so, this would be valid.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 26, 2023
@iholder101 iholder101 force-pushed the kubelet/add-swap-to-summary-stats branch from a00c580 to 4671e33 Compare June 26, 2023 11:23
@iholder101 iholder101 marked this pull request as draft June 27, 2023 11:16
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 27, 2023
@swatisehgal
Contributor

/triage accepted
/priority important-soon
As this work is targeted for the 1.28 release and is already captured in the SIG Node planning doc.

@iholder101 iholder101 force-pushed the kubelet/add-swap-to-summary-stats branch from 8a32f45 to 4cb5547 Compare July 17, 2023 23:56
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 17, 2023
@mrunalp
Contributor

mrunalp commented Jul 18, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: fc5fc1bd2080a44cb900a0bef5a248111991e3fe

@k8s-ci-robot k8s-ci-robot merged commit b4d793c into kubernetes:master Jul 18, 2023
13 checks passed
SIG Node CI/Test Board automation moved this from Archive-it to Done Jul 18, 2023
SIG Node PR Triage automation moved this from Needs Reviewer to Done Jul 18, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jul 18, 2023
Labels
approved, area/kubelet, area/test, cncf-cla: yes, kind/feature, lgtm, ok-to-test, priority/important-soon, release-note, sig/node, sig/testing, size/XXL, triage/accepted