Skip to content

[KEP-2400] Report swap capacity as part of node.status.nodeSystemInfo #129954

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

iholder101
Copy link
Contributor

@iholder101 iholder101 commented Feb 3, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Currently, there isn't an easy way to know the swap capacity for each node. It is available as part of Summary API, however it is both hard to read and demands certain permissions to use.

This PR adds the swap capacity into the node status, specifically under node.Status.NodeInfo.Swap.

Example for a node with 40GB swap capacity (trimmed for readability):

> k get node k8s-dev-worker -o yaml
apiVersion: v1
kind: Node
status:
  nodeInfo:
    architecture: amd64
    bootID: 5c5a1d6b-6447-45ac-ba88-b6a0c12c43f0
    containerRuntimeVersion: containerd://1.7.24
    kernelVersion: 6.11.9-100.fc39.x86_64
    kubeProxyVersion: v1.32.0
    kubeletVersion: v1.32.0
    machineID: 3050fbb203394830ad38c698ab0e261d
    operatingSystem: linux
    osImage: Debian GNU/Linux 12 (bookworm)
    swap:
      capacity: 42949664768
    systemUUID: 3ca8f8fd-1d9f-4bbc-99cb-d07cf9242484

Special notes for your reviewer:

I've tried to address this need in multiple past attempts.
In the first one, #123963, I've tried to add swap as another ReasourceName, hence to display it as part of kubectl describe node's status. This attempt was eventually denied with the main concern being that adding swap as a ResourceName is confusing since it cannot be directly/explicitly consumed (since the swap limit is auto-calculated).

In the second one, #125278, I've tried to add a swap condition to nodes. This PR was also denied with the main concern being that a condition is not suited to report such data, and that a field is recommended in this case.

I have also created another PR #129458 to add swap stats to kubectl top subcommands. However, this PR is currently blocked since it seems that using Summary API is problematic in terms of permissions, demonstrating again that a field for swap capacity is needed.

Does this PR introduce a user-facing change?

Start reporting swap capacity as part of node.status.nodeSystemInfo.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/code-generation area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 3, 2025
@k8s-ci-robot k8s-ci-robot requested review from caesarxuchao and deads2k and removed request for caesarxuchao and deads2k February 3, 2025 09:49
@iholder101 iholder101 force-pushed the swap/capacity-on-node-sys-info branch 4 times, most recently from fa23791 to 732802a Compare February 4, 2025 12:11
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 4, 2025
@iholder101 iholder101 marked this pull request as ready for review February 4, 2025 12:50
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2025
@iholder101
Copy link
Contributor Author

@iholder101: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-unit 353dc8b link true /test pull-kubernetes-unit
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Unit tests fail locally on the master branch and do not seem related:

> make test WHAT=./staging/src/k8s.io/client-go/tools/cache/
go version go1.24.0 linux/amd64
+++ [0312 15:14:20] Set GOMAXPROCS automatically to 40
+++ [0312 15:14:20] Normalizing Go targets
+++ [0312 15:14:20] Running tests without code coverage and with -race
FAIL
FAIL	k8s.io/client-go/tools/cache	54.013s

=== Failed
=== FAIL: k8s.io/client-go/tools/cache TestGenericListerListMethod (0.00s)
    listers_test.go:126: unexpected object has been returned expected = [0xc000a64050 0xc000a64058 0xc000a64060] actual = [0xc000a64048 0xc000a64038 0xc000a64040], diff =   []runtime.Object{
        + 	&unstructured.Unstructured{
        + 		Object: map[string]any{
        + 			"apiVersion": string("group/version"),
        + 			"kind":       string("TheKind"),
        + 			"metadata":   map[string]any{"name": string("name-bar"), "namespace": string("ns-bar")},
        + 		},
        + 	},
          	&unstructured.Unstructured{Object: {"apiVersion": string("group/version"), "kind": string("TheKind"), "metadata": map[string]any{"name": string("name-foo"), "namespace": string("ns-foo")}}},
          	&unstructured.Unstructured{Object: {"apiVersion": string("group/version"), "kind": string("TheKind"), "metadata": map[string]any{"name": string("name-foo1"), "namespace": string("ns-foo")}}},
        - 	&unstructured.Unstructured{
        - 		Object: map[string]any{
        - 			"apiVersion": string("group/version"),
        - 			"kind":       string("TheKind"),
        - 			"metadata":   map[string]any{"name": string("name-bar"), "namespace": string("ns-bar")},
        - 		},
        - 	},
          }

@serathius
Copy link
Contributor

Bisected to #125102

@thockin
Copy link
Member

thockin commented Mar 12, 2025

Shouldn't this have some validation? Is it OK to set 0 here? Is it OK to set a negative number? It looks like node.status validation is sloppy, that doesn't mean we should make it worse.

Whatever the validation rules are should be commented on the field and enforced in pkg/apis/core/validation/validation.go

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 13, 2025
@iholder101 iholder101 force-pushed the swap/capacity-on-node-sys-info branch from 7534356 to 42efd31 Compare March 13, 2025 10:27
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 13, 2025
@iholder101
Copy link
Contributor Author

Shouldn't this have some validation?

Yeah, I've added some. Thanks!

Is it OK to set 0 here? Is it OK to set a negative number?

If the capacity is 0 kubelet will leave the field to be nil.
So both 0 and negative numbers should be invalid.

Whatever the validation rules are should be commented on the field and enforced in pkg/apis/core/validation/validation.go

Thanks for the pointer!
Done (see last commit), PTAL 🙏

@iholder101
Copy link
Contributor Author

Failures look like CI outage.
/retest

makeNode := func(swapCapacity *int64) core.Node {
node := makeNode("test-node", nil)
if swapCapacity != nil {
node.Status.NodeInfo.Swap = &core.NodeSwapStatus{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about when node.Status.NodeInfo.Swap == nil ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can reframe it as

makeNode := func(swap *NodeSwapStatus) *Node { ... }
makeSwapStatus := func(capacity int64) *NodeSwapStatus { ... }

and then the testcases are:

makeNode(nil) // nil swap status
makeNode(&NodeSwapStatus{}) // nil capacity
makeNode(makeSwapStatus(123456))
makeNode(makeSwapStatus(0))
makeNode(makeSwapStatus(-123456))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thockin Done. PTAL.

@iholder101 iholder101 force-pushed the swap/capacity-on-node-sys-info branch from 42efd31 to 8ae0b0d Compare March 16, 2025 09:59
@iholder101 iholder101 force-pushed the swap/capacity-on-node-sys-info branch from 8ae0b0d to 1ae091e Compare March 16, 2025 09:59
@thockin
Copy link
Member

thockin commented Mar 17, 2025

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: ffe61db0e42a5a8e878f6b077438282d83703ae0

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, iholder101, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2025
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@pacoxu
Copy link
Member

pacoxu commented Mar 18, 2025

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 2499a2c into kubernetes:master Mar 18, 2025
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 18, 2025
@github-project-automation github-project-automation bot moved this from Needs Reviewer to Done in SIG Node: code and documentation PRs Mar 18, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Apps Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Assigned
Archived in project
Development

Successfully merging this pull request may close these issues.