Skip to content

[KEP-2400] [POC] [NOT FOR MERGE]: Swap-aware memory eviction with no additional APIs #129223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

iholder101
Copy link
Contributor

What type of PR is this?

PR is still in a testing phase. Please do NOT review until it gets out of draft.

/kind feature

What this PR does / why we need it:

To be filled.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 15, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 15, 2024
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 15, 2024
@iholder101
Copy link
Contributor Author

/test ?

@k8s-ci-robot
Copy link
Contributor

@iholder101: The following commands are available to trigger required jobs:

  • /test pull-cos-containerd-e2e-ubuntu-gce
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-pull-through-cache
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-node-e2e-containerd-1-7-dra
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-crio-cgroupv1-node-e2e-eviction
  • /test pull-crio-cgroupv1-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-features
  • /test pull-crio-cgroupv1-node-e2e-features-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-hugepages
  • /test pull-crio-cgroupv1-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv1-node-e2e-resource-managers
  • /test pull-crio-cgroupv1-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-imagefs-separatedisktest
  • /test pull-crio-cgroupv2-imagefs-separatedisktest-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-eviction
  • /test pull-crio-cgroupv2-node-e2e-eviction-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-hugepages
  • /test pull-crio-cgroupv2-node-e2e-hugepages-kubetest2
  • /test pull-crio-cgroupv2-node-e2e-resource-managers
  • /test pull-crio-cgroupv2-node-e2e-resource-managers-kubetest2
  • /test pull-crio-cgroupv2-splitfs-separate-disk
  • /test pull-crio-cgroupv2-splitfs-separate-disk-kubetest2
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-e2e-gci-gce-alpha-enabled-default
  • /test pull-kubernetes-apidiff
  • /test pull-kubernetes-apidiff-client-go
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2
  • /test pull-kubernetes-crio-node-memoryqos-cgrpv2-kubetest2
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-disk-windows
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-azure-file-windows
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-master-windows-nodelogquery
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-master
  • /test pull-kubernetes-e2e-capz-windows-serial-slow
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-ec2
  • /test pull-kubernetes-e2e-ec2-arm64
  • /test pull-kubernetes-e2e-ec2-conformance
  • /test pull-kubernetes-e2e-ec2-conformance-arm64
  • /test pull-kubernetes-e2e-ec2-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-disruptive-canary
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-policies
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-serial-canary
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-selinux
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-gci-gce-nftables
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-beta-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-evented-pleg
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kind-nftables
  • /test pull-kubernetes-e2e-relaxed-environment-variable-validation
  • /test pull-kubernetes-e2e-storage-kind-alpha-beta-features
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-storage-kind-volume-group-snapshots
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-dra-all
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-linter-hints
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e-kubetest2
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial
  • /test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-alpha-ec2
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2-eks
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-cri-proxy-serial
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra
  • /test pull-kubernetes-node-e2e-crio-cgrpv2-dra-kubetest2
  • /test pull-kubernetes-node-e2e-resource-health-status
  • /test pull-kubernetes-node-kubelet-containerd-flaky
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-podresize
  • /test pull-kubernetes-node-kubelet-serial-podresources
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-swap-conformance-fedora-serial
  • /test pull-kubernetes-node-swap-conformance-ubuntu-serial
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-scheduler-perf
  • /test pull-kubernetes-unit-experimental
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-ec2
  • pull-kubernetes-e2e-ec2-conformance
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-gce-canary
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-linter-hints
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch 2 times, most recently from 5463864 to b9b5902 Compare December 16, 2024 08:38
@iholder101
Copy link
Contributor Author

PR is not ready for review yet.

/uncc @bobbypage @deads2k

@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch 6 times, most recently from fa066e6 to dc4a5c2 Compare December 19, 2024 10:52
@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch 2 times, most recently from 2de54de to fe71417 Compare February 26, 2025 12:25
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-verify
/test pull-kubernetes-node-e2e-containerd

@haircommander haircommander moved this from Triage to Archive-it in SIG Node CI/Test Board Feb 26, 2025
@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch 3 times, most recently from 1ee6584 to 0b1a7b0 Compare February 27, 2025 12:00
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-verify
/test pull-kubernetes-node-e2e-containerd

@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch from 0b1a7b0 to 978fe22 Compare February 27, 2025 18:13
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-verify
/test pull-kubernetes-node-e2e-containerd

1 similar comment
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-verify
/test pull-kubernetes-node-e2e-containerd

@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch from 8926ab8 to 81bca32 Compare February 28, 2025 10:58
This small refactor:
- Adds swap log statistics.
- Adds a post pressure validation function.
- Adds a pre pods modification function.

The later can be used in order to perform
validation on the given pod right after a
pressure condition is met, or to perform
changes to pods before creation.

Signed-off-by: Itamar Holder <[email protected]>
Signed-off-by: Itamar Holder <[email protected]>
Turn swap on and off and sleep for a few seconds.
This helps stabalize the tests which will start
with a fresh unused swap space.

Signed-off-by: Itamar Holder <[email protected]>
Accessible swap is the amount of swap that is
accessible by Kubernetes pods, according to the
LimitedSwap swap behavior.

This PR treats accessible swap as an additional node
memory capacity in the context of the eviction manager
making memory signal observesations.

Signed-off-by: Itamar Holder <[email protected]>
@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch from 81bca32 to e8c2306 Compare February 28, 2025 10:58
@iholder101
Copy link
Contributor Author

/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-verify
/test pull-kubernetes-node-e2e-containerd

This PR treats used swap memory as an additional
memory usage in the context of the eviction manager
ranking pods for eviction.

In addition, a pod's accessible swap is considered
as an additional memory request.

Signed-off-by: Itamar Holder <[email protected]>
@iholder101 iholder101 force-pushed the POC/swap-evictions-no-api branch from e8c2306 to 24797ab Compare February 28, 2025 12:27
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 22, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2025
@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants