[KEP-2400] Avoid logging that swap cgroup controller is missing for every container #123749

iholder101 · 2024-03-06T11:59:41Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Before this PR, if swap FG is enabled but cgroup swap controller is missing, kubelet would add a log entry for every container saying No swap cgroup controller present.

This PR removes that per-container log entry. A log entry is still emitted, but only once: when the kubelet first tries to configure swap resources.

Which issue(s) this PR fixes:

Fixes #123728

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md

Signed-off-by: Itamar Holder <[email protected]>

iholder101 · 2024-03-06T12:02:48Z

/sig node

ffromani · 2024-03-06T15:47:30Z

pkg/kubelet/kuberuntime/kuberuntime_container_linux.go

@@ -164,9 +164,9 @@ func (m *kubeGenericRuntimeManager) generateLinuxContainerResources(pod *v1.Pod,
 // Swap is only configured if a swap cgroup controller is available and the NodeSwap feature gate is enabled.
 func (m *kubeGenericRuntimeManager) configureContainerSwapResources(lcr *runtimeapi.LinuxContainerResources, pod *v1.Pod, container *v1.Container) {
 	if !swapControllerAvailable() {
-		klog.InfoS("No swap cgroup controller present", "swapBehavior", m.memorySwapBehavior, "pod", klog.KObj(pod), "containerName", container.Name)


just wondering, would it make sense to keep the log but bump the V level to 5 or more?

iholder101 · 2024-03-07

On the one hand I think it does, since it would make it much more visible that every container cannot use swap.
On the other hand I think it doesn't, as this log does not provide any new information; it just repeats itself for every container.

I guess the question is how we define high verbosity mode (i.e. V>=5). Do we expect logs to be more visible, or do we expect more information that's rarely necessary?

But I guess that high verbosity is set for debugging, and making this log appear for every container might help without massively spamming the logs. So I tend to think raising V is the right way to go.

@liggitt WDYT?
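To make the trade-off above concrete, here is a minimal sketch of verbosity gating. It uses only the standard library; `gatedLogger`, `infoV`, and `configureAll` are hypothetical names invented for illustration and are not kubelet or klog code. In the kubelet the equivalent would presumably be a `klog.V(5).InfoS(...)` call, which only emits when the process runs with `-v=5` or higher.

```go
package main

import "fmt"

// gatedLogger is a hypothetical stand-in for klog's verbosity gating.
type gatedLogger struct {
	verbosity int      // plays the role of the kubelet's -v flag
	lines     []string // captured log lines, kept in memory for inspection
}

// infoV records msg only when the configured verbosity is at least level.
func (l *gatedLogger) infoV(level int, msg string) {
	if l.verbosity >= level {
		l.lines = append(l.lines, msg)
	}
}

// configureAll mimics configuring swap for each container on a node
// that lacks the swap cgroup controller, logging once per container at V=5.
func configureAll(l *gatedLogger, containers []string) {
	for _, name := range containers {
		l.infoV(5, fmt.Sprintf("No swap cgroup controller present containerName=%q", name))
	}
}

func main() {
	containers := []string{"app", "sidecar", "init"}

	quiet := &gatedLogger{verbosity: 2} // a typical non-debug setting (assumption)
	debug := &gatedLogger{verbosity: 5} // debugging verbosity

	configureAll(quiet, containers)
	configureAll(debug, containers)

	fmt.Println(len(quiet.lines), len(debug.lines)) // prints "0 3"
}
```

At low verbosity nothing is logged, so the spam disappears; at V>=5 the message still repeats once per container, which is the visibility-versus-noise question being debated here.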

ffromani · 2024-03-07

Thanks for elaborating! I don't have strong opinions, so by all means I'm fine with removing the log. I'm seeing what I believe is a trend in reporting mismatched configuration between kube/kubelet and the system, for example the runtime lacking features (runtime too old/disabled), or the system missing features, as in this case. We do log them, and this is fine, but we as a SIG should perhaps think about a mechanism to make these conditions more visible. The reason I'm mentioning this is that repeating the log at least makes the condition more visible, and doing so at a high log level keeps the spam at bay. But this is a larger conversation and I don't want to drag it out, so from my PoV the question I had is answered and we can carry on.

fabiand · 2024-04-03

Do we still log anything in that regard?

iholder101 · 2024-04-04

@fabiand Yes indeed. The log here would still fire, but only once:

kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go

Lines 343 to 366 in d9c54f6

var swapControllerAvailable = func() bool {
	// See https://github.com/containerd/containerd/pull/7838/
	swapControllerAvailabilityOnce.Do(func() {
		const warn = "Failed to detect the availability of the swap controller, assuming not available"
		p := "/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes"
		if isCgroup2UnifiedMode() {
			// memory.swap.max does not exist in the cgroup root, so we check /sys/fs/cgroup/<SELF>/memory.swap.max
			_, unified, err := cgroups.ParseCgroupFileUnified("/proc/self/cgroup")
			if err != nil {
				klog.V(5).ErrorS(fmt.Errorf("failed to parse /proc/self/cgroup: %w", err), warn)
				return
			}
			p = filepath.Join("/sys/fs/cgroup", unified, "memory.swap.max")
		}
		if _, err := os.Stat(p); err != nil {
			if !errors.Is(err, os.ErrNotExist) {
				klog.V(5).ErrorS(err, warn)
			}
			return
		}
		swapControllerAvailability = true
	})
	return swapControllerAvailability
}
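The "only once" behaviour comes from wrapping the probe in `sync.Once`. Below is a stripped-down, standard-library-only sketch of that pattern. The names `controllerAvailable`, `probeOnce`, and `probeCalls`, and the path used in `main`, are hypothetical and chosen for illustration; this is not the kubelet implementation, and it writes to stderr instead of using klog.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

var (
	probeOnce  sync.Once
	available  bool
	probeCalls int // counts how often the probe body actually runs
)

// controllerAvailable stats path the first time it is called and caches the
// result; later calls return the cached value without probing or logging again.
func controllerAvailable(path string) bool {
	probeOnce.Do(func() {
		probeCalls++
		if _, err := os.Stat(path); err != nil {
			// A missing file simply means "not available"; only unexpected
			// errors are reported, mirroring the quoted snippet.
			if !errors.Is(err, os.ErrNotExist) {
				fmt.Fprintln(os.Stderr, "failed to detect swap controller:", err)
			}
			return
		}
		available = true
	})
	return available
}

func main() {
	// Hypothetical path that is assumed not to exist on the machine running this sketch.
	const path = "/nonexistent-dir-for-sketch/memory.swap.max"
	for i := 0; i < 3; i++ {
		fmt.Println(controllerAvailable(path)) // false each time
	}
	fmt.Println("probes:", probeCalls) // probes: 1
}
```

However many containers are configured, the probe and any warning it emits run at most once per process, so removing the per-container message does not remove the one-time diagnostic.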

ffromani · 2024-03-06T15:47:53Z

/triage accepted
/priority backlog

ffromani · 2024-03-06T15:49:13Z

/priority important-longterm

raising because #123728 (comment)

kannon92 · 2024-03-06T21:08:58Z

@liggitt You asked for this PR as a follow up. I realize I wasn't sure if you meant you wanted this in 1.30 or that we should do this before GA?

Either way, I don't think this warrants an exception but wanted your thoughts on priority?

liggitt · 2024-03-06T21:11:45Z

Before this PR, if swap FG is enabled but cgroup swap controller is missing, kubelet would add a log entry for every container saying No swap cgroup controller present.

My read of the code is that this log entry existed outside the feature gate already, so I don't think this has to be in 1.30, but I wouldn't object to it being included. Will defer to node leads to make the call

kannon92 · 2024-03-06T22:00:47Z

/cc @mrunalp @dchen1107

ffromani · 2024-03-07T06:48:21Z

/lgtm

k8s-ci-robot · 2024-03-07T06:48:26Z

LGTM label has been added.

Git tree hash: acaa9fe73132f802eaca12f8b29ebf68f57c2bc5

iholder101 · 2024-03-17T10:52:44Z

ping @mrunalp @dchen1107

Do we want this in for 1.30?

iholder101 · 2024-04-07T07:41:47Z

ping @mrunalp @dchen1107

Anything missing?

k8s-ci-robot · 2024-04-19T20:53:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iholder101, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/OWNERS~~ [mrunalp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-triage-robot · 2024-04-19T22:53:52Z

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

The PR does not have any do-not-merge/* labels
The PR does not have the needs-ok-to-test label
The PR is mergeable (does not have a needs-rebase label)
The PR is approved (has cncf-cla: yes, lgtm, approved labels)
The PR is failing tests required for merge

You can:

Review the full test history for this PR
Prevent this bot from retesting with /lgtm cancel or /hold
Help make our tests less flaky by following our Flaky Tests Guide

/retest

Avoid logging that swap cgroup controller is missing for every container

f6e537d

Signed-off-by: Itamar Holder <[email protected]>

k8s-ci-robot requested review from derekwaynecarr and odinuge March 6, 2024 12:00

ffromani reviewed Mar 6, 2024

k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 6, 2024

k8s-ci-robot requested review from dchen1107 and mrunalp March 6, 2024 22:00

k8s-ci-robot assigned ffromani Mar 7, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2024

ffromani mentioned this pull request Mar 7, 2024

RFE: add more node conditions to reflect missing node features #123790

Open

mrunalp approved these changes Apr 19, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 19, 2024

k8s-ci-robot merged commit 7f68d01 into kubernetes:master Apr 20, 2024

k8s-ci-robot added this to the v1.31 milestone Apr 20, 2024

pacoxu mentioned this pull request Aug 9, 2024

Node memory swap support kubernetes/enhancements#2400

Open

69 tasks

iholder101 changed the title ~~[KEP2400] Avoid logging that swap cgroup controller is missing for every container~~ [KEP-2400] Avoid logging that swap cgroup controller is missing for every container Feb 2, 2025
