[KEP-2400] Avoid logging that swap cgroup controller is missing for every container #123749
Conversation
Signed-off-by: Itamar Holder <[email protected]>
/sig node
@@ -164,9 +164,9 @@ func (m *kubeGenericRuntimeManager) generateLinuxContainerResources(pod *v1.Pod,
// Swap is only configured if a swap cgroup controller is available and the NodeSwap feature gate is enabled.
func (m *kubeGenericRuntimeManager) configureContainerSwapResources(lcr *runtimeapi.LinuxContainerResources, pod *v1.Pod, container *v1.Container) {
	if !swapControllerAvailable() {
		klog.InfoS("No swap cgroup controller present", "swapBehavior", m.memorySwapBehavior, "pod", klog.KObj(pod), "containerName", container.Name)
just wondering, would it make sense to keep the log but bump the V level to 5 or more?
On the one hand I think it does, since it would be much more visible that every container cannot use swap.
On the other hand I think it doesn't, as this log is not providing any new information, just repeats itself for every container.
I guess the question is how we define high visibility mode (i.e. V>=5). Do we expect that logs would be more visible, or do we expect more information that's rarely necessary?
But I guess that high verbosity is meant for debugging, and making this log appear for every container might help while not massively spamming the logs. So I tend to think raising V is the right way to go.
@liggitt WDYT?
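For reference, a minimal sketch of the alternative being floated here (keeping the per-container log but gating it behind verbosity 5 instead of deleting it); this is illustrative only, not what the PR ends up doing:

	if !swapControllerAvailable() {
		// Hypothetical alternative discussed above: keep the message, but emit it
		// only at high verbosity so it shows up when debugging (-v=5 or higher)
		// without spamming default logs for every container.
		klog.V(5).InfoS("No swap cgroup controller present", "swapBehavior", m.memorySwapBehavior, "pod", klog.KObj(pod), "containerName", container.Name)
	}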
Thanks for elaborating! I don't have strong opinions, so by all means I'm fine with removing the log. I'm seeing what I believe is a trend in reporting mismatched configuration between kube/kubelet and the system, for example the runtime lacking features (runtime too old/disabled), or the system missing features, as in this case. We do log them, and this is fine, but as a SIG we should perhaps think about a mechanism to make these conditions more visible. The reason I'm mentioning this thought is that repeating the log at least makes the condition more visible, and doing it at a high log level keeps the spam at bay. But this is a larger conversation and I don't want to drag it out, so from my PoV the question I had is answered and we can carry on.
Do we still log anything in that regard?
@fabiand Yes indeed. The log here still fires, but only once:
kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go
Lines 343 to 366 in d9c54f6
var swapControllerAvailable = func() bool {
	// See https://github.com/containerd/containerd/pull/7838/
	swapControllerAvailabilityOnce.Do(func() {
		const warn = "Failed to detect the availability of the swap controller, assuming not available"
		p := "/sys/fs/cgroup/memory/memory.memsw.limit_in_bytes"
		if isCgroup2UnifiedMode() {
			// memory.swap.max does not exist in the cgroup root, so we check /sys/fs/cgroup/<SELF>/memory.swap.max
			_, unified, err := cgroups.ParseCgroupFileUnified("/proc/self/cgroup")
			if err != nil {
				klog.V(5).ErrorS(fmt.Errorf("failed to parse /proc/self/cgroup: %w", err), warn)
				return
			}
			p = filepath.Join("/sys/fs/cgroup", unified, "memory.swap.max")
		}
		if _, err := os.Stat(p); err != nil {
			if !errors.Is(err, os.ErrNotExist) {
				klog.V(5).ErrorS(err, warn)
			}
			return
		}
		swapControllerAvailability = true
	})
	return swapControllerAvailability
}
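To make the once-only behavior concrete, here is a small self-contained sketch of the sync.Once pattern used above; the names and the probe body are illustrative, not taken from the kubelet:

	package main

	import (
		"fmt"
		"sync"
	)

	var (
		detectOnce sync.Once // guards the one-time detection below
		available  bool      // cached result returned on every call
	)

	// controllerAvailable is a stand-in for swapControllerAvailable: the probe
	// (and any logging it does) runs exactly once; later calls return the cache.
	func controllerAvailable() bool {
		detectOnce.Do(func() {
			fmt.Println("probing for the swap cgroup controller (printed once)")
			available = false // the real probe would store its result here
		})
		return available
	}

	func main() {
		for i := 0; i < 3; i++ {
			fmt.Println(controllerAvailable()) // probe message appears only on the first iteration
		}
	}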
/triage accepted
/priority important-longterm
Raising because of #123728 (comment)
@liggitt You asked for this PR as a follow-up. I realize I wasn't sure whether you meant you wanted this in 1.30 or that we should do this before GA. Either way, I don't think this warrants an exception, but I wanted your thoughts on priority.
My read of the code is that this log entry existed outside the feature gate already, so I don't think this has to be in 1.30, but I wouldn't object to it being included. Will defer to the node leads to make the call.
/cc @mrunalp @dchen1107
/lgtm
LGTM label has been added. Git tree hash: acaa9fe73132f802eaca12f8b29ebf68f57c2bc5
|
ping @mrunalp @dchen1107 Do we want this in for 1.30?
ping @mrunalp @dchen1107 Anything missing?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: iholder101, mrunalp
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest
What type of PR is this?
/kind feature
What this PR does / why we need it:
Before this PR, if the NodeSwap feature gate is enabled but the cgroup swap controller is missing, the kubelet would add a log entry saying "No swap cgroup controller present" for every container.
In this PR that log entry is removed. A log entry would still fire, but only once, the first time the kubelet tries to configure swap resources, and no more.
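For illustration, a minimal sketch of what the call site looks like with the per-container log dropped (assuming the guard simply returns early; any one-time warning about a missing controller lives inside swapControllerAvailable, shown earlier in this thread):

	func (m *kubeGenericRuntimeManager) configureContainerSwapResources(lcr *runtimeapi.LinuxContainerResources, pod *v1.Pod, container *v1.Container) {
		if !swapControllerAvailable() {
			// The per-container "No swap cgroup controller present" log used to be emitted here;
			// detection (and its once-only logging) now happens inside swapControllerAvailable.
			return
		}
		// ... configure swap limits on lcr when the controller is available ...
	}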
Which issue(s) this PR fixes:
Fixes #123728
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: