-
Notifications
You must be signed in to change notification settings - Fork 40.9k
Remove file used for termination log #121181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hi @Chaunceyctx. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Can you add some tests for it? maybe it should be an e2e test. |
/ok-to-test |
thanks, i will complete it |
/triage accepted |
@Chaunceyctx This looks like a user-facing change. Can you add a release note? |
/cc |
/cc @luozhiwenn |
/test pull-e2e-gci-gce-alpha-enabled-default |
subHostPath := filepath.Join(config.DefaultKubeletPodsDirName, string(labeledInfo.PodUID), config.DefaultKubeletContainersDirName, labeledInfo.ContainerName) | ||
if ok && utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subHostPath := filepath.Join(config.DefaultKubeletPodsDirName, string(labeledInfo.PodUID), config.DefaultKubeletContainersDirName, labeledInfo.ContainerName) | |
if ok && utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanup) { | |
if ok && utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanup) { | |
subHostPath := filepath.Join(config.DefaultKubeletPodsDirName, string(labeledInfo.PodUID), config.DefaultKubeletContainersDirName, labeledInfo.ContainerName) | |
subHostPath
is only used inside the if block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or
labeledInfo := getContainerInfoFromLabels(status.Labels)
if utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanup) {
containerPath, ok := status.Annotations[containerTerminationMessagePathLabel]
if ok {
subHostPath := filepath.Join(config.DefaultKubeletPodsDirName, string(labeledInfo.PodUID), config.DefaultKubeletContainersDirName, labeledInfo.ContainerName)
for ...
...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#121181 (comment)
I am very sorry, this comment was mentioned before, but I did not process it because it has been a while. That's my fault. I will address it right now.
07f3f3d
to
e221a2a
Compare
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen |
@HirazawaUi: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/remove-lifecycle rotten |
test/e2e/node/kubelet.go
Outdated
// wait for init/sidecar/app container restarts 4 times | ||
if containerStatus != nil && containerStatus.RestartCount >= 4 { | ||
framework.Logf(" the restart count of %s container in %s pod is %d which has been greater than 4", e2eInstances[i].focusContainerName, e2eInstances[i].pod.Name, containerStatus.RestartCount) | ||
createdPods = append(createdPods, createdPod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be a race condition here? You’re concurrently appending to a slice from multiple goroutines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, I will fix it soon
test/e2e/node/kubelet.go
Outdated
waitGroup.Add(len(e2eInstances)) | ||
ginkgo.By("check the number of termination message log files") | ||
for i := range e2eInstances { | ||
go func(i int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting with Go 1.22, the loop variable i in each iteration becomes an independent variable. You no longer need to pass i as a parameter into closures.
ref: https://tip.golang.org/doc/go1.22#:~:text=Previously%2C%20the%20variables,in%20Go%201.21.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ In Go 1.22, each iteration of the loop creates new variables, to avoid accidental sharing bugs.
I will delete the index variable for closure.
@@ -764,6 +764,9 @@ var defaultVersionedKubernetesFeatureGates = map[featuregate.Feature]featuregate | |||
{Version: version.MustParse("1.32"), Default: true, PreRelease: featuregate.Beta}, | |||
}, | |||
|
|||
TerminationLogFileCleanup: { | |||
{Version: version.MustParse("1.33"), Default: false, PreRelease: featuregate.Alpha}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR changes the behavior in a backward-incompatible manner, which effectively means we're deprecating an existing behavior (since the old behavior has been modified). You should utilize the deprecated feature gate instead, and since deprecated gates have shorter lifecycles, we can remove it more rapidly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current modification involves minimal changes and effectively prevents the issue of inode resource exhaustion caused by lots of termination files. I believe this feature gate should be removed as soon as possible indeed, so I am inclined to use this feature gate as deprecated
. Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HirazawaUi PTAL. Thanks a lot :)
fabb531
to
3ec521a
Compare
/retest |
pkg/features/kube_features.go
Outdated
@@ -689,6 +689,12 @@ const ( | |||
// Enables support for the StorageVersionMigrator controller. | |||
StorageVersionMigrator featuregate.Feature = "StorageVersionMigrator" | |||
|
|||
// owner: @chaunceyctx | |||
// Deprecated: v1.33 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our goal is to remove the version comments from this file. Could you please delete this line?
@@ -764,6 +764,11 @@ var defaultVersionedKubernetesFeatureGates = map[featuregate.Feature]featuregate | |||
{Version: version.MustParse("1.32"), Default: true, PreRelease: featuregate.Beta}, | |||
}, | |||
|
|||
TerminationLogFileCleanup: { | |||
{Version: version.MustParse("1.0"), Default: false, PreRelease: featuregate.Alpha}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{Version: version.MustParse("1.0"), Default: false, PreRelease: featuregate.Alpha}, | |
{Version: version.MustParse("1.0"), Default: false, PreRelease: featuregate.GA}, |
// Remove file used for termination log | ||
labeledInfo := getContainerInfoFromLabels(status.Labels) | ||
|
||
if utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For deprecated feature gates, we set them to false by default. When the feature gate is disabled (false), we deprecate the previously stable behavior. If this causes issues with users' workloads, they can toggle the gate to true to revert to the original behavior. Therefore, the condition should be adjusted to:
if !utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanup)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change it like that, then the feature gate name must become the opposite, too. Something like "KeepTerminationLogFile": false (default) means "don't delete it", true means "clean up".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be more inclined to support renaming the feature gate, as this aligns better with our definition of "Deprecated" and remains consistent with most other deprecated gates.
By the way, I think you meant that when KeepTerminationLogFile
is false (the default), we clean up the files, and when KeepTerminationLogFile
is true (manually modified), we keep the log files(current behavior), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had it backwards.
test/e2e/feature/feature.go
Outdated
@@ -396,6 +396,10 @@ var ( | |||
// - ci-kubernetes-node-e2e-cri-proxy-serial | |||
CriProxy = framework.WithFeature(framework.ValidFeatures.Add("CriProxy")) | |||
|
|||
// owner: @chaunceyctx | |||
// remove termination log file automatically when container has been removed. | |||
TerminationLogFileCleanup = framework.WithFeature(framework.ValidFeatures.Add("TerminationLogFileCleanup")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also means we don't need to define this variable here, as the feature gate is enabled by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it disabled by default?
Or am I reading {Version: version.MustParse("1.33"), Default: false, PreRelease: featuregate.Deprecated},
wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, apologies, my brain must have frozen while typing that. What I meant to say is: The deprecated behavior is currently enabled by default, so defining this variable shouldn't be necessary.
Detailed explanation:
As mentioned in #121181 (comment)
We typically set deprecated gate to false, and use code control like if !utilfeature.DefaultFeatureGate.Enabled(features.TerminationLogFileCleanUp)
to make the code enabled by default. If users experience workload disruptions due to this change, they can set the feature gate to true to restore previous behavior.
Therefore, defining this variable here appears unnecessary since the deprecated code path is already active by default. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With https://github.com/kubernetes/kubernetes/pull/121181/files#r1978448921 it starts to make a bit more sense 😄
But then the name becomes confusing: https://github.com/kubernetes/kubernetes/pull/121181/files#r1979432133
Last but not least, not all clusters use the default settings. That's what this test feature here is for. Where will the "kubelet should delete termination message log files when container is removed" test run in the future once this PR is merged?
Is there a test for the opposite behavior? There should be one.
That means you will need two test features: one for "feature is on", another for "feature is off". Look at the description of some other entries in this file and describe them accordingly ("When this label is added to a test, the cluster must be ....").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the Deprecated feature gate, it appears we haven't previously provided test cases for both enabling and disabling directions. If we need to add tests for both enabling/disabling states, we would have to create a test job in test-infra specifically for this feature gate.
This means someone would need to maintain it, and ultimately remove the test job from test-infra when the feature gate is eventually removed. I'm uncertain if this effort is justified, especially considering we lack formal documentation outlining the correct approach for handling such scenarios.
test/e2e/node/kubelet.go
Outdated
@@ -369,6 +372,209 @@ var _ = SIGDescribe("kubelet", func() { | |||
} | |||
}) | |||
|
|||
ginkgo.Describe("kubelet should delete termination message log files when container is removed", func() { | |||
f.It("termination message log files should be deleted automatically", feature.TerminationLogFileCleanup, framework.WithFeatureGate(features.TerminationLogFileCleanup), func(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also don't need Feature gate variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also don't need Feature gate variable.
@HirazawaUi Sorry, I've been quite busy with work recently. Let me summarize the discussion above:
- The current feature gate name needs to be changed to better align with the semantics of deprecated feature gates having default false values, such as renaming it to KeepTerminationLogFile.
- We should then cover scenarios in e2e test cases where this feature gate is enabled versus disabled (default disabled).
Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct, this also means you'll need to add a test job for this feature gate in the test-infra project. Otherwise, we won't be able to run tests for this feature gate (disabling is our default behavior, but enabling requires configuring kubelet with this feature gate enabled, which existing jobs don't cover).
However, I'm uncertain whether this is worthwhile, @pohly is the specialist on this topic and can provide more accurate guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have jobs that enable alpha/beta features. There will be more guidance in 1.34 #131040
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now this change is causing the test suite to hang by the looks of all the failures, otherwise you'd see test runs for alpha/beta feature gate labeled tests in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/121181/pull-kubernetes-e2e-kind-alpha-beta-features/1904902893534711808 for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your test depends on a feature gate it should pass the featuregate and we have common jobs to enable featuregates (and will be updating them soon), WithFeatureGate is appropriate for any test that is linked to a feature gate.
If your test depends on something else being setup (like installing a LoadBalancer controller, some special config like IPv6DualStack) then you need a e2e test WithFeature "feature".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your test depends on a feature gate it should pass the featuregate and we have common jobs to enable featuregates (and will be updating them soon), WithFeatureGate is appropriate for any test that is linked to a feature gate.
If your test depends on something else being setup (like installing a LoadBalancer controller, some special config like IPv6DualStack) then you need a e2e test WithFeature "feature".
get it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct, this also means you'll need to add a test job for this feature gate in the test-infra project. Otherwise, we won't be able to run tests for this feature gate (disabling is our default behavior, but enabling requires configuring kubelet with this feature gate enabled, which existing jobs don't cover).
However, I'm uncertain whether this is worthwhile, @pohly is the specialist on this topic and can provide more accurate guidance.
gently /ping @pohly
Is it necessary to cover the scenario where the deprecated feature is set to true in e2e tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to cover the scenario where the deprecated feature is set to true in e2e tests?
The feature owner is responsible to ensure that there is sufficient test coverage, not SIG Testing.
In general, if there is a feature gate that users can flip, then there should be tests somewhere which cover both configurations. They don't need to be E2E tests. Running kubelet with different configurations is easier in E2E node tests.
3ec521a
to
5209960
Compare
@Chaunceyctx: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
5209960
to
3a0a733
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Chaunceyctx, pacoxu The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
container restarting will run out of inode resource because termination log can not be reclaimed in time
Which issue(s) this PR fixes:
Fixes #121180
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: