Fix: kubelet will not output logs after log file is rotated #115702
Hi @xyz-li. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Can we add a unit test to TestReadLogs to cover the failure scenario?
/ok-to-test
Working on it.
/assign @sjenning
I was able to confirm that this patch/fix allows continuous output of …
I also verified the effectiveness of this fix by repeatedly executing the kubectl logs command more than 100 times. With this fix, the log output never got stuck during rotation, not even once. I used the reproduction method described in #115701.
As of the release of v1.28, executing … Specifically, I executed the logging command for the pod 100 times. The error occurs very rarely, about once in a hundred executions. As far as my reproduction method can show, this PR also fixes this error. Stdout:
This has waited long enough. Let's see if something shakes up in the CI. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dims, xyz-li. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Thank you very much for this approval. I believe this log rotation problem has been an issue for other people for a while now.
As this fixes a bug that many people have hit in production environments, will it be cherry-picked to previous releases?
Versions of Kubernetes that do not contain kubernetes/kubernetes#115702 fail to detect rolled log files, causing the API to stop sending logs to the agent for processing. To work around this, this commit introduces a rolling average calculator that tracks the average delta between log entries per target. If more than 3x the normal delta has elapsed since the last entry, the tailer is restarted. False positives are acceptable here, but false negatives mean that log lines may not appear for an extended period of time until the rolling detection succeeds. Closes grafana#5040
* component/prometheus: fix panic in interceptor when child isn't set

  This commit fixes a panic in prometheus.Interceptor, where an interceptor that doesn't forward samples to another appendable panicked when appending data.

  Co-authored-by: Edward Welch <[email protected]>

* loki.source.kubernetes: improve detection of rolled log files

  Versions of Kubernetes that do not contain kubernetes/kubernetes#115702 fail to detect rolled log files, causing the API to stop sending logs to the agent for processing. To work around this, this commit introduces a rolling average calculator that tracks the average delta between log entries per target. If more than 3x the normal delta has elapsed since the last entry, the tailer is restarted. False positives are acceptable here, but false negatives mean that log lines may not appear for an extended period of time until the rolling detection succeeds.

  Closes #5040

  Co-authored-by: Edward Welch <[email protected]>

* loki.source.kubernetes: support clustering

  Add support for loki.source.kubernetes to distribute targets using clustering.

  Closes #4502

  Co-authored-by: Edward Welch <[email protected]>

* loki.source.podlogs: support clustering

  Add support for loki.source.podlogs to distribute targets using clustering.

* service/cluster: add common block for clustering arguments

* remove irrelevant TODO comment #5623 (comment)

---------

Co-authored-by: Edward Welch <[email protected]>
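The interceptor panic described in the first bullet is the classic case of calling a method through a nil interface value. The sketch below shows the general nil-guard pattern under stated assumptions: the appender interface, the interceptor and recorder types, and their fields are hypothetical stand-ins, not the real Prometheus storage.Appender API or the Grafana Agent code.

```go
package main

import "fmt"

// appender is a minimal, hypothetical sample sink (not Prometheus's API).
type appender interface {
	Append(metric string, value float64) error
}

// interceptor runs an optional hook on each sample and forwards it to next.
type interceptor struct {
	onAppend func(metric string, value float64)
	next     appender // may be nil when nothing downstream is configured
}

func (i *interceptor) Append(metric string, value float64) error {
	if i.onAppend != nil {
		i.onAppend(metric, value)
	}
	// Calling a method on a nil interface value panics, so when no child
	// is set, stop here instead of forwarding the sample.
	if i.next == nil {
		return nil
	}
	return i.next.Append(metric, value)
}

// recorder is a test sink that remembers what it received.
type recorder struct{ got []string }

func (r *recorder) Append(metric string, value float64) error {
	r.got = append(r.got, fmt.Sprintf("%s=%g", metric, value))
	return nil
}

func main() {
	seen := 0
	standalone := &interceptor{onAppend: func(string, float64) { seen++ }}
	err := standalone.Append("up", 1)
	fmt.Println(err, seen) // <nil> 1

	rec := &recorder{}
	chained := &interceptor{next: rec}
	_ = chained.Append("up", 1)
	fmt.Println(rec.got) // [up=1]
}
```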
What type of PR is this?
/kind bug
What this PR does / why we need it:
kubectl logs POD_NAME -f stops outputting logs after the kubelet rotates the container's log file.
Which issue(s) this PR fixes:
Fixes #115701
Special notes for your reviewer:
Comment from the method Watcher.Add
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: