Create a node startup latency tracker #118568
/kind feature
Force-pushed from 088619a to 7d0773c.
Force-pushed from 7d0773c to f5b8b66.
CC @linxiulei
Force-pushed from f5b8b66 to db57886.
/cc @azylinski @linxiulei
/lgtm
The new metrics look even better now. @qiutongs, can you add a release note mentioning that the kubelet now exposes metrics for the duration of the different stages of node startup?
LGTM label has been added. Git tree hash: a4b98e71b840b7a9d9f96303072f15240a87806c
/test pull-kubernetes-e2e-gce
Unrelated failure.
Updated it in #118568 (comment).
I think this is good enough to merge. These metrics are important for improving the kubelet startup process. Are you OK with this, @dgrisonnet? (For approvals.)
I would still love to see #118568 (comment) addressed, but I am fine with adding the additional documentation in a follow-up PR.
I don't think we can provide good information about it right now, but we must not forget to update those metrics once we have done the due diligence.
I will create a follow-up PR once I can summarize the steps. It will look like: "Duration in seconds of node startup before registration, including X, Y and Z."
Sounds fair 👍
Perfect, that's exactly what I was looking for :)
+1 on addressing #118568 (comment), but it's OK to do that in a follow-up PR. Looking forward to seeing the reports generated from these metrics. Thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dchen1107, qiutongs
I was wondering whether this could be backported to 1.28 or earlier. After talking to a few folks, it seems such a change does not qualify as a bug fix, so we generally don't backport it. If anyone thinks differently, please let me know!
/retest-required
What type of PR is this?
kind/feature
What this PR does / why we need it:
Export metrics that break down node startup latency, so we can understand how node startup performs.
Initially, I am focusing on four timepoints, including when the kubelet starts running for the first time.
Therefore, I am creating three new latency metrics, one for each interval between consecutive timepoints.
Which issue(s) this PR fixes:
Fixes # N/A
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: