kubelet: add internal metric for the first pod with network latency #121720
Conversation
/assign @SergeyKanzhelev @dchen1107 @logicalhan @wojtek-t @dgrisonnet
@aojea: GitHub didn't allow me to request PR reviews from the following users: Ramkumar-K. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm not crazy about these metrics; this kind of data feels much better suited for logs.
pkg/kubelet/metrics/metrics.go
Outdated
FirstNetworkPodStartSLIDuration = metrics.NewGauge(
	&metrics.GaugeOpts{
		Subsystem:      KubeletSubsystem,
		Name:           FirstNetworkPodStartSLIDurationKey,
		Help:           "Duration in seconds to start the first network pod, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
		StabilityLevel: metrics.ALPHA,
	},
)

// FirstNetworkPodStartTotalDuration is a gauge that tracks the duration (in seconds) it takes for the first network pod to run
// since creation, including the time for image pulling.
FirstNetworkPodStartTotalDuration = metrics.NewGauge(
	&metrics.GaugeOpts{
		Subsystem:      KubeletSubsystem,
		Name:           FirstNetworkPodStartTotalDurationKey,
		Help:           "Duration in seconds to start the first network pod since creation, including time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
		StabilityLevel: metrics.ALPHA,
	},
)
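For context, gauges like these are exposed through the kubelet's metrics registry; a minimal sketch of the registration, assuming the legacyregistry pattern already used in pkg/kubelet/metrics/metrics.go (the real Register function wires up many more collectors):

import "k8s.io/component-base/metrics/legacyregistry"

// Register adds the first-network-pod gauges to the kubelet's /metrics endpoint.
func Register() {
	legacyregistry.MustRegister(FirstNetworkPodStartSLIDuration)
	legacyregistry.MustRegister(FirstNetworkPodStartTotalDuration)
}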
These are very strange metrics... are you basically just recording this a single time?
Yes, the goal is to measure the user-perceived latency when running workloads. We already have node startup latency (#118568) and pod startup latency, but the latency perceived by a user will be the "node startup + first pod with network" latency.
@aojea is this the only place at which we can measure CNI plugin initialization? Because measuring it at the pod startup level kind of pollutes the data with other variables such as the image pulling latency.
alternative in #121721
pkg/kubelet/metrics/metrics.go
Outdated
FirstNetworkPodStartSLIDurationKey = "first_network_pod_start_sli_duration_seconds"
FirstNetworkPodStartTotalDurationKey = "first_network_pod_start_total_duration_seconds"
The current naming is based on an implementation detail, so it may be difficult for end users to understand.
Based on my current understanding of this metric, I would suggest something along those lines:
kubelet_node_startup_network_initialization_delay_seconds
It is not really node startup, since it depends on the workload being scheduled to that node; that is the tricky part... this metric only represents node startup if the workload is scheduled as soon as the node is ready.
Since it is internal and temporary, I think this name is OK; it really describes what the metric measures.
@@ -112,6 +114,13 @@ func (p *basicPodStartupLatencyTracker) ObservedPodOnWatch(pod *v1.Pod, when time.Time)
	metrics.PodStartSLIDuration.WithLabelValues().Observe(podStartSLOduration)
	metrics.PodStartTotalDuration.WithLabelValues().Observe(podStartingDuration.Seconds())
	state.metricRecorded = true
	// if it is the first pod with network, track the start values;
	// these metrics will help to identify problems with the CNI plugin
	if !pod.Spec.HostNetwork && !p.firstNetworkPodSeen {
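The hunk is truncated at the new branch; a minimal sketch of its body, assuming it reuses the durations computed just above and the gauges defined earlier in this PR:

		// Record the first non-host-network pod's startup latency exactly once;
		// later pods no longer reflect the CNI initialization delay on this node.
		metrics.FirstNetworkPodStartSLIDuration.Set(podStartSLOduration)
		metrics.FirstNetworkPodStartTotalDuration.Set(podStartingDuration.Seconds())
		p.firstNetworkPodSeen = true
	}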
For my own education, why are we tracking pods with HostNetwork settings? Is it because these pods can only start after CNI initialization is finished, so we are using them as an indirect indicator of CNI init latency? If so why don't we measure the CNI latency directly?
why are we tracking pods with HostNetwork settings?
Just the contrary: only the first pod without HostNetwork set, which gives us the indirect CNI latency.
If so why don't we measure the CNI latency directly?
This is complex and better explained in this doc:
https://docs.google.com/document/d/1Ab2XvcS3koXB50RMhsfxdlXNkej5kW6RyxN1F48uqTs/edit
The TL;DR is that we created a chicken-and-egg problem and we have to deal with the consequences: the existing network readiness signal is not accurate, and the different CNI implementations are usually deployed as a DaemonSet, so the CNI plugin is installed by a hostNetwork pod, and once that pod is running all the network pods can start. There are ongoing efforts to improve this by modifying the CNI spec, but those changes can take years to be adopted by all the components involved.
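To illustrate that deployment model (hypothetical pod specs, not from the PR): the CNI agent itself must run on the host network, which is exactly why the !pod.Spec.HostNetwork check skips it, and only the first pod that needs the pod network is timed.

import v1 "k8s.io/api/core/v1"

var (
	// A CNI agent pod (typically from a DaemonSet) uses the host network,
	// because the pod network does not exist until it installs the plugin.
	cniAgent = v1.Pod{Spec: v1.PodSpec{HostNetwork: true}} // skipped by the check

	// The first regular workload pod can only start once the CNI plugin works,
	// so its startup latency is an indirect measure of CNI readiness.
	workload = v1.Pod{Spec: v1.PodSpec{HostNetwork: false}} // recorded once
)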
// excluding the time for image pulling. This metric should reflect the "Pod startup latency SLI" definition and measure the time
// for the CNI plugin to be able to provide the network capabilities to the Pod.
// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
FirstNetworkPodStartSLIDuration = metrics.NewGauge(
@aojea - I think you were chatting with @logicalhan about it already and I don't know where exactly you converged, but I wanted to throw out one more idea and hear why it is bad.
What if, instead of introducing a new metric for that, we introduced a label on the already existing PodStartSLIDuration metric, like "kind"? [better name is needed]
Then we would be able to distinguish different types of pods in it, e.g.:
kind=first-pod-with-network
for this particular case.
But I think it would be universally useful, because we will certainly need this for other cases; e.g. we want to be able to distinguish stateless and stateful pods, because they clearly have very different performance characteristics. And we would be able to reuse the same label for that, e.g.:
kind=stateless
kind=stateful
@logicalhan @dgrisonnet - thoughts?
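A rough sketch of that idea, assuming the existing PodStartSLIDuration histogram in pkg/kubelet/metrics/metrics.go simply gains the label (help text and label values here are illustrative, not the real ones):

PodStartSLIDuration = metrics.NewHistogramVec(
	&metrics.HistogramOpts{
		Subsystem:      KubeletSubsystem,
		Name:           PodStartSLIDurationKey,
		Help:           "Duration in seconds to start a pod, excluding time to pull images and run init containers",
		StabilityLevel: metrics.ALPHA,
	},
	[]string{"kind"}, // e.g. "first-pod-with-network", "stateless", "stateful"
)

Callers would then pick the label value per pod, e.g. metrics.PodStartSLIDuration.WithLabelValues("first-pod-with-network").Observe(podStartSLOduration).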
That was actually what I originally proposed. However, given that this metric is a stopgap until we can properly pipe it via Status or something, I'm okay with an isolated metric of stability class Internal, so that it doesn't get output in our auto-generated docs and will be removed after a year or so.
Being able to distinguish pods could be useful, but I don't think it would make sense to add the label for the current use case, as we are essentially trying to measure something different. The fact that we are measuring it via a specific pod's latency is an implementation detail that I don't think we should surface in the general-purpose metric.
But as Han mentioned, the general consensus is to introduce a new Internal metric as a temporary solution.
kind=stateless
Do we have that granularity on the kubelet?
Force-pushed from 12529ce to b8533f7.
/triage accepted
// existing networking subsystem and CRI/CNI implementations that will be solved by https://github.com/containernetworking/cni/issues/859
// The metric represents the latency observed by a user to run workloads in a new node.
// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
FirstNetworkPodStartSLIDuration = metrics.NewGauge(
I'd almost use a GaugeVec with no labels just for the Reset functionality for testing.
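A minimal sketch of that suggestion, assuming the component-base metrics wrappers (a Vec with an empty label list behaves like a single gauge but keeps Reset for tests; the help text is shortened here):

FirstNetworkPodStartSLIDuration = metrics.NewGaugeVec(
	&metrics.GaugeOpts{
		Subsystem:      KubeletSubsystem,
		Name:           FirstNetworkPodStartSLIDurationKey,
		Help:           "Duration in seconds to start the first network pod",
		StabilityLevel: metrics.INTERNAL,
	},
	[]string{}, // no labels; the Vec form exists only for Reset()
)

In code, metrics.FirstNetworkPodStartSLIDuration.WithLabelValues().Set(d); in tests, metrics.FirstNetworkPodStartSLIDuration.Reset().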
/priority important-longterm
@dgrisonnet @logicalhan @wojtek-t kind reminder, how can I get this to the finish line?
@logicalhan - friendly ping
/lgtm
I think the concerns that we had in the past about this metric have been addressed by making it internal.
LGTM label has been added. Git tree hash: 590ed7e0c08ade6f764811ac15a6627d7b6ac4dd
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aojea, dgrisonnet, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
The latency of the first pod with network impacts user workloads; however, it is difficult to understand where this latency comes from, since it depends on the CNI plugin being ready at the moment of pod creation.
Add a new internal metric in the kubelet that allows developers and cluster administrators to understand the source of latency problems during node startup:
kubelet_first_network_pod_start_sli_duration_seconds
/kind documentation
/sig instrumentation
/sig node
/sig network