kubelet: add internal metric for the first pod with network latency #121720

aojea · 2023-11-03T20:54:56Z

The first pod with network latency impact user workloads, however,
it is difficuly to understand where is the problem of this latency,
since it depends on the CNI plugin to be ready at the moment of the
pod creation.

Add a new internal metric in the kubelet that allow developers and cluster
administrator to understand the source of the latency problems on
node startups.

kubelet_first_network_pod_start_sli_duration_seconds

/kind documentation

Add a new internal metric in the kubelet that allow developers to understand the source of the latency problems on node startups.

kubelet_first_network_pod_start_sli_duration_seconds

/sig instrumentation
/sig node
/sig network

aojea · 2023-11-03T20:57:16Z

/assign @SergeyKanzhelev @dchen1107 @logicalhan @wojtek-t @dgrisonnet
/cc @Ramkumar-K

k8s-ci-robot · 2023-11-03T20:57:21Z

@aojea: GitHub didn't allow me to request PR reviews from the following users: Ramkumar-K.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @SergeyKanzhelev @dchen1107 @logicalhan @wojtek-t @dgrisonnet
/cc @Ramkumar-K

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aojea · 2023-11-03T20:58:16Z

/cc @ruiwen-zhao @qiutongs

logicalhan

I'm not crazy about these metrics, this kind of data feels much better suited for logs.

logicalhan · 2023-11-03T21:02:09Z

pkg/kubelet/metrics/metrics.go

+	FirstNetworkPodStartSLIDuration = metrics.NewGauge(
+		&metrics.GaugeOpts{
+			Subsystem:      KubeletSubsystem,
+			Name:           FirstNetworkPodStartSLIDurationKey,
+			Help:           "Duration in seconds to start the first network pod, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
+			StabilityLevel: metrics.ALPHA,
+		},
+	)
+
+	// FirstNetworkPodStartTotalDuration is a gauge that tracks the duration (in seconds) it takes for the first network pod to run
+	// since creation, including the time for image pulling.
+	FirstNetworkPodStartTotalDuration = metrics.NewGauge(
+		&metrics.GaugeOpts{
+			Subsystem:      KubeletSubsystem,
+			Name:           FirstNetworkPodStartTotalDurationKey,
+			Help:           "Duration in seconds to start the first network pod since creation, including time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
+			StabilityLevel: metrics.ALPHA,
+		},
+	)


These are very strange metrics.. are you basically just recording this a single time?

yes, the goal is to measure the user perceived latency when running workloads, we already have node startup latency #118568 and pod startup latency, but the latency perceived by an user will be the "node startup + first pod with network latency"

@aojea is this the only place at which we can measure CNI plugin initialization? Because measuring it at the pod startup level kind of pollutes the data with other variables such as the image pulling latency.

pkg/kubelet/util/pod_startup_latency_tracker_test.go

aojea · 2023-11-03T22:03:24Z

alternative in #121721

dgrisonnet · 2023-11-06T17:05:49Z

pkg/kubelet/metrics/metrics.go

+	FirstNetworkPodStartSLIDurationKey   = "first_network_pod_start_sli_duration_seconds"
+	FirstNetworkPodStartTotalDurationKey = "first_network_pod_start_total_duration_seconds"


The current naming is based on an implementation detail so it may be difficult to understand for the end users.
Based on my current understanding of this metric, I would suggest something along those lines:

kubelet_node_startup_network_initialization_delay_seconds

it is not really node startup since it depends on the workload to be scheduled to that node, that is the tricky part ... this metric only represent node_startup if the workload is scheduled as soon as the node is ready

since is internal and temporary I think this name is ok since it really represents the metric

pkg/kubelet/metrics/metrics.go

pkg/kubelet/util/pod_startup_latency_tracker.go

ruiwen-zhao · 2023-11-06T21:31:13Z

pkg/kubelet/util/pod_startup_latency_tracker.go

@@ -112,6 +114,13 @@ func (p *basicPodStartupLatencyTracker) ObservedPodOnWatch(pod *v1.Pod, when tim
 		metrics.PodStartSLIDuration.WithLabelValues().Observe(podStartSLOduration)
 		metrics.PodStartTotalDuration.WithLabelValues().Observe(podStartingDuration.Seconds())
 		state.metricRecorded = true
+		// if is the first Pod with network track the start values
+		// these metrics will help to identify problems with the CNI plugin
+		if !pod.Spec.HostNetwork && !p.firstNetworkPodSeen {


For my own education, why are we tracking pods with HostNetwork settings? Is it because these pods can only start after CNI initialization is finished, so we are using them as an indirect indicator of CNI init latency? If so why don't we measure the CNI latency directly?

why are we tracking pods with HostNetwork settings?

just the contrary, only the first Pod without HostNetwork set, that gives us the indirect CNI latency

If so why don't we measure the CNI latency directly?

This is complex and better explained on this doc

https://docs.google.com/document/d/1Ab2XvcS3koXB50RMhsfxdlXNkej5kW6RyxN1F48uqTs/edit

the TL;DR is that we created a chicken an egg problem and we need to deal with the consequences, existing network readiness is not accurate and the different CNI implementations usually are deployed as daemonset, so the CNI is installed as a hostNetwork Pod, and once it is running all the network pods can start running. ... there are some solutions going on to improve this modifying the CNI spec, but those things can take years in be adopted by all the components involved

wojtek-t · 2023-11-10T14:20:31Z

pkg/kubelet/metrics/metrics.go

+	// excluding the time for image pulling. This metric should reflect the "Pod startup latency SLI" definition and measure the time
+	// for the CNI plugin to be able to provide the network capabilities to the Pod.
+	// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
+	FirstNetworkPodStartSLIDuration = metrics.NewGauge(


@aojea - I think you were chatting with @logicalhan about it already and I don't know where exactly you converged, but I wanted to throw one more idea and hear why it bad.

What if instead introducing a new metrics for that, we will introduce a label for the already existing PodStartSLIDuration metrics like "kind"? [better name is needed].

Then we will be able to distinguish different types of pods in that, e.g.:
"kind=first-pod-with-network"
for this particular case.

But I think it would be universally useful, because we will certainly need this for other cases, e.g. we want to be able to distinguish stateless and stateful pods because they clearly have a very different performance characteristic. And we would be able to to reuse the same label for it, e.g.:
kind=stateless
kind=stateful

@logicalhan @dgrisonnet - thoughts?

That was actually what I originally proposed. However, given that this metric is a stopgap until we can properly pipe it via Status or something, I'm okay with an isolated metric which is of stability class Internal so that it doesn't get output in our auto generated docs and will be removed after a year or so.

Being able to distinguish pods could be useful, but it don't think it would make sense to add the label for the current use case as we are essentially trying to measure something different. The fact that we are measuring it via a specific pod latency is an implementation detail, that I don't think we should surface in the general purpose metric.

But as Han mentioned, the general consensus is to introduce a new Internal metric as a temporary solution.

kind=stateless

Do we have that granularity on the kubelet?

The first pod with network latency impact user workloads, however, it is difficuly to understand where is the problem of this latency, since it depends on the CNI plugin to be ready at the moment of the pod creation. Add a new internal metric in the kubelet that allow developers and cluster administrator to understand the source of the latency problems on node startups. kubelet_first_network_pod_start_sli_duration_seconds Change-Id: I4cdb55b0df72c96a3a65b78ce2aae404c5195006

dashpole · 2023-11-16T15:52:54Z

/triage accepted

logicalhan · 2023-11-16T17:40:54Z

pkg/kubelet/metrics/metrics.go

+	// existing networking subsystem and CRI/CNI implementations that will be solved by https://github.com/containernetworking/cni/issues/859
+	// The metric represents the latency observed by an user to run workloads in a new node.
+	// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
+	FirstNetworkPodStartSLIDuration = metrics.NewGauge(


I'd almost use a GaugeVec with no labels just for the Reset functionality for testing.

bart0sh · 2023-11-30T09:54:51Z

/priority important-longterm

aojea · 2024-01-16T22:04:30Z

@dgrisonnet @logicalhan @wojtek-t kindly reminder, how can I get this to the finish line?

wojtek-t · 2024-01-31T14:39:20Z

@logicalhan - friendly ping

dgrisonnet

/lgtm

I think the concerns that we had in the past about this metric have been addressed by making it internal.

k8s-ci-robot · 2024-02-22T11:17:17Z

LGTM label has been added.

Git tree hash: 590ed7e0c08ade6f764811ac15a6627d7b6ac4dd

wojtek-t · 2024-02-22T13:07:50Z

/lgtm
/approve

k8s-ci-robot · 2024-02-22T13:08:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, dgrisonnet, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from bobbypage and tallclair November 3, 2023 20:55

k8s-ci-robot added the area/kubelet label Nov 3, 2023

k8s-ci-robot assigned dchen1107, dgrisonnet, logicalhan, SergeyKanzhelev and wojtek-t Nov 3, 2023

k8s-ci-robot requested review from qiutongs and ruiwen-zhao November 3, 2023 20:58

logicalhan reviewed Nov 3, 2023

View reviewed changes

aojea mentioned this pull request Nov 3, 2023

kubelet: add metrics for the first pod with network latency #121721

Closed

dgrisonnet reviewed Nov 6, 2023

View reviewed changes

ruiwen-zhao reviewed Nov 6, 2023

View reviewed changes

wojtek-t reviewed Nov 10, 2023

View reviewed changes

aojea force-pushed the first_pod_network_startup branch from 12529ce to b8533f7 Compare November 15, 2023 06:11

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 15, 2023

aojea changed the title ~~kubelet: add metrics for the first pod with network latency~~ kubelet: add internal metric for the first pod with network latency Nov 15, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 16, 2023

logicalhan reviewed Nov 16, 2023

View reviewed changes

k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 30, 2023

dgrisonnet approved these changes Feb 22, 2024

View reviewed changes

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2024

k8s-ci-robot merged commit cbfebf0 into kubernetes:master Feb 22, 2024

k8s-ci-robot added this to the v1.30 milestone Feb 22, 2024

		FirstNetworkPodStartSLIDurationKey = "first_network_pod_start_sli_duration_seconds"
		FirstNetworkPodStartTotalDurationKey = "first_network_pod_start_total_duration_seconds"

kubelet: add internal metric for the first pod with network latency #121720

kubelet: add internal metric for the first pod with network latency #121720

Uh oh!

Conversation

aojea commented Nov 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aojea commented Nov 3, 2023

Uh oh!

k8s-ci-robot commented Nov 3, 2023

Uh oh!

aojea commented Nov 3, 2023

Uh oh!

logicalhan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aojea commented Nov 3, 2023

Uh oh!

dgrisonnet Nov 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dashpole commented Nov 16, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bart0sh commented Nov 30, 2023

Uh oh!

aojea commented Jan 16, 2024

Uh oh!

wojtek-t commented Jan 31, 2024

Uh oh!

dgrisonnet left a comment

Choose a reason for hiding this comment

Uh oh!

k8s-ci-robot commented Feb 22, 2024

Uh oh!

wojtek-t commented Feb 22, 2024

Uh oh!

k8s-ci-robot commented Feb 22, 2024

Uh oh!

Uh oh!

aojea commented Nov 3, 2023 •

edited

Loading

dgrisonnet Nov 6, 2023 •

edited

Loading