Skip to content

kubelet: add internal metric for the first pod with network latency #121720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 22, 2024

Conversation

aojea
Copy link
Member

@aojea aojea commented Nov 3, 2023

The first pod with network latency impact user workloads, however,
it is difficuly to understand where is the problem of this latency,
since it depends on the CNI plugin to be ready at the moment of the
pod creation.

Add a new internal metric in the kubelet that allow developers and cluster
administrator to understand the source of the latency problems on
node startups.

kubelet_first_network_pod_start_sli_duration_seconds

/kind documentation

Add a new internal metric in the kubelet that allow developers to understand the source of the latency problems on node startups.

kubelet_first_network_pod_start_sli_duration_seconds

/sig instrumentation
/sig node
/sig network

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/documentation Categorizes issue or PR as related to documentation. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 3, 2023
@aojea
Copy link
Member Author

aojea commented Nov 3, 2023

@k8s-ci-robot
Copy link
Contributor

@aojea: GitHub didn't allow me to request PR reviews from the following users: Ramkumar-K.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @SergeyKanzhelev @dchen1107 @logicalhan @wojtek-t @dgrisonnet
/cc @Ramkumar-K

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Copy link
Member Author

aojea commented Nov 3, 2023

/cc @ruiwen-zhao @qiutongs

Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about these metrics, this kind of data feels much better suited for logs.

Comment on lines 220 to 227
FirstNetworkPodStartSLIDuration = metrics.NewGauge(
&metrics.GaugeOpts{
Subsystem: KubeletSubsystem,
Name: FirstNetworkPodStartSLIDurationKey,
Help: "Duration in seconds to start the first network pod, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
StabilityLevel: metrics.ALPHA,
},
)

// FirstNetworkPodStartTotalDuration is a gauge that tracks the duration (in seconds) it takes for the first network pod to run
// since creation, including the time for image pulling.
FirstNetworkPodStartTotalDuration = metrics.NewGauge(
&metrics.GaugeOpts{
Subsystem: KubeletSubsystem,
Name: FirstNetworkPodStartTotalDurationKey,
Help: "Duration in seconds to start the first network pod since creation, including time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
StabilityLevel: metrics.ALPHA,
},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are very strange metrics.. are you basically just recording this a single time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the goal is to measure the user perceived latency when running workloads, we already have node startup latency #118568 and pod startup latency, but the latency perceived by an user will be the "node startup + first pod with network latency"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea is this the only place at which we can measure CNI plugin initialization? Because measuring it at the pod startup level kind of pollutes the data with other variables such as the image pulling latency.

@aojea
Copy link
Member Author

aojea commented Nov 3, 2023

alternative in #121721

Comment on lines 33 to 34
FirstNetworkPodStartSLIDurationKey = "first_network_pod_start_sli_duration_seconds"
FirstNetworkPodStartTotalDurationKey = "first_network_pod_start_total_duration_seconds"
Copy link
Member

@dgrisonnet dgrisonnet Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current naming is based on an implementation detail so it may be difficult to understand for the end users.
Based on my current understanding of this metric, I would suggest something along those lines:

kubelet_node_startup_network_initialization_delay_seconds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not really node startup since it depends on the workload to be scheduled to that node, that is the tricky part ... this metric only represent node_startup if the workload is scheduled as soon as the node is ready

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since is internal and temporary I think this name is ok since it really represents the metric

@@ -112,6 +114,13 @@ func (p *basicPodStartupLatencyTracker) ObservedPodOnWatch(pod *v1.Pod, when tim
metrics.PodStartSLIDuration.WithLabelValues().Observe(podStartSLOduration)
metrics.PodStartTotalDuration.WithLabelValues().Observe(podStartingDuration.Seconds())
state.metricRecorded = true
// if is the first Pod with network track the start values
// these metrics will help to identify problems with the CNI plugin
if !pod.Spec.HostNetwork && !p.firstNetworkPodSeen {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own education, why are we tracking pods with HostNetwork settings? Is it because these pods can only start after CNI initialization is finished, so we are using them as an indirect indicator of CNI init latency? If so why don't we measure the CNI latency directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we tracking pods with HostNetwork settings?

just the contrary, only the first Pod without HostNetwork set, that gives us the indirect CNI latency

If so why don't we measure the CNI latency directly?

This is complex and better explained on this doc

https://docs.google.com/document/d/1Ab2XvcS3koXB50RMhsfxdlXNkej5kW6RyxN1F48uqTs/edit

the TL;DR is that we created a chicken an egg problem and we need to deal with the consequences, existing network readiness is not accurate and the different CNI implementations usually are deployed as daemonset, so the CNI is installed as a hostNetwork Pod, and once it is running all the network pods can start running. ... there are some solutions going on to improve this modifying the CNI spec, but those things can take years in be adopted by all the components involved

// excluding the time for image pulling. This metric should reflect the "Pod startup latency SLI" definition and measure the time
// for the CNI plugin to be able to provide the network capabilities to the Pod.
// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
FirstNetworkPodStartSLIDuration = metrics.NewGauge(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea - I think you were chatting with @logicalhan about it already and I don't know where exactly you converged, but I wanted to throw one more idea and hear why it bad.

What if instead introducing a new metrics for that, we will introduce a label for the already existing PodStartSLIDuration metrics like "kind"? [better name is needed].

Then we will be able to distinguish different types of pods in that, e.g.:
"kind=first-pod-with-network"
for this particular case.

But I think it would be universally useful, because we will certainly need this for other cases, e.g. we want to be able to distinguish stateless and stateful pods because they clearly have a very different performance characteristic. And we would be able to to reuse the same label for it, e.g.:
kind=stateless
kind=stateful

@logicalhan @dgrisonnet - thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was actually what I originally proposed. However, given that this metric is a stopgap until we can properly pipe it via Status or something, I'm okay with an isolated metric which is of stability class Internal so that it doesn't get output in our auto generated docs and will be removed after a year or so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to distinguish pods could be useful, but it don't think it would make sense to add the label for the current use case as we are essentially trying to measure something different. The fact that we are measuring it via a specific pod latency is an implementation detail, that I don't think we should surface in the general purpose metric.

But as Han mentioned, the general consensus is to introduce a new Internal metric as a temporary solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind=stateless

Do we have that granularity on the kubelet?

The first pod with network latency impact user workloads, however,
it is difficuly to understand where is the problem of this latency,
since it depends on the CNI plugin to be ready at the moment of the
pod creation.

Add a new internal metric in the kubelet that allow developers and cluster
administrator to understand the source of the latency problems on
node startups.

kubelet_first_network_pod_start_sli_duration_seconds

Change-Id: I4cdb55b0df72c96a3a65b78ce2aae404c5195006
@aojea aojea force-pushed the first_pod_network_startup branch from 12529ce to b8533f7 Compare November 15, 2023 06:11
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 15, 2023
@aojea aojea changed the title kubelet: add metrics for the first pod with network latency kubelet: add internal metric for the first pod with network latency Nov 15, 2023
@dashpole
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 16, 2023
// existing networking subsystem and CRI/CNI implementations that will be solved by https://github.com/containernetworking/cni/issues/859
// The metric represents the latency observed by an user to run workloads in a new node.
// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
FirstNetworkPodStartSLIDuration = metrics.NewGauge(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost use a GaugeVec with no labels just for the Reset functionality for testing.

@bart0sh
Copy link
Contributor

bart0sh commented Nov 30, 2023

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 30, 2023
@aojea
Copy link
Member Author

aojea commented Jan 16, 2024

@dgrisonnet @logicalhan @wojtek-t kindly reminder, how can I get this to the finish line?

@wojtek-t
Copy link
Member

@logicalhan - friendly ping

Copy link
Member

@dgrisonnet dgrisonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I think the concerns that we had in the past about this metric have been addressed by making it internal.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 590ed7e0c08ade6f764811ac15a6627d7b6ac4dd

@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, dgrisonnet, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2024
@k8s-ci-robot k8s-ci-robot merged commit cbfebf0 into kubernetes:master Feb 22, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

10 participants