Adding metrics for MaxUnavailable feature in StatefulSet #130951


Open · wants to merge 2 commits into master

Conversation

@Edwinhr716 commented Mar 20, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a metric to track how many times a maxUnavailable violation has occurred, a requirement for the beta graduation of kubernetes/enhancements#961.
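
For context, here is a minimal sketch of how such a counter could be declared with k8s.io/component-base/metrics. The subsystem, metric name, and help text below are illustrative assumptions; only the namespace/name labels and the BETA stability level reflect what is discussed in this PR.

package metrics

import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

// MaxUnavailableViolations counts how many times a StatefulSet exceeded its
// configured maxUnavailable during an update. The name, subsystem, and help
// text are placeholders, not necessarily the values used by this PR.
var MaxUnavailableViolations = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Subsystem:      "statefulset_controller",
		Name:           "max_unavailable_violations_total",
		Help:           "Number of times a StatefulSet exceeded its maxUnavailable setting.",
		StabilityLevel: metrics.BETA,
	},
	[]string{"namespace", "name"},
)

func init() {
	// Expose the counter on the controller-manager's /metrics endpoint.
	legacyregistry.MustRegister(MaxUnavailableViolations)
}

The controller then increments it with MaxUnavailableViolations.WithLabelValues(set.Namespace, set.Name).Inc() whenever it detects a violation (see the diff excerpt further down in this thread).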

Which issue(s) this PR fixes:

Part of kubernetes/enhancements#961

Special notes for your reviewer:

This is a follow-up to the discussion on the KEP update PR kubernetes/enhancements#4474 (comment).

The general consensus seems to be that this metric should live in-tree rather than in kube-state-metrics.

Open question:

  • Should the metric be generic, like the one exposed by Deployments?

cc @atiratree @dgrisonnet @wojtek-t who were part of the original discussion.

Does this PR introduce a user-facing change?

Adds a metric for the StatefulSet MaxUnavailable feature

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 20, 2025
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 20, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Mar 20, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 20, 2025
@Edwinhr716 (Author)

/assign @janetkuo @soltysh

@k8s-ci-robot k8s-ci-robot added area/stable-metrics Issues or PRs involving stable metrics sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Mar 20, 2025
@janetkuo (Member)

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 20, 2025
@Edwinhr716 (Author)

/retest

@k8s-triage-robot

This PR may require stable metrics review.

Stable metrics are guaranteed to not change. Please review the documentation for the requirements and lifecycle of stable metrics and ensure that your metrics meet these guidelines.

@dims (Member) commented Mar 24, 2025

cc @xiaohongchen1991

@janetkuo (Member)

LGTM in general, once the presubmit check failure is fixed. @soltysh, would you like to take a look as well?

@janetkuo (Member) left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 14, 2025
@k8s-ci-robot (Contributor)

LGTM label has been added.

Git tree hash: 30677f32c92a34a1acfbc1b72625aa1bb7a63803

@dgrisonnet (Member)

/assign

@soltysh (Contributor) left a comment

I believe this is a good starting point. I see Damien will look at it from the instrumentation point of view.

/lgtm
/approve

@soltysh (Contributor) commented Apr 17, 2025

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 17, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Edwinhr716, janetkuo, soltysh
Once this PR has been reviewed and has the lgtm label, please ask for approval from dgrisonnet. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Apr 17, 2025
Comment on lines 748 to +753
	"statefulSet", klog.KObj(set),
	"unavailablePods", unavailablePods,
	"maxUnavailable", maxUnavailable)
if unavailablePods > maxUnavailable {
	metrics.MaxUnavailableViolations.WithLabelValues(set.Namespace, set.Name).Inc()
}
@atiratree (Member) commented Apr 21, 2025

This is always triggered by Parallel StatefulSets multiple times during the initial rollout (depending on the number of replicas, minReadySeconds), so I am not sure how useful this metric is.

IMO, we should remove the logging here when we graduate to beta, or at least make it less verbose (log level 4?), to prevent the spam.

Also, during the OrderedReady rollout there is a period of time where we have unavailable pods, but we don't log that. We also don't notice loss of availability in future StatefulSet updates.

@Edwinhr716 (Author)

This is always triggered by Parallel StatefulSets multiple times during the initial rollout

By this you mean if PodManagementPolicy is set to Parallel? If so, could you expand on this? I don't see how it is triggered multiple times. There will be at most maxUnavailable unavailable pods, no? Even when using minReadySeconds?

IMO, we should remove the logging here when we graduate to beta, or at least make it less verbose (log level 4?), to prevent the spam.

Makes sense, especially if we keep the metric.

Also, during the OrderedReady rollout there is a period of time where we have unavailable pods, but we don't log that

Are you suggesting we log it? Wouldn't that just be logging it every time there is an unavailable pod?

Contributor

This is always triggered by Parallel StatefulSets multiple times during the initial rollout (depending on the number of replicas, minReadySeconds), so I am not sure how useful this metric is.

This part is being fixed in #130909, where we're only missing a unit test to properly account for unavailable pods with minReadySeconds taken into account.

IMO, we should remove the logging here when we graduate to beta, or at least make it less verbose (log level 4?), to prevent the spam.

That seems reasonable.

Also, during the OrderedReady rollout there is a period of time where we have unavailable pods, but we don't log that. We also don't notice loss of availability in future StatefulSet updates.

We don't strive to log how long the pods will be unavailable. As you pointed out in your first question, this will vary from one StatefulSet to another, and by the nature of StatefulSets it's hard to use that as a reasonable metric. This was discussed several times in the past.

Member

+1 to remove those logs. Even if we're keeping the logs, we don't need to log when unavailablePods == maxUnavailable given that it's a valid case.

@atiratree (Member) commented Apr 28, 2025

By this you mean if PodManagementPolicy is set to Parallel? If so, could you expand on this? I don't see how it is triggered multiple times. There will be at most maxUnavailable unavailable pods, no? Even when using minReadySeconds?

Yes, in case PodManagementPolicy is set to Parallel and the MaxUnavailableStatefulSet feature gate is enabled. It happens before the StatefulSet reconciles to the final state.

Yes, but it takes time for all the pods to reach minReadySeconds.

This part is being fixed in #130909, where we're only missing a unit test to properly account for unavailable pods with minReadySeconds taken into account.

Even when I test it with #130909, it still happens. The Parallel policy hits this point roughly 18 times for a StatefulSet with 5 pods for me. How many reconciles we hit depends on the kubelet/apiserver and other variables.

Also, during the OrderedReady rollout there is a period of time where we have unavailable pods, but we don't log that

Are you suggesting we log it? Wouldn't that just be logging it every time there is an unavailable pod?

We don't strive to log how long the pods will be unavailable. As you pointed out in your first question, this will vary from one StatefulSet to another, and by the nature of StatefulSets it's hard to use that as a reasonable metric. This was discussed several times in the past.

I do not think we necessarily have to log that. I'm just saying there is a difference between OrderedReady and Parallel.

@Edwinhr716 (Author) commented May 15, 2025

Yes, but it takes time for all the pods to reach minReadySeconds.

Sure, but we take that into account when we determine how many pods are unavailable, since they are only counted as available once minReadySeconds has passed. So when we determine how many pods to delete

podsToDelete := maxUnavailable - unavailablePods
it shouldn't delete more pods (causing them to become unavailable) unless more than maxUnavailable pods are available after minReadySeconds has passed.
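
To make the arithmetic above concrete, here is a small, self-contained sketch. The pod struct and deletionBudget helper are illustrative stand-ins, not the controller's real types; only the podsToDelete := maxUnavailable - unavailablePods expression comes from the code quoted above.

package main

import "fmt"

// pod is a simplified stand-in for a StatefulSet pod; available is true only
// once the pod has been ready for at least minReadySeconds (assumption: the
// real controller derives this from pod conditions).
type pod struct {
	name      string
	available bool
}

// deletionBudget mirrors the quoted arithmetic: pods may only be taken down
// while the number of unavailable pods stays below maxUnavailable.
func deletionBudget(pods []pod, maxUnavailable int) int {
	unavailablePods := 0
	for _, p := range pods {
		if !p.available {
			unavailablePods++
		}
	}
	if unavailablePods >= maxUnavailable {
		return 0 // no budget left; wait for pods to pass minReadySeconds
	}
	return maxUnavailable - unavailablePods
}

func main() {
	pods := []pod{
		{"web-0", true}, {"web-1", true}, {"web-2", false},
		{"web-3", true}, {"web-4", true},
	}
	// With maxUnavailable=2 and one pod still unavailable, only one more
	// pod may be deleted in this reconcile.
	fmt.Println(deletionBudget(pods, 2)) // prints 1
}

In other words, as long as the availability check honors minReadySeconds, the controller's own deletions should never push the unavailable count past maxUnavailable.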

@atiratree (Member)

/hold
for #130951 (comment)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 21, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 23, 2025
@k8s-ci-robot (Contributor)

New changes are detected. LGTM label has been removed.

@Edwinhr716 (Author)

/retest

@k8s-ci-robot (Contributor)

@Edwinhr716: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                Commit   Details  Required  Rerun command
pull-kubernetes-e2e-gce  e9b66a0  link     true      /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@soltysh (Contributor) commented Apr 28, 2025

/hold
for #130951 (comment)

I believe the comment is being addressed in #130909.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2025
@atiratree (Member)

Added additional info: #130951 (comment)

@richabanker (Contributor)

/assign

@soltysh (Contributor) commented Jun 5, 2025

Added additional info: #130951 (comment)

Filip's comment is on point; this will need to be addressed.

@soltysh (Contributor) commented Jun 5, 2025

/assign

@richabanker any feedback from the sig-instrumentation pov?

@richabanker (Contributor)

/assign

@richabanker any feedback from the sig-instrumentation pov?

Whoops, super sorry for the late reply. I think I was just curious: why is the new metric starting off at the BETA stabilityLevel?

Labels
area/stable-metrics Issues or PRs involving stable metrics cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

9 participants