KEP-4346: Add metrics for informer #129160

Open
xigang wants to merge 1 commit into master

Conversation

@xigang
Member

xigang commented Dec 11, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

  1. Adds reflector metrics
  2. Adds informer metrics
  3. Exposes informer reflector/queue/eventHandler metrics

KEP-4346
https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4346-informer-metrics

Which issue(s) this PR fixes:

#121474
#129795
#117123
#122067 (comment)
#130767
kubernetes/client-go#1027
kubernetes-sigs/controller-runtime#817
kubernetes-sigs/controller-runtime#3189
kubernetes-sigs/controller-runtime#3182

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added metrics to help monitor and debug the performance of informers and reflectors.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 11, 2024
@k8s-ci-robot
Contributor

Please note that we're already in Test Freeze for the release-1.32 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.32.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Dec 11 12:08:11 UTC 2024.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 11, 2024
@k8s-ci-robot
Contributor

Hi @xigang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 11, 2024
@xigang changed the title from "[WIP] clent-go: Add metrics for informer" to "clent-go: Add metrics for informer" Dec 12, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 12, 2024
@xigang changed the title from "clent-go: Add metrics for informer" to "client-go: Add metrics for informer" Dec 12, 2024
@xigang
Member Author

xigang commented Dec 12, 2024

/sig api-machinery
/sig scalability

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Dec 12, 2024
@xigang changed the title from "client-go: Add metrics for informer" to "KEP-4346: Add metrics for informer" Dec 12, 2024
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 12, 2024
@dgrisonnet
Member

for sig-instrumentation review

/assign

@Jefftree
Member

/cc @richabanker
/triage accepted

@xigang force-pushed the informer_metrics branch 2 times, most recently from 7dc4850 to 0f303fe May 30, 2025 08:10
@xigang
Member Author

xigang commented Jun 3, 2025

@dgrisonnet @richabanker Could you take a look for approval?

@richabanker
Contributor

I think my comments are all addressed, thanks for that. This mostly looks good to me.
IIUC, there are some concerns about the usability of the metrics, especially the event_handler_name label, which contains a hash value. I'm hoping to get someone from api-machinery to weigh in on that.

@xigang
Member Author

xigang commented Jun 5, 2025

> I think my comments are all addressed, thanks for that. Looking good to me mostly.
> IIUC There are some concerns about usability of the metrics, esp with the event_handler_name label which contains the hash value, hoping to get someone from api-machinery to weigh in on that.

/cc @liggitt @deads2k @jpbetz @dims
The sig-instrumentation review has been completed. Could you please take another look? Thank you!

@xigang
Member Author

xigang commented Jun 10, 2025

/assign @liggitt
for api-machinery review.

@xigang
Member Author

xigang commented Jun 19, 2025

@liggitt @deads2k This PR has been blocked for a long time. Could you help take a look, or let me know if there is a plan for it?

I believe this PR is quite important. Related issues:
#129795
#122067 (comment)
#130767
kubernetes/client-go#1027
kubernetes-sigs/controller-runtime#817
kubernetes-sigs/controller-runtime#3189
kubernetes-sigs/controller-runtime#3182

cc @wojtek-t @dims @soltysh PTAL. Thanks!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2025
@xigang force-pushed the informer_metrics branch from 0f303fe to b9e7aa4 June 26, 2025 09:54
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xigang
Once this PR has been reviewed and has the lgtm label, please ask for approval from liggitt. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@xigang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-apidiff-client-go | b9e7aa4 | link | false | `/test pull-kubernetes-apidiff-client-go` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Rajalakshmi-Girish Rajalakshmi-Girish moved this to Pending inclusion in [sig-release] Bug Triage Jun 30, 2025
@Prajyot-Parab

Hello @xigang @liggitt
I'd like to check on the status of this PR. If there's anything we can do, please let us know. Code freeze starts at 02:00 UTC on Friday, 25th July 2025 (about 4 weeks from now). Please make sure the PR has both the lgtm and approved labels before the code freeze.
Thanks!

@Prajyot-Parab Prajyot-Parab moved this from Pending inclusion to Tracked in [sig-release] Bug Triage Jun 30, 2025
@xigang
Copy link
Member Author

xigang commented Jun 30, 2025

@Prajyot-Parab This PR has been reviewed by sig-instrumentation, but it still needs confirmation from api-machinery.
cc @liggitt

Labels
area/cloudprovider
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/feature: Categorizes issue or PR as related to a new feature.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
sig/architecture: Categorizes an issue or PR as relevant to SIG Architecture.
sig/cloud-provider: Categorizes an issue or PR as relevant to SIG Cloud Provider.
sig/instrumentation: Categorizes an issue or PR as relevant to SIG Instrumentation.
sig/scalability: Categorizes an issue or PR as relevant to SIG Scalability.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Needs Review
Status: Tracked