add Authorization tracking request/error counts and latency metrics #117211

HirazawaUi · 2023-04-11T15:38:25Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

add Authentication tracking request/error counts and latency metrics

Which issue(s) this PR fixes:

Fixes #117167

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kube-apiserver adds two new metrics `authorization_attempts_total` and `authorization_duration_seconds` that allow users to monitor requests to authorization webhooks, split by result.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2023-04-11T15:38:29Z

Please note that we're already in Test Freeze for the release-1.27 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.27.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Apr 11 14:07:05 UTC 2023.

k8s-ci-robot · 2023-04-11T15:38:35Z

Hi @HirazawaUi. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cici37 · 2023-04-11T20:10:16Z

/cc @shyamjvs issue owner for initial review. Thank you
/triage accepted

dgrisonnet · 2023-04-12T11:10:54Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/metrics.go

+ resultLabel = successLabel
+ case authorized == authorizer.DecisionNoOpinion:
+ // continue to the next authorizer
+ resultLabel = successLabel


It will be quite confusing to return a success when the authorizer didn't make a decision. I would understand a result="success" label of authorization_attempts_total to represent an authorization, which is not the case with DecisionNoOpinion.

Maybe we could just have a decision label instead of result?

It will be quite confusing to return a success when the authorizer didn't make a decision. I would understand a result="success" label of authorization_attempts_total to represent an authorization, which is not the case with DecisionNoOpinion.

Maybe we could just have a decision label instead of result?

In the Authorize interface, the action of Authorize does not exactly match the presence of error, when error is not nil and action is not allowed, withAuthorization will return an error, this situation cannot be described by the decision label
So, I don't think it is possible to simply use the decision label instead of result.
metrics should be divided into the following cases in order

when action is allow, the error is ignored and the count of metrics with decision="allow" and result="success" is added to 1

when error is non-empty, the metrics count of {result="error"} is added to 1

When action is forbid, also ignore the error and add 1 to the metrics count of decision="fobid and result="success".

When action is NoOpinion, add 1 to the metrics count of {decision="no opinion"}.

The above is just a shallow understanding after reading the code, do you think this is reasonable?
Please review my latest commit code, it follows the above logic

Hmm, I see thank you for the clarification. I have the feeling that separating decision and result will just be confusing to the general users and that the difference is an implementation detail that users don't necessarily need to be aware of.

I have two suggestions to make it simpler for the users:

a single metric:

authorization_attempts_total{result="allowed|forbidden|no-opinion|error"}

two metrics:

authorization_attempts_total{decision="allowed|forbidden|no-opinion|"} authorization_attempt_errors_total{}

The second is more idiomatic when computing error rate since you don't have to do requests_total - requests_total{error} in order to get the total number of errors, but at the same time, I am not super happy with having an empty decision label in case of errors.

Thanks for your explanation, I agree with you and have used the first suggestion to make changes, please review code

dgrisonnet · 2023-04-12T11:12:06Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/metrics.go

+
+ authorizationDecisionAnnotationCounter = metrics.NewCounterVec(
+ &metrics.CounterOpts{
+ Name: "authorization_decision_annotations_total",


Could you please share some lights on the use-case for this metric?

Could you please share some lights on the use-case for this metric?

It is the metrics of the decision label, it seems that the name here is not very accurate, in the latest commit has been modified, its use case in the last comment in the reply I have described, please review, and hope you provide comments to tell me whether it is reasonable

shyamjvs · 2023-04-14T01:19:56Z

Please change the PR title from Authentication -> Authorization for accuracy.

/ok-to-test
/sig auth (for reviewing the metric)
/sig instrumentation (for approving the metric style)

k8s-ci-robot · 2023-04-14T01:19:59Z

@shyamjvs: The label(s) sig/(for, sig/reviewing, sig/the, sig/metric), sig/(for, sig/approving, sig/the, sig/metric, sig/style) cannot be applied, because the repository doesn't have them.

In response to this:

Please change the PR title from Authentication -> Authorization for accuracy.

/ok-to-test
/sig auth (for reviewing the metric)
/sig instrumentation (for approving the metric style)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

shyamjvs · 2023-04-14T01:20:26Z

/priority important-soon

shyamjvs · 2023-04-14T01:22:12Z

@HirazawaUi IMO this change deserves a release note, to let users know that they can now track their authz webhooks through these new apiserver metrics.

k8s-ci-robot · 2023-04-14T22:04:14Z

LGTM label has been added.

Git tree hash: 247a4837d2bc5b2cefc7d33caa2e1b2331bc5c63

dgrisonnet · 2023-04-14T22:04:47Z

@HirazawaUi could you perhaps update the release note with the correct metric names and description?

HirazawaUi · 2023-04-17T10:26:47Z

@HirazawaUi could you perhaps update the release note with the correct metric names and description?

Okay, I've modified it

shyamjvs · 2023-04-19T21:03:10Z

I still see the incorrect metric names in the release note - authorizationAttemptsCounter and authorizationLatency.
@HirazawaUi Could you please fix it?

shyamjvs · 2023-04-19T21:09:23Z

I recommend this for the release note:

Apiserver adds two new metrics `authorization_attempts_total` and `authorization_duration_seconds` that allow users to monitor requests to authorization webhooks, split by result.

nckturner · 2023-04-20T00:30:06Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/metrics.go

@@ -38,6 +39,10 @@ const (
 successLabel = "success"
 failureLabel = "failure"
 errorLabel = "error"
+
+ allowedLabel = "allowed"
+ forbiddenLabel = "forbidden"


Suggested change

forbiddenLabel = "forbidden"

deniedLabel = "denied"

deny|denied is the language used in the Kubernetes authorization docs and the API to refer to this case, although I see that the forbid|forbidden language is used in the code. IMO we should align the metric name with the language in the public facing docs and the API instead of the code.

Thanks for the suggestion, changing it to denied seems to be a more reasonable behavior

nckturner · 2023-04-20T00:40:09Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/metrics.go

+ //Follow the behavior in withAuthorization, even encounter evaluation errors and still allow the request
+ //Will not judge whether there is an error


Suggested change

//Follow the behavior in withAuthorization, even encounter evaluation errors and still allow the request

//Will not judge whether there is an error

// In withAuthorization(), it is possible to allow the request and still encounter an evaluation error.

But also, doesn't this mean we should have a separate errors_total metric? Because in the case where there are errors and they are also allowed, we are not reporting them in metrics, right?

This issue was discussed with @dgrisonnet earlier and the conclusion was that whether or not to expose the metrics as an result error but still authorize successfully depends on whether or not the user should know the implementation details of the code
But we all feel as if the user doesn't need to care about how it is implemented
If it's because I left a comment that causes confusion, I'll remove it

HirazawaUi · 2023-04-20T13:00:20Z

I recommend this for the release note:

Apiserver adds two new metrics `authorization_attempts_total` and `authorization_duration_seconds` that allow users to monitor requests to authorization webhooks, split by result.

Okay, I've changed it as recommended

shyamjvs · 2023-04-25T18:12:53Z

/retest (unrelated failure)
/lgtm

k8s-ci-robot · 2023-04-25T18:12:55Z

@shyamjvs: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test pull-cadvisor-e2e-kubernetes
/test pull-kubernetes-conformance-kind-ga-only-parallel
/test pull-kubernetes-coverage-unit
/test pull-kubernetes-dependencies
/test pull-kubernetes-dependencies-go-canary
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-e2e-gce-100-performance
/test pull-kubernetes-e2e-gce-big-performance
/test pull-kubernetes-e2e-gce-canary
/test pull-kubernetes-e2e-gce-cos
/test pull-kubernetes-e2e-gce-cos-canary
/test pull-kubernetes-e2e-gce-cos-no-stage
/test pull-kubernetes-e2e-gce-network-proxy-http-connect
/test pull-kubernetes-e2e-gce-scale-performance-manual
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-integration
/test pull-kubernetes-integration-go-canary
/test pull-kubernetes-kubemark-e2e-gce-scale
/test pull-kubernetes-node-e2e-containerd
/test pull-kubernetes-typecheck
/test pull-kubernetes-unit
/test pull-kubernetes-unit-go-canary
/test pull-kubernetes-update
/test pull-kubernetes-verify
/test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

/test check-dependency-stats
/test pull-ci-kubernetes-unit-windows
/test pull-e2e-gce-cloud-provider-disabled
/test pull-kubernetes-conformance-image-test
/test pull-kubernetes-conformance-kind-ga-only
/test pull-kubernetes-conformance-kind-ipv6-parallel
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-cross
/test pull-kubernetes-e2e-autoscaling-hpa-cm
/test pull-kubernetes-e2e-autoscaling-hpa-cpu
/test pull-kubernetes-e2e-capz-azure-disk
/test pull-kubernetes-e2e-capz-azure-disk-vmss
/test pull-kubernetes-e2e-capz-azure-file
/test pull-kubernetes-e2e-capz-azure-file-vmss
/test pull-kubernetes-e2e-capz-conformance
/test pull-kubernetes-e2e-capz-windows
/test pull-kubernetes-e2e-capz-windows-alpha-features
/test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
/test pull-kubernetes-e2e-containerd-gce
/test pull-kubernetes-e2e-gce-correctness
/test pull-kubernetes-e2e-gce-cos-alpha-features
/test pull-kubernetes-e2e-gce-cos-kubetest2
/test pull-kubernetes-e2e-gce-csi-serial
/test pull-kubernetes-e2e-gce-device-plugin-gpu
/test pull-kubernetes-e2e-gce-network-proxy-grpc
/test pull-kubernetes-e2e-gce-serial
/test pull-kubernetes-e2e-gce-storage-disruptive
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-snapshot
/test pull-kubernetes-e2e-gci-gce-autoscaling
/test pull-kubernetes-e2e-gci-gce-ingress
/test pull-kubernetes-e2e-gci-gce-ipvs
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
/test pull-kubernetes-e2e-kind-canary
/test pull-kubernetes-e2e-kind-dual-canary
/test pull-kubernetes-e2e-kind-ipv6-canary
/test pull-kubernetes-e2e-kind-ipvs-dual-canary
/test pull-kubernetes-e2e-kind-kms
/test pull-kubernetes-e2e-kind-multizone
/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-e2e-kubelet-credential-provider
/test pull-kubernetes-e2e-ubuntu-gce-network-policies
/test pull-kubernetes-kind-dra
/test pull-kubernetes-kind-json-logging
/test pull-kubernetes-kind-text-logging
/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-local-e2e
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv2-e2e
/test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
/test pull-kubernetes-node-crio-e2e
/test pull-kubernetes-node-crio-e2e-kubetest2
/test pull-kubernetes-node-e2e-containerd-alpha-features
/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-node-e2e-containerd-features-kubetest2
/test pull-kubernetes-node-e2e-containerd-kubetest2
/test pull-kubernetes-node-e2e-containerd-sidecar-containers
/test pull-kubernetes-node-e2e-containerd-standalone-mode
/test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
/test pull-kubernetes-node-kubelet-credential-provider
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
/test pull-kubernetes-node-kubelet-serial-hugepages
/test pull-kubernetes-node-kubelet-serial-memory-manager
/test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
/test pull-kubernetes-node-memoryqos-cgrpv2
/test pull-kubernetes-node-swap-fedora
/test pull-kubernetes-node-swap-fedora-serial
/test pull-kubernetes-node-swap-ubuntu-serial
/test pull-kubernetes-unit-experimental
/test pull-kubernetes-verify-strict-lint
/test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

pull-kubernetes-conformance-kind-ga-only-parallel
pull-kubernetes-dependencies
pull-kubernetes-e2e-gce
pull-kubernetes-e2e-kind
pull-kubernetes-e2e-kind-ipv6
pull-kubernetes-integration
pull-kubernetes-node-e2e-containerd
pull-kubernetes-typecheck
pull-kubernetes-unit
pull-kubernetes-verify

In response to this:

/retest (unrelated failure)
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2023-04-25T18:13:00Z

LGTM label has been added.

Git tree hash: 7cf7a9b1bf4089ef2d9f63d155b09be97e8ff58f

shyamjvs · 2023-04-25T18:13:23Z

/retest

shyamjvs · 2023-04-25T23:21:26Z

@lavalamp / @wojtek-t can one of you help approve this? Thanks

wojtek-t · 2023-05-05T08:16:56Z

This is great - thanks!

/lgtm
/approve

k8s-ci-robot · 2023-05-05T08:17:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, HirazawaUi, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~staging/src/k8s.io/apiserver/pkg/endpoints/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

HirazawaUi · 2023-05-05T09:26:39Z

/retest

…tency metrics Description: "Apiserver adds two new metrics `authorization_attempts_total` and `authorization_duration_seconds` that allow users to monitor requests to authorization webhooks, split by result." (PR description) Upstream PR, Issue, KEP, etc. links: * Take from upstream Kubernetes PR kubernetes#117211 (kubernetes#117211), which, at the time this patch was added, had not been merged yet. * Partially fixes Kubernetes Issue kubernetes#117167 (kubernetes#117167) If this patch is based on an upstream commit, how (if at all) do this patch and the upstream source differ? TBD after change is merged by upstream If this patch's changes have not been added by upstream, why not? Change is proposed to upstream Other patches related to this patch: None Changes made to this patch after its initial creation and reasons for these changes: None Kubernetes version this patch can be dropped: TBD after change is merged by upstream

k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 11, 2023

k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 11, 2023

k8s-ci-robot requested review from soltysh and sttts April 11, 2023 15:38

k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Apr 11, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 11, 2023

dgrisonnet reviewed Apr 12, 2023

View reviewed changes

HirazawaUi force-pushed the add-auth-metrics branch from 8ff18f0 to ab8dc07 Compare April 13, 2023 16:51

k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 14, 2023

k8s-ci-robot assigned dgrisonnet Apr 14, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 14, 2023

nckturner reviewed Apr 20, 2023

View reviewed changes

add Authentication tracking request/error counts and latency metrics

104c208

HirazawaUi force-pushed the add-auth-metrics branch from 9a8fb80 to 104c208 Compare April 20, 2023 13:36

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 20, 2023

k8s-ci-robot requested a review from dgrisonnet April 20, 2023 13:37

k8s-ci-robot assigned shyamjvs Apr 25, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 25, 2023

shyamjvs mentioned this pull request Apr 25, 2023

apiserver/etcd: add request and request_error metrics #117222

Merged

k8s-ci-robot assigned wojtek-t May 5, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2023

k8s-ci-robot merged commit 7e25f12 into kubernetes:master May 5, 2023
11 checks passed

k8s-ci-robot added this to the v1.28 milestone May 5, 2023

MikeSpreitzer mentioned this pull request May 5, 2023

Monitoring gaps in apiserver extension mechanisms #117167

Open

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add Authorization tracking request/error counts and latency metrics #117211

add Authorization tracking request/error counts and latency metrics #117211

HirazawaUi commented Apr 11, 2023 •

edited by wojtek-t

k8s-ci-robot commented Apr 11, 2023

k8s-ci-robot commented Apr 11, 2023

cici37 commented Apr 11, 2023

dgrisonnet Apr 12, 2023

HirazawaUi Apr 13, 2023 •

edited

dgrisonnet Apr 14, 2023

HirazawaUi Apr 14, 2023 •

edited

dgrisonnet Apr 12, 2023

HirazawaUi Apr 13, 2023 •

edited

shyamjvs commented Apr 14, 2023

k8s-ci-robot commented Apr 14, 2023

shyamjvs commented Apr 14, 2023

shyamjvs commented Apr 14, 2023

k8s-ci-robot commented Apr 14, 2023

dgrisonnet commented Apr 14, 2023

HirazawaUi commented Apr 17, 2023

shyamjvs commented Apr 19, 2023

shyamjvs commented Apr 19, 2023

nckturner Apr 20, 2023

HirazawaUi Apr 20, 2023

nckturner Apr 20, 2023

HirazawaUi Apr 20, 2023

HirazawaUi commented Apr 20, 2023

shyamjvs commented Apr 25, 2023

k8s-ci-robot commented Apr 25, 2023

k8s-ci-robot commented Apr 25, 2023

shyamjvs commented Apr 25, 2023

shyamjvs commented Apr 25, 2023

wojtek-t commented May 5, 2023

k8s-ci-robot commented May 5, 2023

HirazawaUi commented May 5, 2023

		//Follow the behavior in withAuthorization, even encounter evaluation errors and still allow the request
		//Will not judge whether there is an error

	//Follow the behavior in withAuthorization, even encounter evaluation errors and still allow the request
	//Will not judge whether there is an error
	// In withAuthorization(), it is possible to allow the request and still encounter an evaluation error.

add Authorization tracking request/error counts and latency metrics #117211

add Authorization tracking request/error counts and latency metrics #117211

Conversation

HirazawaUi commented Apr 11, 2023 • edited by wojtek-t

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot commented Apr 11, 2023

k8s-ci-robot commented Apr 11, 2023

cici37 commented Apr 11, 2023

dgrisonnet Apr 12, 2023

Choose a reason for hiding this comment

HirazawaUi Apr 13, 2023 • edited

Choose a reason for hiding this comment

dgrisonnet Apr 14, 2023

Choose a reason for hiding this comment

HirazawaUi Apr 14, 2023 • edited

Choose a reason for hiding this comment

dgrisonnet Apr 12, 2023

Choose a reason for hiding this comment

HirazawaUi Apr 13, 2023 • edited

Choose a reason for hiding this comment

shyamjvs commented Apr 14, 2023

k8s-ci-robot commented Apr 14, 2023

shyamjvs commented Apr 14, 2023

shyamjvs commented Apr 14, 2023

k8s-ci-robot commented Apr 14, 2023

dgrisonnet commented Apr 14, 2023

HirazawaUi commented Apr 17, 2023

shyamjvs commented Apr 19, 2023

shyamjvs commented Apr 19, 2023

nckturner Apr 20, 2023

Choose a reason for hiding this comment

HirazawaUi Apr 20, 2023

Choose a reason for hiding this comment

nckturner Apr 20, 2023

Choose a reason for hiding this comment

HirazawaUi Apr 20, 2023

Choose a reason for hiding this comment

HirazawaUi commented Apr 20, 2023

shyamjvs commented Apr 25, 2023

k8s-ci-robot commented Apr 25, 2023

k8s-ci-robot commented Apr 25, 2023

shyamjvs commented Apr 25, 2023

shyamjvs commented Apr 25, 2023

wojtek-t commented May 5, 2023

k8s-ci-robot commented May 5, 2023

HirazawaUi commented May 5, 2023

HirazawaUi commented Apr 11, 2023 •

edited by wojtek-t

HirazawaUi Apr 13, 2023 •

edited

HirazawaUi Apr 14, 2023 •

edited

HirazawaUi Apr 13, 2023 •

edited