Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix incorrect calculation for ResourceQuota with PriorityClass as its scope #117677

Merged
merged 2 commits into from
May 5, 2023

Conversation

Huang-Wei
Copy link
Member

@Huang-Wei Huang-Wei commented Apr 28, 2023

What type of PR is this?

/kind bug
/sig api-machinery

What this PR does / why we need it:

Fix incorrect calculation for ResourceQuota with PriorityClass as its scope. See #117676 for repro steps.

Which issue(s) this PR fixes:

Fixes #117676

Special notes for your reviewer:

The root cause is at

innerMatch, err := scopeFunc(corev1.ScopedResourceSelectorRequirement{ScopeName: scope}, item)

It would always return an error as the SelectorRequirement is invalid for PriorityClassScoped quota, and then the err is swollen and returned

if err != nil {
return result, nil
}

Not sure we need UT here as we don't seem to have UT for CalculateUsageStats at all.

Does this PR introduce a user-facing change?

Fix incorrect calculation for ResourceQuota with PriorityClass as its scope.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver labels Apr 28, 2023
@Huang-Wei
Copy link
Member Author

/retest

@aojea
Copy link
Member

aojea commented Apr 29, 2023

can we have a regression test covering this bug?

@liggitt
Copy link
Member

liggitt commented Apr 29, 2023

This definitely needs tests demonstrating the issue and the fix

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 30, 2023
@Huang-Wei
Copy link
Member Author

@aojea @liggitt Added a UT to repro the issue. Please check the sub-test "partial pods matching quotaScopeSelector - w/ scopeName specified" in test pkg/quota/v1/evaluator/core/pods_test.go#TestPodEvaluatorUsageStats()

@cici37
Copy link
Contributor

cici37 commented May 2, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2023
Comment on lines 331 to 335
if len(selector.Operator) == 0 && selector.Values == nil {
// this is just checking for existence of a priorityClass on the pod
return len(pod.Spec.PriorityClassName) != 0, nil
}
Copy link
Member

@liggitt liggitt May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

digging deeper, I think we can even avoid this special case by making CalculateUsageStats consistent with getScopeSelectorsFromQuota:

func getScopeSelectorsFromQuota(quota *corev1.ResourceQuota) []corev1.ScopedResourceSelectorRequirement {
selectors := []corev1.ScopedResourceSelectorRequirement{}
for _, scope := range quota.Spec.Scopes {
selectors = append(selectors, corev1.ScopedResourceSelectorRequirement{
ScopeName: scope,
Operator: corev1.ScopeSelectorOpExists})
}
if quota.Spec.ScopeSelector != nil {
selectors = append(selectors, quota.Spec.ScopeSelector.MatchExpressions...)
}
return selectors
}

Things in Scopes should be treated as an Exists selector requirement check:

diff --git a/staging/src/k8s.io/apiserver/pkg/quota/v1/generic/evaluator.go b/staging/src/k8s.io/apiserver/pkg/quota/v1/generic/evaluator.go
index 55b31a745a0..e122248f861 100644
--- a/staging/src/k8s.io/apiserver/pkg/quota/v1/generic/evaluator.go
+++ b/staging/src/k8s.io/apiserver/pkg/quota/v1/generic/evaluator.go
@@ -199,7 +199,7 @@ func CalculateUsageStats(options quota.UsageStatsOptions,
 		// need to verify that the item matches the set of scopes
 		matchesScopes := true
 		for _, scope := range options.Scopes {
-			innerMatch, err := scopeFunc(corev1.ScopedResourceSelectorRequirement{ScopeName: scope}, item)
+			innerMatch, err := scopeFunc(corev1.ScopedResourceSelectorRequirement{ScopeName: scope, Operator: corev1.ScopeSelectorOpExists}, item)
 			if err != nil {
 				return result, nil
 			}

which would already work properly with podMatchesSelector

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question is: will this impose extra overhead on the underlying selector check (labelSelector.Matches)? If not, this solution is neater.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the selector constructed in CalculateUsageStats either way

I guess we can still optimize the evaluation inside podMatchesScopeFunc if we want, but do it in a more principled way by actually looking to see if the operator is the Exists operator:

        case corev1.ResourceQuotaScopePriorityClass:
+               if selector.Operator == corev1.ScopeSelectorOpExists {
+                       // this is just checking for existence of a priorityClass on the pod, no need to take the overhead of selector parsing/evaluation
+                       return len(pod.Spec.PriorityClassName) != 0, nil
+               }
                return podMatchesSelector(pod, selector)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can still optimize the evaluation inside podMatchesScopeFunc if we want

We really need to. Without this optimization, if we impose Exists operator, the benchmark test result is like:

⇒  go test ./pkg/quota/v1/evaluator/core/... -bench BenchmarkPodMatchesScopeFunc -run ^$
goos: darwin
goarch: arm64
pkg: k8s.io/kubernetes/pkg/quota/v1/evaluator/core
BenchmarkPodMatchesScopeFunc/PriorityClass_selector_w/o_operator-10         	 2401466	       493.9 ns/op
BenchmarkPodMatchesScopeFunc/PriorityClass_selector_w/_'Exists'_operator-10 	 1378538	       872.0 ns/op
BenchmarkPodMatchesScopeFunc/BestEfforts_selector_w/o_operator-10           	 2486842	       472.4 ns/op
BenchmarkPodMatchesScopeFunc/BestEfforts_selector_w/_'Exists'_operator-10   	 2300052	       473.3 ns/op

And with this optimization, as it breaks early, the result is:

⇒  go test ./pkg/quota/v1/evaluator/core/... -bench BenchmarkPodMatchesScopeFunc -run ^$
goos: darwin
goarch: arm64
pkg: k8s.io/kubernetes/pkg/quota/v1/evaluator/core
BenchmarkPodMatchesScopeFunc/PriorityClass_selector_w/o_operator-10         	 2424650	       493.7 ns/op
BenchmarkPodMatchesScopeFunc/PriorityClass_selector_w/_'Exists'_operator-10 	 5513385	       215.7 ns/op
BenchmarkPodMatchesScopeFunc/BestEfforts_selector_w/o_operator-10           	 2347708	       484.5 ns/op
BenchmarkPodMatchesScopeFunc/BestEfforts_selector_w/_'Exists'_operator-10   	 2467605	       471.6 ns/op
PASS
ok  	k8s.io/kubernetes/pkg/quota/v1/evaluator/core	6.815s

@liggitt
Copy link
Member

liggitt commented May 5, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: e9770c60854199a1952038d14f05ba14384cbeb5

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2023
@k8s-ci-robot k8s-ci-robot merged commit dea1312 into kubernetes:master May 5, 2023
12 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone May 5, 2023
@Huang-Wei Huang-Wei deleted the fix-quota-priorityclass branch May 5, 2023 01:11
@Huang-Wei
Copy link
Member Author

@liggitt do you think we need to backport this? (if not, it can be worked around though)

@liggitt
Copy link
Member

liggitt commented May 5, 2023

@liggitt do you think we need to backport this? (if not, it can be worked around though)

I would backport this... it's very contained, performance impact is negligible, and makes two different parts of quota work consistently

@Huang-Wei
Copy link
Member Author

I would backport this... it's very contained, performance impact is negligible, and makes two different parts of quota work consistently

@liggitt cherry-pick PRs are created. PTAL when you get a chance.

k8s-ci-robot added a commit that referenced this pull request May 11, 2023
…17677-upstream-release-1.26

Automated cherry pick of #117677: Fix incorrect calculation for ResourceQuota with
k8s-ci-robot added a commit that referenced this pull request May 11, 2023
…17677-upstream-release-1.25

Automated cherry pick of #117677: Fix incorrect calculation for ResourceQuota with
k8s-ci-robot added a commit that referenced this pull request May 11, 2023
…17677-upstream-release-1.24

Automated cherry pick of #117677: Fix incorrect calculation for ResourceQuota with
k8s-ci-robot added a commit that referenced this pull request May 11, 2023
…17677-upstream-release-1.27

Automated cherry pick of #117677: Fix incorrect calculation for ResourceQuota with
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ResourceQuota with "PriortyClass" as its scope is not calculated properly
5 participants