Fixed large resourceversion and limit for storages #132374
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @PatrickLaabs. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Could you please add a unit test for it?
Thanks for the response, and of course. I was waiting for a response on this before investing more time.
The reason you get different errors lies in the different code paths involved. I'm not against unifying the semantics; I think it's even beneficial, but we definitely need a test for that. I suggest adding a scenario to the relevant test.
```go
		// If we can't get the current RV, use 0 as a fallback.
		currentRV = 0
	}
	return storage.NewTooLargeResourceVersionError(requestedRV, currentRV, 1)
```
If I remember correctly, the `1` in `NewTooLargeResourceVersionError` stands for 1 second in `Retry-After`. That seems like an incorrect suggestion for clients.
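For illustration, here is a simplified, hypothetical model of what that third argument controls. The real `NewTooLargeResourceVersionError` lives in `k8s.io/apiserver/pkg/storage`; `statusError` and the constructor below are stand-ins only, sketching how the retry-seconds value becomes the `Retry-After` hint served to clients:

```go
package main

import "fmt"

// statusError is a hypothetical stand-in for the apierrors.StatusError that
// storage.NewTooLargeResourceVersionError actually constructs. The point of
// the sketch: the third argument ends up as the RetryAfterSeconds hint that
// the apiserver serializes into the Retry-After response header.
type statusError struct {
	message           string
	retryAfterSeconds int
}

func (e statusError) Error() string { return e.message }

func newTooLargeResourceVersionError(requestedRV, currentRV uint64, retrySeconds int) statusError {
	return statusError{
		message:           fmt.Sprintf("Too large resource version: %d, current: %d", requestedRV, currentRV),
		retryAfterSeconds: retrySeconds,
	}
}

func main() {
	// Passing a hard-coded 1 here tells every client "retry in 1 second",
	// regardless of how far ahead the requested resourceVersion actually is.
	err := newTooLargeResourceVersionError(9999999, 42, 1)
	fmt.Printf("%s (Retry-After: %d)\n", err.Error(), err.retryAfterSeconds)
}
```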
Yes, that's correct. Setting it to a value of 0 is a better approach, right?
Could you check the other places where we use `NewTooLargeResourceVersionError` and see what we do there?
kubernetes/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go
Lines 1035 to 1049 in 0f478e5
```go
func (s *store) validateMinimumResourceVersion(minimumResourceVersion string, actualRevision uint64) error {
	if minimumResourceVersion == "" {
		return nil
	}
	minimumRV, err := s.versioner.ParseResourceVersion(minimumResourceVersion)
	if err != nil {
		return apierrors.NewBadRequest(fmt.Sprintf("invalid resource version: %v", err))
	}
	// Enforce the storage.Interface guarantee that the resource version of the returned data
	// "will be at least 'resourceVersion'".
	if minimumRV > actualRevision {
		return storage.NewTooLargeResourceVersionError(minimumRV, actualRevision, 0)
	}
	return nil
}
```
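As a minimal runnable sketch of the guard above, assuming a plain integer resourceVersion encoding in place of the real `s.versioner` and simple `fmt` errors in place of the apiserver error constructors:

```go
package main

import (
	"fmt"
	"strconv"
)

// validateMinimumResourceVersion is a self-contained re-implementation of the
// guard shown above; strconv.ParseUint stands in for the versioner, and plain
// fmt errors stand in for apierrors.NewBadRequest and
// storage.NewTooLargeResourceVersionError.
func validateMinimumResourceVersion(minimumResourceVersion string, actualRevision uint64) error {
	if minimumResourceVersion == "" {
		return nil // no minimum requested, nothing to enforce
	}
	minimumRV, err := strconv.ParseUint(minimumResourceVersion, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid resource version: %v", err)
	}
	// Enforce the storage.Interface guarantee that returned data
	// "will be at least 'resourceVersion'".
	if minimumRV > actualRevision {
		return fmt.Errorf("too large resource version: %d, current: %d", minimumRV, actualRevision)
	}
	return nil
}

func main() {
	fmt.Println(validateMinimumResourceVersion("", 100))     // <nil>
	fmt.Println(validateMinimumResourceVersion("50", 100))   // <nil>
	fmt.Println(validateMinimumResourceVersion("9999", 100)) // too large resource version: 9999, current: 100
}
```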
I found 3 cases:

1. `return storage.NewTooLargeResourceVersionError(minimumRV, actualRevision, 0)` with `0`, used in the etcd3 store.
2. `return storage.NewTooLargeResourceVersionError(resourceVersion, w.resourceVersion, resourceVersionTooHighRetrySeconds)` with `resourceVersionTooHighRetrySeconds = 1`, used in the cache.
3. `wc.sendError(storage.NewTooLargeResourceVersionError(uint64(wc.initialRev), currentStorageRV, int(wait.Jitter(1*time.Second, 3).Seconds())))`

Looks like there is no established value. For now I would set it to `0` to be consistent with the etcd3 store. Long term it would be better to have one common strategy.
That's exactly what I thought. I'll set a reminder to open a follow-up issue for this one 👍
After reviewing the unit test, I found something:

If I am not completely wrong, that's exactly the point we are looking for in our unit tests, right? With the updated code I commented on in the reviews, I made these adjustments to the unit test:

What do you think? Or shall we extend the tests?
Yes.
/ok-to-test
/lgtm

PTAL @wojtek-t
LGTM label has been added. Git tree hash: 5a109d1122209e58f31535536f127fa2c8ca0d60
```diff
@@ -744,6 +746,14 @@ func (s *store) GetList(ctx context.Context, key string, opts storage.ListOption
 	})
 	metrics.RecordEtcdRequest(metricsOp, s.groupResource, err, startTime)
 	if err != nil {
+		if errors.Is(err, etcdrpc.ErrFutureRev) {
+			currentRV, getRVErr := s.GetCurrentResourceVersion(ctx)
+			if getRVErr != nil {
```
Do we need to surface this error somewhere, or maybe just log it?
This is already on the error handling path; we only call GetCurrentResourceVersion to provide more information in the error. I don't think there is a need to surface the error, but returning `NewTooLargeResourceVersionError` with rev 0 might not be correct. Maybe we should change the error.
I am currently a little busy at work. I'll get back to this on Friday 😊
I think we can improve that in a follow-up; let's merge this, as it already unifies the error types.
@wojtek-t Sounds good. Shall I create the issue, or do you want to create it and assign it to me?
Please go ahead and create it
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: PatrickLaabs, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Follow-up issue to improve the returned error message:
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, when you try to query the StorageClassList with a very high resourceVersion and a high limit value, you'll get a response like this:

NOTE: You'll not only get an error message from etcdserver; you will also get a 500 error code.

Issue #132358 suggested that we might want to return a more graceful error message and an error code of 504, like this:

After making some slight adjustments to the error handling, we can query the StorageClassList with a very high resourceVersion and a high limit value and receive the desired message:
Which issue(s) this PR is related to:
Fixes #132358
Special notes for your reviewer:
I am not quite sure if the current change is what we really want.
This was a challenge for me 😄
We check for the error returned by etcd: `if err == etcdrpc.ErrFutureRev`, which is in fact this sentinel error.

As I was not sure whether this is good practice, I looked at the `interpretListError` function, which is called within the `GetList` function. And yes, we are already doing it this way.

But I'd be more than happy for suggestions if this is not the right approach 👍
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: