Add support for CRI `ErrSignatureValidationFailed` #117717

saschagrunert · 2023-05-02T07:43:14Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This allows container runtimes to propagate an image signature verification error through the CRI and display that to the end user during image pull. There is no other behavioral difference compared to a regular image pull failure.

Which issue(s) this PR fixes:

Follow-up on #117612

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

Allow container runtimes to use `ErrSignatureValidationFailed` as a possible image pull failure.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

saschagrunert · 2023-05-02T07:43:41Z

/priority important-soon
/sig node

saschagrunert · 2023-05-02T07:43:50Z

@kubernetes/sig-node-pr-reviews PTAL

dims · 2023-05-02T12:19:10Z

/assign @mrunalp @SergeyKanzhelev

the string check feels a little brittle, but generally ok from me.

pkg/kubelet/images/image_manager.go

endocrimes · 2023-05-02T13:59:12Z

/triage accepted
/priority important-soon

haircommander · 2023-05-02T17:07:00Z

pkg/kubelet/images/image_manager.go

@@ -164,6 +164,10 @@ func (m *imageManager) EnsureImageExists(ctx context.Context, pod *v1.Pod, conta
 msg := fmt.Sprintf("image pull failed for %s because the registry is unavailable.", container.Image)
 return "", msg, imagePullResult.err
 }
+ if imagePullResult.err.Error() == ErrInvalidSignature.Error() {
+ msg := fmt.Sprintf("image pull failed for %s because the signature is invalid.", container.Image)
+ return "", msg, imagePullResult.err


should there be a difference in behavior? like should the kubelet stop trying to pull or something?

That's a good question. Right now all CRI errors behave in the same way:

Initial pull via the CRI.

If it fails, show the error in the status for 10s, like:

    NAMESPACE   NAME   READY   STATUS                RESTARTS   AGE
    default     test   0/1     RegistryUnavailable   0          1s

Once the backoff kicks in, show a new status:

    NAMESPACE   NAME   READY   STATUS             RESTARTS   AGE
    default     test   0/1     ImagePullBackOff   0          16s

After the backoff expires, we retry and show the specific error again for another 10s:

    NAMESPACE   NAME   READY   STATUS                RESTARTS   AGE
    default     test   0/1     RegistryUnavailable   0          52s

…

Because the backoff time keeps increasing, we will mostly show ImagePullBackOff to end users, who then have to check the events to find out what is going on:

    > k describe pod test | tail -n 4
      Warning  Failed   2m59s (x4 over 4m28s)  kubelet  Failed to pull image "localhost:5000/foo:1.0.0": RegistryUnavailable
      Warning  Failed   2m59s (x4 over 4m28s)  kubelet  Error: RegistryUnavailable
      Warning  Failed   2m47s (x6 over 4m27s)  kubelet  Error: ImagePullBackOff
      Normal   BackOff  2m35s (x7 over 4m27s)  kubelet  Back-off pulling image "localhost:5000/foo:1.0.0"

As far as I can see, the backoff has no direct API for specifying an "unlimited" time. Removing the entry from the backoff would simply re-pull every 10s.

To exclude an image from being re-pulled, we would need something like an exclude map storing the last error (reason) for which we do not re-pull it. I can follow up on that, but it seems out of scope for this PR.

fair enough, agreed

you could imagine a transient signature check failure, e.g. this error being reported when the signature fails to fetch due to a network flake? I think it should probably keep retrying

I agree, we should keep retrying and not change the loop.

endocrimes

/lgtm

saschagrunert · 2023-05-03T15:21:05Z

/retest

haircommander · 2023-05-03T15:38:56Z

still
/lgtm

k8s-ci-robot · 2023-05-03T15:39:03Z

LGTM label has been added.

Git tree hash: 0e6e39de84c8f1c8938b3e4adb7000acb67b0a64

SergeyKanzhelev · 2023-05-03T15:41:50Z

pkg/kubelet/images/image_manager.go

+func evalCRIPullErr(container *v1.Container, err error) (errMsg string, errRes error) {
+ // Error assertions via errors.Is is not supported by gRPC (remote runtime) errors right now.
+ // See https://github.com/grpc/grpc-go/issues/3616
+ if err.Error() == ErrRegistryUnavailable.Error() {


should those errors be defined in the CRI API package, or be documented with a comment in the API, as is done for OOMKilled?

kubernetes/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto

Line 1230 in a6825c8

// Must be set to "OOMKilled" for containers terminated by cgroup-based Out-of-Memory killer.

I moved them into the CRI API errors package. Would that fit better?

This allows container runtimes to propagate an image signature verification error through the CRI and display that to the end user during image pull. There is no other behavioral difference compared to a regular image pull failure.

Signed-off-by: Sascha Grunert

SergeyKanzhelev

/lgtm

thank you for moving errors to the cri api package!

k8s-ci-robot · 2023-05-04T15:35:57Z

LGTM label has been added.

Git tree hash: 048b480c5fdc0c78bbb78943c33e31fe88d022e7

k8s-ci-robot · 2023-05-05T15:21:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: endocrimes, mrunalp, saschagrunert, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/OWNERS~~ [mrunalp]
~~staging/src/k8s.io/cri-api/pkg/OWNERS~~ [mrunalp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This makes it possible to skip re-pulling the image if the signature verification failed, by using a simple map to store the previous error.

Follow-up on: kubernetes#117717 (comment)

Signed-off-by: Sascha Grunert

k8s-ci-robot requested review from feiskyer and krmayankk May 2, 2023 07:43

k8s-ci-robot added the area/kubelet label May 2, 2023

bart0sh added this to Triage in SIG Node PR Triage May 2, 2023

k8s-ci-robot assigned mrunalp and SergeyKanzhelev May 2, 2023

endocrimes reviewed May 2, 2023

pkg/kubelet/images/image_manager.go (outdated, resolved)

k8s-ci-robot added triage/accepted and removed needs-triage labels May 2, 2023

endocrimes moved this from Triage to Waiting on Author in SIG Node PR Triage May 2, 2023

haircommander reviewed May 2, 2023

saschagrunert force-pushed the invalid-signature-error branch from 5bdbe17 to 1bf0803 May 3, 2023 08:23

k8s-ci-robot added size/M and removed size/XS labels May 3, 2023

endocrimes approved these changes May 3, 2023

k8s-ci-robot requested review from haircommander, mrunalp and SergeyKanzhelev May 3, 2023 15:14

saschagrunert changed the title ~~Add support for CRI ErrInvalidSignature~~ Add support for CRI ErrSignatureValidationFailed May 3, 2023

saschagrunert force-pushed the invalid-signature-error branch from 5539fc3 to 50aa9a8 May 3, 2023 15:22

k8s-ci-robot added the lgtm label May 3, 2023

SergeyKanzhelev reviewed May 3, 2023

saschagrunert force-pushed the invalid-signature-error branch from 50aa9a8 to 50ca0b7 May 4, 2023 06:32

k8s-ci-robot removed the lgtm label May 4, 2023

k8s-ci-robot requested a review from SergeyKanzhelev May 4, 2023 06:32

saschagrunert force-pushed the invalid-signature-error branch from 50ca0b7 to 63b69dd May 4, 2023 06:34

SergeyKanzhelev approved these changes May 4, 2023

k8s-ci-robot added the lgtm label May 4, 2023

SergeyKanzhelev mentioned this pull request May 4, 2023

CRItest for invalid registry and bad signature error codes kubernetes-sigs/cri-tools#1158

Open

mrunalp approved these changes May 5, 2023

k8s-ci-robot added the approved label May 5, 2023

k8s-ci-robot merged commit af92da5 into kubernetes:master May 5, 2023
12 checks passed

SIG Node PR Triage automation moved this from Waiting on Author to Done May 5, 2023

k8s-ci-robot added this to the v1.28 milestone May 5, 2023

saschagrunert deleted the invalid-signature-error branch May 8, 2023 07:11

This was referenced May 8, 2023

Exclude ErrSignatureValidationFailed from pull backoff #117857

Closed

Allow runtimes to provide additional context on CRI pull errors #117935

Merged
