Skip to content

Deny pod admission for static pods referencing API objects #131837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sreeram-venkitesh
Copy link
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

Currently, mirror pod reconciliation for static pods which reference API objects will fail. However the pod itself is not denied admission and the node would be silently running the static pod's container. As discussed in #103587 (comment), we want to tighten the validation here so that pod would not be running at all in the node. Currently the static pod would be running in the node while the mirror pod would not be created (which is enforced in the API as per #103587 (comment)). This PR updates the validation logic so that the pod is denied admission for static pods referencing API objects and make sure that the container is not created in the node.

Which issue(s) this PR fixes:

Fixes #103587

Special notes for your reviewer:

Notes:

Question:

  • I wanted to add an e2e_node test. I was going through test/e2e_node/mirror_pod_test.go since it has helpers for static pod creation etc, but I'm still figuring out the failure case where we need to assert that the pod was denied admission from the kubelet logs?

Does this PR introduce a user-facing change?

Static pods that reference API objects are now denied admission by the kubelet so that static pods would not be silently running even after the mirror pod creation fails.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

(The feature gate should be documented. I will update the docs PR here once opened.)


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 18, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 18, 2025
@sreeram-venkitesh
Copy link
Member Author

CC @haircommander @liggitt Tagging from the discussion in the original issue. I'd love to hear what you folks think! Thanks!

@sreeram-venkitesh
Copy link
Member Author

/retest

1 similar comment
@sreeram-venkitesh
Copy link
Member Author

/retest

Copy link

@lmktfy lmktfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Should we also deny static Pods that specify .spec.resources.claims and / or .spec.resourceClaims?

(We perhaps shouldn't assume that these claims will fail anyway).

@sreeram-venkitesh
Copy link
Member Author

(We perhaps shouldn't assume that these claims will fail anyway).

True, but if they don't make sense for static pods I'm okay with denying pods with .spec.resources.claims and / or .spec.resourceClaims too. I wonder if we'd want to let users create static pods with resource claims.

@sreeram-venkitesh
Copy link
Member Author

/retest

// owner: @sreeram-venkitesh
//
// Denies pod admission if static pods reference other API objects.
StaticPodStrictValidation featuregate.Feature = "StaticPodStrictValidation"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"strict" is pretty ambiguous ... let's target this specifically at static pods referencing API objects, so call this "PreventStaticPodAPIReferences" or something similar.

That will also let us make use of the same gate in the kubelet and in NodeRestriction admission

@@ -1783,6 +1788,10 @@ var defaultVersionedKubernetesFeatureGates = map[featuregate.Feature]featuregate
{Version: version.MustParse("1.31"), Default: true, PreRelease: featuregate.GA, LockToDefault: true}, // GA in 1.31, remove in 1.33
},

StaticPodStrictValidation: {
{Version: version.MustParse("1.34"), Default: false, PreRelease: featuregate.Alpha},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider starting this as beta and on-by-default ... this is a bugfix with an escape hatch mitigation for a few releases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Fixed this in 8f86d9d.

@@ -445,6 +445,10 @@ var (
// TODO: document the feature (owning SIG, when to use this feature for a test)
StatefulSet = framework.WithFeature(framework.ValidFeatures.Add("StatefulSet"))

// Owner: sig-node
// Denies pod admission if static pods reference other API objects.
StaticPodStrictValidation = framework.WithFeature(framework.ValidFeatures.Add("StaticPodStrictValidation"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need an e2e feature for this, there's no other configuration associated with it, if you write an e2e test, use framework.WithFeatureGate(features.PreventStaticPodAPIReferences), etc

if !kubetypes.IsStaticPod(pod) {
return PodAdmitResult{Admit: true}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a single method that takes a pod and returns an error if the pod references API objects will be easier to ensure is called correctly. It might make sense to extract the checks done at

// don't allow a node to create a pod that references any other API objects
if pod.Spec.ServiceAccountName != "" {
return admission.NewForbidden(a, fmt.Errorf("node %q can not create pods that reference a service account", nodeName))
}
hasSecrets := false
podutil.VisitPodSecretNames(pod, func(name string) (shouldContinue bool) { hasSecrets = true; return false }, podutil.AllContainers)
if hasSecrets {
return admission.NewForbidden(a, fmt.Errorf("node %q can not create pods that reference secrets", nodeName))
}
hasConfigMaps := false
podutil.VisitPodConfigmapNames(pod, func(name string) (shouldContinue bool) { hasConfigMaps = true; return false }, podutil.AllContainers)
if hasConfigMaps {
return admission.NewForbidden(a, fmt.Errorf("node %q can not create pods that reference configmaps", nodeName))
}
for _, vol := range pod.Spec.Volumes {
if vol.VolumeSource.Projected != nil {
for _, src := range vol.VolumeSource.Projected.Sources {
if src.ClusterTrustBundle != nil {
return admission.NewForbidden(a, fmt.Errorf("node %q can not create pods that reference clustertrustbundles", nodeName))
}
}
}
}
for _, v := range pod.Spec.Volumes {
if v.PersistentVolumeClaim != nil {
return admission.NewForbidden(a, fmt.Errorf("node %q can not create pods that reference persistentvolumeclaims", nodeName))
}
}
if len(pod.Spec.ResourceClaims) > 0 {
return admission.NewForbidden(a, fmt.Errorf("node %q can not create pods that reference resourceclaims", nodeName))
}
to a method in pkg/api/pod that deals with an internal Pod.

Then the NodeRestriction admission plugin could call that, and the kubelet could call it from tryDecodeSinglePod / tryDecodePodList where it has an internal pod, and only hits that code path when loading static pods.

Copy link
Member Author

@sreeram-venkitesh sreeram-venkitesh May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt Please check 4809d0f. I've extracted the logic to a method in pkg/api/pod/util.go. This is then called from both NodeRestriction admission's admitPodCreate and tryDecodeSinglePod for static pods.

I have a question about tryDecodeSinglePod. Does this only process static pods? I see that this is called from pkg/kubelet/config/file.go and pkg/kubelet/config/http.go. The file one makes sense - reading static pod manifests, but I was not sure about the tryDecodeSinglePod call here in extractFromURL in pkg/kubelet/config/http.go. generatePodName also tells me this is static pod specific.

One more question: Compared to my earlier approach of adding a new AdmitHandler, this logic would now be executed only when parsing static pods. The AdmitHandler approach is redundant because,it would process all pods and then inside the handler check if the pod is a static pod. But we can avoid checking all the pods. Am I correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kubelet can load static pods from file or from a non-API server URL. Both of those should have this enforcement.

doing the enforcement there would be instead of a kubelet admission handler

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I forgot that we can specify --manifest-url too. Thanks!

@sreeram-venkitesh sreeram-venkitesh force-pushed the static-pod-strict-validation-for-api-object-reference branch from 8f86d9d to 4809d0f Compare May 21, 2025 07:24
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label May 21, 2025
@sreeram-venkitesh sreeram-venkitesh force-pushed the static-pod-strict-validation-for-api-object-reference branch from dfad17a to aab9a2f Compare May 21, 2025 09:20
ErrStaticPodResourceClaims = errors.New("can not create static pods that reference resourceclaims")
)

func HasAPIObjectReferences(pod *api.Pod) (bool, error) {
Copy link
Member

@liggitt liggitt May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document what this does, something like "returns true if the pod spec references any API objects, along with the plural resource of the first reference encountered"

I would make this return a string resource name, rather than do the error wrapping here and exposing error variables

// HasAPIObjectReference returns true if a reference to an API object is found in the pod spec,
// along with the plural resource of the referenced API type, or an error if an unknown field is encountered.
func HasAPIObjectReference(pod *api.Pod) (bool, string, error)

@bart0sh bart0sh moved this from Triage to not-only-sig-node in SIG Node: code and documentation PRs Jun 10, 2025
@sreeram-venkitesh sreeram-venkitesh force-pushed the static-pod-strict-validation-for-api-object-reference branch from 0c512ae to c05007e Compare June 11, 2025 17:25
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 11, 2025
@sreeram-venkitesh sreeram-venkitesh force-pushed the static-pod-strict-validation-for-api-object-reference branch from c05007e to bedcb38 Compare June 12, 2025 17:35
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from not-only-sig-node to Waiting on Author in SIG Node: code and documentation PRs Jun 13, 2025
@liggitt
Copy link
Member

liggitt commented Jun 16, 2025

/retest

@sreeram-venkitesh
Copy link
Member Author

@liggitt Can I squash and clean up the commits here?

@liggitt
Copy link
Member

liggitt commented Jun 25, 2025

yup, looks good to squash, thanks!

@sreeram-venkitesh sreeram-venkitesh force-pushed the static-pod-strict-validation-for-api-object-reference branch from c92f30a to f9a5aec Compare June 25, 2025 18:52
@sreeram-venkitesh
Copy link
Member Author

/retest

@liggitt
Copy link
Member

liggitt commented Jun 26, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 92ca3dde1f4a5d2a04d90c55b28d3e32557f904f

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, sreeram-venkitesh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2025
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@sreeram-venkitesh
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit b7c9333 into kubernetes:master Jun 26, 2025
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 26, 2025
@github-project-automation github-project-automation bot moved this from Archive-it to Done in SIG Node CI/Test Board Jun 26, 2025
@github-project-automation github-project-automation bot moved this from In Review to Closed / Done in SIG Auth Jun 26, 2025
@github-project-automation github-project-automation bot moved this from Waiting on Author to Done in SIG Node: code and documentation PRs Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Closed / Done
Development

Successfully merging this pull request may close these issues.

Kubelet should enforce that static pods do not reference other API objects
6 participants