LegacyServiceAccountTokenCleanUp alpha #115554

yt2985 · 2023-02-06T18:24:11Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Start cleaning up auto-generated service account tokens.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kube-controller-manager: The `LegacyServiceAccountTokenCleanUp` feature gate is now available as alpha (off by default). When enabled, the `legacy-service-account-token-cleaner` controller loop removes service account token secrets that have not been used in the time specified by `--legacy-service-account-token-clean-up-period` (defaulting to one year), **and are** referenced from the `.secrets` list of a ServiceAccount object, **and are not** referenced from pods.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/2799-reduction-of-secret-based-service-account-token

k8s-ci-robot · 2023-02-06T18:24:21Z

Hi @yt2985. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

yt2985 · 2023-02-06T18:25:58Z

Opening the PR first. I am still working on the integration test and maybe the e2e test.

k8s-triage-robot · 2023-02-06T19:06:43Z

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

cici37 · 2023-02-07T21:10:58Z

/remove-sig api-machinery

pkg/controller/serviceaccount/config/types.go

yt2985 · 2023-05-16T05:37:43Z

/test pull-kubernetes-conformance-kind-ga-only-parallel

liggitt

Implementation looks pretty close; I haven't looked at test coverage yet.

Thinking about someone rolling this out, I'm wondering if we'll be happy with all long-unused token secrets getting deleted at the first kube-controller-manager startup. Things like dry-run behavior or rate-limiting come to mind as ways to make sure the results are what we expect.

cc @deads2k for any thoughts along those lines on this?

cmd/kube-controller-manager/app/options/legacyserviceaccounttokencleaner.go

cmd/kube-controller-manager/app/core.go

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

liggitt · 2023-05-16T20:47:42Z

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

+ if podMountedSecrets[secretNamespace] != nil {
+ return podMountedSecrets[secretNamespace], nil
+ }


look for existence rather than non-nil:

if secrets, ok := podMountedSecrets[secretNamespace]; ok {
	return secrets, nil
}
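The existence check matters once a namespace whose pods mount no secrets is cached as a nil set. The sketch below shows the difference. Plain Go maps stand in for `sets.String`, and `secretCache`, `lookupByNil`, and `lookupByExistence` are hypothetical names. A non-nil check treats that cached nil entry as a miss and would re-list pods every time. The comma-ok form treats it as a hit.

```go
package main

import "fmt"

// secretCache maps namespace -> set of secret names mounted by pods
// (a map-based stand-in for sets.String). A nil set is a valid cached
// value meaning "no pod in this namespace mounts any secret".
type secretCache map[string]map[string]struct{}

// lookupByNil treats a cached nil set as a miss (the pattern being replaced).
func lookupByNil(c secretCache, ns string) (map[string]struct{}, bool) {
	if c[ns] != nil {
		return c[ns], true
	}
	return nil, false
}

// lookupByExistence treats any stored entry, including nil, as a hit.
func lookupByExistence(c secretCache, ns string) (map[string]struct{}, bool) {
	secrets, ok := c[ns]
	return secrets, ok
}

func main() {
	c := secretCache{"empty-ns": nil}
	_, hitNil := lookupByNil(c, "empty-ns")
	_, hitExists := lookupByExistence(c, "empty-ns")
	fmt.Println(hitNil, hitExists) // false true
}
```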

liggitt · 2023-05-16T20:49:10Z

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

+ if podMountedSecrets[secretNamespace] != nil {
+ return podMountedSecrets[secretNamespace], nil
+ }
+ podMountedSecrets[secretNamespace] = sets.NewString()


don't store here; otherwise we cache an empty set in an error case, and the next time getMountedSecretNames is invoked on the namespace we could return that cached (and wrong) empty set

liggitt · 2023-05-16T20:50:47Z

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

+ for _, pod := range podList {
+ podutil.VisitPodSecretNames(pod, func(secretName string) bool {
+ podMountedSecrets[secretNamespace].Insert(secretName)
+ return true
+ })
+ }
+ return podMountedSecrets[secretNamespace], nil


use a nil set until we know we need to store secret names:

var secrets sets.String
for _, pod := range podList {
	podutil.VisitPodSecretNames(pod, func(secretName string) bool {
		if secrets == nil {
			secrets = sets.NewString()
		}
		secrets.Insert(secretName)
		return true
	})
}

then after we've constructed the set without errors, cache it:

podMountedSecrets[secretNamespace] = secrets
return secrets, nil

liggitt · 2023-05-16T20:56:27Z

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

+ if err != nil {
+ return time.Time{}, fmt.Errorf("error parsing trackedSince time: %v", err)
+ }
+ return trackedSinceTime.Add(24 * time.Hour), nil


I think we want to do trackedSinceTime.AddDate(0,0,1) rather than assume 24 hours from 00:00 on the tracked-since date covers all possible times in that date (daylight saving time can put 25 hours in a nominal date)

this also deserves a comment on this line and on the godoc for the function

and latestPossibleTrackedSinceTime might be a better function name

this also deserves a comment on this line and on the godoc for the function

Hi Jordan, what do you mean by the godoc here?

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

yt2985 · 2023-05-18T06:23:06Z

/test pull-kubernetes-e2e-kind

deads2k · 2023-05-23T13:43:17Z

Thinking about someone rolling this out, I'm wondering if we'll be happy with all long-unused token secrets getting deleted at the first kube-controller-manager startup. Things like dry-run behavior or rate-limiting come to mind as ways to make sure the results are what we expect.

cc @deads2k for any thoughts along those lines on this?

I've thought a bit about this, and I can't think of a "gentle" way to remove a long-unused token gradually via dry-run or rate-limiting. No one will notice until they're broken, and we're talking about something that runs infrequently enough to hit this in the first place. The rate limit would have to be on the order of years for someone to notice in time to matter.

I did have one different thought, but it would push out the deletion timeframe. What if tokens that hadn't been used in a year were made invalid for use against the kube-apiserver but remained in the API? That would force client-side failures, with a cluster-admin able to "reset" a secret by removing the last-used annotation. The pattern this would produce is:

1. nothing uses the token for a very long time
2. instead of deleting the secret, it's marked as "not valid against kube-apiserver"
3. the rarely run client fails with "secret is no longer valid"
4. the client now knows it needs to update itself
5. for the current run, the last-used annotation can be reset to avoid impending bad things
6. another year after being marked "not valid against kube", the secret is deleted

A year is a long time to not use a token. I don't have strong opinions about whether the extra step is worth the effort, but it does provide a way to break people so they know they must change, but also provide immediate relief that doesn't require distributing new tokens. That ability to avoid distribution is the only meaningful difference between this approach and telling someone to create a new SA token secret.

zshihang · 2023-05-23T16:21:49Z

I did have one different thought, but it would push out the deletion timeframe. What if tokens that hadn't been used in a year were made invalid for use against the kube-apiserver, but remained in the API. That would force client-side failures, with a cluster-admin able to "reset" a secret by removing the last-used annotation. The pattern this would produce is

nothing uses token for a very long time

instead of deleting the secret, it's marked as "not valid against kube-apiserver"

rarely run client fails with, "secret is no longer valid"

client now knows it needs to update itself

for the current run, the last-used annotation can be reset to avoid impending bad-things

another year after being marked, "not valid against kube", the secret is deleted.

A year is a long time to not use a token. I don't have strong opinions about whether the extra step is worth the effort, but it does provide a way to break people so they know they must change, but also provide immediate relief that doesn't require distributing new tokens. That ability to avoid distribution is the only meaningful difference between this approach and telling someone to create a new SA token secret.

this two-phase deletion approach is equivalent to the existing approach:

assume waiting for a year before marking as "not valid against kube-apiserver"
if a year passed after being marked, delete.

doesn't this behave the same as the existing approach with the wait time set to two years?

do we think one year is long enough to consider an auto-generated legacy token unused? I think it is.

liggitt · 2023-05-24T15:28:22Z

this two-phase deletion approach is equivalent to the existing approach:

assume waiting for a year before marking as "not valid against kube-apiserver"

if a year passed after being marked, delete.

doesn't this behave the same as the existing approach with the wait time set to two years?

Not quite the same, it makes the revocation reversible... marking a token as invalid without deleting it means that action can be undone without needing to distribute a new credential to the impacted user.

A year is a long time to not use a token. I don't have strong opinions about whether the extra step is worth the effort, but it does provide a way to break people so they know they must change, but also provide immediate relief that doesn't require distributing new tokens.

I also don't have strong feelings on this. There are two aspects I'm trying to decide if we should provide:

ability to recover if it deletes something that turned out to be very-rarely-used (your suggestion would cover this)
ability to dry-run this controller in some way ("what would you delete (or at least, how many things would you delete)?")
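The dry-run question could be answered with a mode that counts and reports candidates instead of deleting them. The sketch below is minimal and assumes a hypothetical `candidate` type and an injected `deleteFn`. The PR itself does not add such a mode; the question is deferred to the KEP before beta.

```go
package main

import "fmt"

// candidate is a hypothetical token secret selected for cleanup.
type candidate struct{ Namespace, Name string }

// cleanUp deletes candidates via deleteFn, or, in dry-run mode, only reports
// what would be deleted. It returns the number of (would-be) deletions.
func cleanUp(cands []candidate, dryRun bool, deleteFn func(candidate) error) (int, error) {
	n := 0
	for _, c := range cands {
		if dryRun {
			fmt.Printf("dry-run: would delete secret %s/%s\n", c.Namespace, c.Name)
			n++
			continue
		}
		if err := deleteFn(c); err != nil {
			return n, fmt.Errorf("deleting %s/%s: %w", c.Namespace, c.Name, err)
		}
		n++
	}
	return n, nil
}

func main() {
	cands := []candidate{{"default", "old-token-1"}, {"ci", "old-token-2"}}
	deleted := 0
	n, err := cleanUp(cands, true, func(candidate) error { deleted++; return nil })
	fmt.Println(n, deleted, err) // 2 0 <nil>
}
```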

liggitt · 2023-05-24T15:29:01Z

that said, I don't think we need those for the alpha... let's add an item to the KEP to resolve this question before graduating to beta

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go

plugin/pkg/auth/authorizer/rbac/bootstrappolicy/controller_policy.go

test/integration/serviceaccount/legacy_service_account_token_clean_up_test.go

liggitt · 2023-05-26T14:41:56Z

/lgtm
/approve

k8s-ci-robot · 2023-05-26T14:42:02Z

LGTM label has been added.

Git tree hash: 2274f3d84ec93368cd6d5b3db6cf5581ce01289a

k8s-ci-robot · 2023-05-26T14:42:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, yt2985

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~api/OWNERS~~ [liggitt]
~~cmd/kube-controller-manager/OWNERS~~ [liggitt]
~~pkg/controller/apis/config/OWNERS~~ [liggitt]
~~pkg/controller/serviceaccount/OWNERS~~ [liggitt]
~~pkg/features/OWNERS~~ [liggitt]
~~pkg/generated/openapi/OWNERS~~ [liggitt]
~~plugin/pkg/auth/authorizer/OWNERS~~ [liggitt]
~~staging/src/k8s.io/kube-controller-manager/config/OWNERS~~ [liggitt]
~~test/OWNERS~~ [liggitt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the needs-priority label (indicates a PR lacks a `priority/foo` label and requires one) Feb 6, 2023

yt2985 changed the title from ~~Clean sa~~ to LegacyServiceAccountTokenCleanUp alpha Feb 6, 2023

k8s-ci-robot requested review from cheftako and enj February 6, 2023 18:24

zshihang mentioned this pull request Feb 6, 2023

Reduction of Secret-based Service Account Tokens kubernetes/enhancements#2799 (Closed, 39 tasks)

yt2985 force-pushed the cleanSA branch from 4a3cb07 to 112827d February 7, 2023 18:52

k8s-ci-robot added size/XL (changes 500-999 lines, ignoring generated files) and removed size/L (changes 100-499 lines, ignoring generated files) labels Feb 7, 2023

yt2985 force-pushed the cleanSA branch 2 times, most recently from ed33782 to ad9bebc February 7, 2023 18:58

k8s-ci-robot removed the sig/api-machinery label (categorizes an issue or PR as relevant to SIG API Machinery) Feb 7, 2023

zshihang reviewed Feb 10, 2023

pkg/controller/serviceaccount/config/types.go

yt2985 force-pushed the cleanSA branch from f871476 to 58ebd0f May 16, 2023 16:23

liggitt reviewed May 16, 2023

pkg/controller/serviceaccount/legacy_serviceaccount_token_cleaner.go (outdated)

yt2985 force-pushed the cleanSA branch from 58ebd0f to 5ace185 May 18, 2023 05:46

k8s-ci-robot added size/XXL (changes 1000+ lines, ignoring generated files) and removed size/XL (changes 500-999 lines, ignoring generated files) labels May 18, 2023

yt2985 force-pushed the cleanSA branch from 5ace185 to 0c77d84 May 18, 2023 06:56

k8s-ci-robot added the needs-rebase label (the PR cannot be merged because it has merge conflicts with HEAD) May 23, 2023

deads2k closed this May 23, 2023

deads2k reopened this May 23, 2023

yt2985 force-pushed the cleanSA branch from 0c77d84 to 463202c May 23, 2023 16:42

k8s-ci-robot removed the needs-rebase label May 23, 2023

liggitt reviewed May 24, 2023

implement LegacyServiceAccountTokenCleanUp alpha

133eff3

yt2985 force-pushed the cleanSA branch from 463202c to 133eff3 May 24, 2023 23:20

k8s-ci-robot added the lgtm label ("Looks good to me"; indicates that a PR is ready to be merged) May 26, 2023

k8s-ci-robot added the approved label (the PR has been approved by an approver from all required OWNERS files) May 26, 2023

k8s-ci-robot merged commit c35a277 into kubernetes:master May 26, 2023
13 of 14 checks passed

yt2985 mentioned this pull request May 26, 2023

Add LegacyServiceAccountTokenCleanUp feature in alpha; move LegacyServiceAccountTokenTracking to GA kubernetes/website#41341 (Merged)

yt2985 mentioned this pull request Oct 18, 2023

Add LegacyServiceAccountTokenCleanUp feature in beta kubernetes/website#43563 (Merged)

jaskaransarkaria mentioned this pull request Jun 5, 2024

Planning upgrade to EKS 1.28 ministryofjustice/cloud-platform#5570 (Open)


yt2985 commented Feb 6, 2023 (edited by liggitt)
