Remove last endpoint for kubernetes Service during graceful shutdown of final kube-apiserver #116685

Merged
merged 1 commit into from
Apr 28, 2023

Conversation

nayihz
Contributor

@nayihz nayihz commented Mar 16, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

The IP of the last remaining apiserver isn't removed from the kubernetes Service endpoints on shutdown. Clients using in-cluster API mode keep trying to connect to that instance even after the apiserver is dead.

Which issue(s) this PR fixes:

Fixes #115804

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kube-apiserver always removes its endpoint from kubernetes service during graceful shutdown (even if it's the only/last one)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 16, 2023
@k8s-ci-robot
Contributor

Hi @czybjtu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 16, 2023
@nayihz
Contributor Author

nayihz commented Mar 17, 2023

Fixed according to your suggestion; please take a look when you have time. @aojea

@nayihz nayihz force-pushed the fix_lease_remove_endpoints branch from b99e83c to 3600d59 Compare March 17, 2023 12:36
@nayihz nayihz marked this pull request as ready for review March 17, 2023 12:38
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2023
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 17, 2023
pkg/controlplane/reconcilers/lease.go Outdated
@@ -503,6 +503,16 @@ func TestLeaseRemoveEndpoints(t *testing.T) {
expectUpdate: makeEndpointsArray("foo", []string{"4.3.2.2", "4.3.2.3", "4.3.2.4"}, []corev1.EndpointPort{{Name: "foo", Port: 8080, Protocol: "TCP"}}),
expectLeases: []string{"4.3.2.2", "4.3.2.3", "4.3.2.4"},
},
{
testName: "the last apiserver was shutdown",
Contributor

Suggested change
- testName: "the last apiserver was shutdown",
+ testName: "the last API server was shut down cleanly",

Contributor

@sftim sftim Mar 17, 2023

💭 if we know that this is the last kube-apiserver in a cluster, which should we do:

  • remove the endpoint(s) representing us from the relevant EndpointSlice(s)
  • delete those EndpointSlice(s) (the first API server to start up would need to make a new one)

From an architectural / philosophical point of view, I'm not sure which feels more appropriate.

Member

  • remove the endpoint(s) representing us from the relevant EndpointSlice(s)
  • delete those EndpointSlice(s) (the first API server to start up would need to make a new one)

The former, because you can never know whether you are really the last one, or you are restarting, or ... I think each apiserver should modify only its own IP, the one it is authoritative for.
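
To make the "modify only your own IP" option concrete, here is a minimal sketch in Go. It is not the reconciler code from this PR: `removeOwnAddress` is a hypothetical helper, and the address list is a plain string slice rather than a real Endpoints or EndpointSlice object. It only illustrates that the caller drops its own entry, leaves every other entry alone, and still returns an (empty) list when it happened to be the last one.

```go
package main

import "fmt"

// removeOwnAddress is a hypothetical helper (not part of Kubernetes).
// It returns addrs without self and never touches any other entry.
// When self was the only entry, the result is an empty, non-nil slice:
// the list itself still exists, it just has no addresses left.
func removeOwnAddress(addrs []string, self string) []string {
	out := make([]string, 0, len(addrs))
	for _, a := range addrs {
		if a != self {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	// Two API servers are listed; the one shutting down removes only itself.
	fmt.Println(removeOwnAddress([]string{"4.3.2.1", "4.3.2.2"}, "4.3.2.1")) // prints [4.3.2.2]

	// The last API server removes itself and leaves an empty list behind.
	fmt.Println(removeOwnAddress([]string{"4.3.2.1"}, "4.3.2.1")) // prints []
}
```

Keeping the empty list, rather than deleting the object, means a server that is merely restarting has nothing to recreate and never has to decide whether it is truly the last one.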

@sftim
Contributor

sftim commented Mar 17, 2023

This change is visible to end users. We should provide a release note.

@sftim
Contributor

sftim commented Mar 17, 2023

/retitle Remove last endpoint for kubernetes Service during graceful shutdown of final kube-apiserver

@k8s-ci-robot k8s-ci-robot changed the title remove last ip when apiserver was shut down Remove last endpoint for kubernetes Service during graceful shutdown of final kube-apiserver Mar 17, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2023
Comment on lines 527 to 542
r.StopReconciling()
err = r.RemoveEndpoints(test.serviceName, netutils.ParseIPSloppy(test.ip), test.endpointPorts)
Member

Does this not affect the other tests?
We may add more test cases to exercise the races you mentioned during the review.

Contributor Author

@nayihz nayihz Mar 20, 2023

r.StopReconciling() has no side effect on the current tests, because the precondition for running TestLeaseRemoveEndpoints is that the apiserver has already been shut down. It would be better to add a test case for the apiserver start-up scenario.

we may add more test cases to exercise the races

Yes, I'll do this later.
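
The ordering used in the test, stop reconciling first and then remove the endpoint, can be illustrated with a toy model. This is a sketch under stated assumptions, not the lease reconciler from pkg/controlplane/reconcilers: `toyReconciler`, `Reconcile`, `RemoveEndpoint`, and `Count` are hypothetical names, and the assumption that a stopped reconciler ignores later reconcile calls is mine, made only to show why the order matters.

```go
package main

import (
	"fmt"
	"sync"
)

// toyReconciler is a hypothetical stand-in for an endpoint reconciler.
// It is NOT the Kubernetes implementation.
type toyReconciler struct {
	mu      sync.Mutex
	stopped bool
	addrs   map[string]struct{}
}

func newToyReconciler() *toyReconciler {
	return &toyReconciler{addrs: map[string]struct{}{}}
}

// Reconcile publishes ip unless the reconciler has been stopped.
func (r *toyReconciler) Reconcile(ip string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.stopped {
		return // assumed behavior: a stopped reconciler no longer re-adds addresses
	}
	r.addrs[ip] = struct{}{}
}

// StopReconciling turns every later Reconcile call into a no-op.
func (r *toyReconciler) StopReconciling() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.stopped = true
}

// RemoveEndpoint deletes ip, even when it is the last address left.
func (r *toyReconciler) RemoveEndpoint(ip string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.addrs, ip)
}

// Count reports how many addresses are currently published.
func (r *toyReconciler) Count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.addrs)
}

func main() {
	r := newToyReconciler()
	r.Reconcile("4.3.2.1") // normal operation: the server publishes itself

	r.StopReconciling()         // shutdown step 1: stop re-publishing
	r.RemoveEndpoint("4.3.2.1") // shutdown step 2: remove our own address
	r.Reconcile("4.3.2.1")      // a late reconcile is ignored

	fmt.Println(r.Count()) // prints 0
}
```

If the two shutdown steps were swapped in this model, a reconcile call landing between them could put the address back, which is the kind of race the review comments ask for tests to cover.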

@aojea
Member

aojea commented Mar 19, 2023

/lgtm cancel

The tests can be improved; also, I haven't been able to look deeper to see whether modifying the existing one can cause issues.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2023
@nayihz nayihz force-pushed the fix_lease_remove_endpoints branch from 36ccb20 to 35519b1 Compare March 20, 2023 12:16
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 20, 2023
@nayihz nayihz force-pushed the fix_lease_remove_endpoints branch from 35519b1 to e567490 Compare March 20, 2023 13:15
@nayihz
Contributor Author

nayihz commented Mar 21, 2023

/retest-required

@cici37
Contributor

cici37 commented Mar 21, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 21, 2023
@nayihz
Contributor Author

nayihz commented Apr 27, 2023

kindly ping @aojea

@aojea
Member

aojea commented Apr 27, 2023

kindly ping @aojea

Yeah, if you don't ping me I would not remember, sorry. Let me check it.

@aojea
Member

aojea commented Apr 27, 2023

/lgtm
/assign @wojtek-t

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 0394723b279d416652ed5f1ee1d73fc921ceee6f

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Apr 28, 2023
@wojtek-t
Member

@aojea I have added a release note - PTAL.

Overall this looks good - thanks for making this happen.

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czybjtu, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2023
@k8s-ci-robot k8s-ci-robot merged commit f66e1a3 into kubernetes:master Apr 28, 2023
11 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Apr 28, 2023
@nayihz nayihz deleted the fix_lease_remove_endpoints branch April 28, 2023 13:25
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Last apiserver to shutdown doesn't remove its IP from k8s svc endpoints
6 participants