kube-proxy avoid race condition using LocalModeNodeCIDR #118499

Merged
merged 1 commit into from
Jun 6, 2023

Conversation

aojea
Member

@aojea aojea commented Jun 6, 2023

In LocalModeNodeCIDR, kube-proxy needs the PodCIDR assigned to its node, so it watches the Node object to obtain it.

However, the kube-proxy startup process sets up these watches in two different places, which opens the possibility of a race condition if the same node is recreated in between and a different PodCIDR is assigned.

Initializing the second watch with the value obtained by the first one allows us to detect this situation.
Fixes #111321
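
For readers who want the idea in isolation, here is a minimal sketch of seeding a second watch with the value read earlier. It is not the code from this PR: `Node`, `PodCIDRHandler`, `NewPodCIDRHandler`, `OnNodeEvent` and the `onMismatch` callback are hypothetical names for illustration, and the CIDR values are made up. The callback stands in for whatever reaction a real component would choose when the values disagree.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// Node is a hypothetical, stripped-down stand-in for the Kubernetes Node object.
type Node struct {
	Name     string
	PodCIDRs []string
}

// PodCIDRHandler is an illustrative handler, not the real proxy.NodePodCIDRHandler.
// It remembers the PodCIDRs read by the first lookup and compares every later
// Node event against them.
type PodCIDRHandler struct {
	mu         sync.Mutex
	podCIDRs   []string
	onMismatch func(old, current []string)
}

// NewPodCIDRHandler seeds the handler with the PodCIDRs obtained earlier in startup.
func NewPodCIDRHandler(seed []string, onMismatch func(old, current []string)) *PodCIDRHandler {
	return &PodCIDRHandler{podCIDRs: seed, onMismatch: onMismatch}
}

// OnNodeEvent reports a mismatch when the observed PodCIDRs differ from the seed.
func (h *PodCIDRHandler) OnNodeEvent(n Node) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if !reflect.DeepEqual(h.podCIDRs, n.PodCIDRs) {
		h.onMismatch(h.podCIDRs, n.PodCIDRs)
	}
}

func main() {
	seed := []string{"10.244.1.0/24"} // value read by the first lookup (made up)
	h := NewPodCIDRHandler(seed, func(old, current []string) {
		fmt.Printf("PodCIDRs changed from %v to %v\n", old, current)
	})

	// Same value as the seed: nothing is reported.
	h.OnNodeEvent(Node{Name: "node-a", PodCIDRs: []string{"10.244.1.0/24"}})
	// Node recreated with a different PodCIDR: the mismatch is reported.
	h.OnNodeEvent(Node{Name: "node-a", PodCIDRs: []string{"10.244.7.0/24"}})
}
```

The point of the seed is that the comparison baseline is the value the rest of the process was configured with, rather than whatever the second watch happens to see first.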

/kind bug

Fix a race condition in kube-proxy when using LocalModeNodeCIDR, to avoid dropping Service traffic if the Node object is recreated while kube-proxy is starting.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 6, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jun 6, 2023
@aojea
Member Author

aojea commented Jun 6, 2023

/assign @danwinship @thockin

Alternative to #118458

@k8s-ci-robot k8s-ci-robot added area/kube-proxy sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 6, 2023
@k8s-ci-robot k8s-ci-robot requested review from bowei and MrHohn June 6, 2023 10:25
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2023
@danwinship
Contributor

/lgtm
/hold
to keep an hour-old PR from merging before anyone else has had a chance to object but feel free to cancel

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 6, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: c28e77b67ca168581469b1fd39b06e24c38a3e9d

@aojea
Member Author

aojea commented Jun 6, 2023

/retest
Kubernetes e2e suite: [It] [sig-node] Pods should run through the lifecycle of Pods and PodStatus [Conformance]

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 6, 2023
@aojea
Member Author

aojea commented Jun 6, 2023

> /lgtm
> /hold
> to keep an hour-old PR from merging before anyone else has had a chance to object but feel free to cancel

same diff, just with one additional unit test @danwinship

@danwinship
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 101f56253e8ee56bd2a26853167daf7f22b5a645

@aojea
Member Author

aojea commented Jun 6, 2023

lol

Use k8s.io/utils/net ParseIPSloppy() to parse IP addresses. Kubernetes #100895

caught on my own trap XD

Since kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR
assigned to the node it watches for the Node object.

However, kube-proxy startup process requires to have these watches in
different places, that opens the possibility of having a race condition
if the same node is recreated and a different PodCIDR is assigned.

Initializing the second watch with the value obtained in the first one
allows us to detect this situation.

Change-Id: I6adeedb6914ad2afd3e0694dcab619c2a66135f8
Signed-off-by: Antonio Ojea <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2023
Member

@thockin thockin left a comment

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a399b5d94a9347506f4f141bcc4754689be3b0fd

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Member Author

aojea commented Jun 6, 2023

/hold cancel

two eyes 👀 should be fair

Thanks

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2023
@@ -754,7 +756,7 @@ func (s *ProxyServer) Run() error {
 	nodeConfig := config.NewNodeConfig(currentNodeInformerFactory.Core().V1().Nodes(), s.Config.ConfigSyncPeriod.Duration)
 	// https://issues.k8s.io/111321
 	if s.Config.DetectLocalMode == kubeproxyconfig.LocalModeNodeCIDR {
-		nodeConfig.RegisterEventHandler(&proxy.NodePodCIDRHandler{})
+		nodeConfig.RegisterEventHandler(proxy.NewNodePodCIDRHandler(s.podCIDRs))
Contributor

Is ProxyServer.Run() always called after ProxyServer.createProxier? If the order is reversed, the node handler might get initialized with nil podCIDRs.

Member Author

yeah, good question, the order is like that #111321 (comment)

the problem is that things are a bit chaotic right now: we do API queries in both the initialisation and configuration steps, and we end up with these problems 🤷

Contributor

Thanks for confirming @aojea
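
To make the ordering concern in this thread concrete, here is a small sketch of why the seed matters. It is only an illustration under stated assumptions: `check` is a hypothetical helper, the CIDR values are made up, and the fallback of adopting the first observed value when the seed is empty is an assumption about how an unseeded handler could behave, not a description of the merged code.

```go
package main

import (
	"fmt"
	"reflect"
)

// check compares the CIDRs the proxier was configured with (seed) against the
// CIDRs seen in a later Node event. Hypothetical helper, for illustration only.
// With an empty seed there is nothing to compare against, so the observed
// value becomes the baseline and no change is reported.
func check(seed, observed []string) (baseline []string, changed bool) {
	if len(seed) == 0 {
		return observed, false
	}
	return seed, !reflect.DeepEqual(seed, observed)
}

func main() {
	proxierCIDRs := []string{"10.0.0.0/24"} // what the proxier was built with (made up)
	eventCIDRs := []string{"10.0.1.0/24"}   // what the recreated Node reports (made up)

	// Expected order: the proxier is created first and the handler is seeded.
	_, changed := check(proxierCIDRs, eventCIDRs)
	fmt.Println("seeded handler detects change:", changed)

	// Reversed order: the handler is built before the CIDRs are known.
	_, changed = check(nil, eventCIDRs)
	fmt.Println("unseeded handler detects change:", changed)
}
```

Under these assumptions, a handler created with nil podCIDRs would silently accept the new value, which is why the order of createProxier and Run() matters.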

@k8s-ci-robot k8s-ci-robot merged commit 5a5ebfd into kubernetes:master Jun 6, 2023
12 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jun 6, 2023
k8s-ci-robot added a commit that referenced this pull request Jun 7, 2023
…9-upstream-release-1.27

Automated cherry pick of #118499: kube-proxy avoid race condition using LocalModeNodeCIDR
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

kube-proxy in LocalModeNodeCIDR mode may cache stale Node.PodCIDR if the Node object is recreated
5 participants