kube-proxy avoid race condition using LocalModeNodeCIDR #118499

aojea · 2023-06-06T10:24:55Z

Because kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR assigned to the node, it watches the Node object.

However, the kube-proxy startup process sets up these watches in two different places, which opens the possibility of a race condition if the same node is recreated and assigned a different PodCIDR.

Initializing the second watch with the value obtained in the first one allows us to detect this situation.
Fixes #111321
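
To make the idea concrete, here is a minimal, self-contained sketch of a handler that is seeded with the PodCIDRs read by the first watch and reports a mismatch when a later Node event carries different ones. It is not the code in this PR: the `node` type, `podCIDRHandler`, `newPodCIDRHandler`, and `onNodeEvent` are hypothetical names, and the sketch returns a boolean where a real component would have to decide how to react (for example, by restarting).

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for the Kubernetes Node object;
// only the field relevant to this sketch is modeled.
type node struct {
	Name     string
	PodCIDRs []string
}

// podCIDRHandler is an illustrative handler, not the actual kube-proxy type.
// It remembers the PodCIDRs seen at startup and flags any later mismatch.
type podCIDRHandler struct {
	mu       sync.Mutex
	podCIDRs []string
}

// newPodCIDRHandler seeds the handler with the PodCIDRs obtained by the
// first watch, so the second watch has a value to compare against.
func newPodCIDRHandler(podCIDRs []string) *podCIDRHandler {
	return &podCIDRHandler{podCIDRs: podCIDRs}
}

// sameCIDRs compares two CIDR lists element by element;
// nil and empty slices are treated as equal.
func sameCIDRs(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// onNodeEvent returns true when the observed PodCIDRs differ from the
// seeded ones, i.e. the node was recreated with a different assignment.
func (h *podCIDRHandler) onNodeEvent(n *node) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	// Nothing seeded yet: adopt the first non-empty value seen.
	if len(h.podCIDRs) == 0 && len(n.PodCIDRs) > 0 {
		h.podCIDRs = n.PodCIDRs
		return false
	}
	return !sameCIDRs(h.podCIDRs, n.PodCIDRs)
}

func main() {
	h := newPodCIDRHandler([]string{"10.0.1.0/24"})
	// Same PodCIDR as at startup: no mismatch.
	fmt.Println(h.onNodeEvent(&node{Name: "n1", PodCIDRs: []string{"10.0.1.0/24"}}))
	// Node recreated with a different PodCIDR: mismatch detected.
	fmt.Println(h.onNodeEvent(&node{Name: "n1", PodCIDRs: []string{"10.0.2.0/24"}}))
}
```

Without the seed, the handler in this sketch would simply adopt whatever the first event carries, and a change between the two watches would go unnoticed.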

/kind bug

Fix a race condition in kube-proxy when using LocalModeNodeCIDR, to avoid dropping Services traffic if the Node object is recreated while kube-proxy is starting.

k8s-ci-robot · 2023-06-06T10:25:05Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aojea · 2023-06-06T10:25:16Z

/assign @danwinship @thockin

Alternative to #118458

danwinship · 2023-06-06T12:15:38Z

/lgtm
/hold
to keep an hour-old PR from merging before anyone else has had a chance to object but feel free to cancel

k8s-ci-robot · 2023-06-06T12:15:44Z

LGTM label has been added.

Git tree hash: c28e77b67ca168581469b1fd39b06e24c38a3e9d

aojea · 2023-06-06T12:56:47Z

/retest
Kubernetes e2e suite: [It] [sig-node] Pods should run through the lifecycle of Pods and PodStatus [Conformance]

aojea · 2023-06-06T13:20:38Z

> /lgtm /hold to keep an hour-old PR from merging before anyone else has had a chance to object but feel free to cancel

same diff, just with one additional unit test @danwinship

danwinship · 2023-06-06T14:06:02Z

/lgtm

k8s-ci-robot · 2023-06-06T14:06:10Z

LGTM label has been added.

Git tree hash: 101f56253e8ee56bd2a26853167daf7f22b5a645

aojea · 2023-06-06T15:02:44Z

lol

Use k8s.io/utils/net ParseIPSloppy() to parse IP addresses. Kubernetes #100895

caught on my own trap XD

Since kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR assigned to the node it watches for the Node object.

However, kube-proxy startup process requires to have these watches in different places, that opens the possibility of having a race condition if the same node is recreated and a different PodCIDR is assigned.

Initializing the second watch with the value obtained in the first one allows us to detect this situation.

Change-Id: I6adeedb6914ad2afd3e0694dcab619c2a66135f8
Signed-off-by: Antonio Ojea <[email protected]>

thockin

Thanks!

/lgtm
/approve

k8s-ci-robot · 2023-06-06T16:03:15Z

LGTM label has been added.

Git tree hash: a399b5d94a9347506f4f141bcc4754689be3b0fd

k8s-ci-robot · 2023-06-06T16:03:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kube-proxy/OWNERS~~ [aojea,thockin]
~~pkg/proxy/OWNERS~~ [aojea,thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

aojea · 2023-06-06T18:05:05Z

/hold cancel

two eyes 👀 should be fair

Thanks

skmatti · 2023-06-06T18:11:51Z

cmd/kube-proxy/app/server.go

@@ -754,7 +756,7 @@ func (s *ProxyServer) Run() error {
 nodeConfig := config.NewNodeConfig(currentNodeInformerFactory.Core().V1().Nodes(), s.Config.ConfigSyncPeriod.Duration)
 // https://issues.k8s.io/111321
 if s.Config.DetectLocalMode == kubeproxyconfig.LocalModeNodeCIDR {
- nodeConfig.RegisterEventHandler(&proxy.NodePodCIDRHandler{})
+ nodeConfig.RegisterEventHandler(proxy.NewNodePodCIDRHandler(s.podCIDRs))


Is ProxyServer.Run() always called after ProxyServer.createProxier? If the order is reversed, then the node controller might get initialized with nil podCIDRs?

aojea · 2023-06-06

yeah, good question, the order is like that #111321 (comment)

the problem is that we have a bit of chaos right now: we do API queries in both the initialisation and configuration steps, and we end up with these problems 🤷

skmatti · 2023-06-07

Thanks for confirming @aojea
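
The ordering concern raised in this thread can be sketched as follows. This is a toy model only: `proxyServer`, its fields, and the guard in `run` are hypothetical and are not part of kube-proxy. It only shows why the value stored during proxier creation has to exist before the handler is seeded from it.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPodCIDRs is returned by the hypothetical guard in run.
var errNoPodCIDRs = errors.New("run called before createProxier: no PodCIDRs to seed the handler")

// proxyServer is a stripped-down, hypothetical stand-in for kube-proxy's
// ProxyServer; only the ordering concern is modeled.
type proxyServer struct {
	podCIDRs    []string // recorded by createProxier
	handlerSeed []string // what the node handler would be constructed with
}

// createProxier records the PodCIDRs read while setting up the proxier.
func (s *proxyServer) createProxier(nodePodCIDRs []string) {
	s.podCIDRs = nodePodCIDRs
}

// run seeds the node handler from s.podCIDRs. The explicit guard is an
// assumption added for this sketch; it makes a reversed call order visible
// instead of silently seeding the handler with nil.
func (s *proxyServer) run() error {
	if len(s.podCIDRs) == 0 {
		return errNoPodCIDRs
	}
	s.handlerSeed = s.podCIDRs
	return nil
}

func main() {
	s := &proxyServer{}
	// Reversed order: run before createProxier is rejected by the guard.
	fmt.Println(s.run() != nil)
	// Expected order: createProxier first, then run.
	s.createProxier([]string{"10.0.1.0/24"})
	fmt.Println(s.run() == nil, s.handlerSeed)
}
```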

k8s-ci-robot added the needs-priority label Jun 6, 2023

k8s-ci-robot assigned danwinship and thockin Jun 6, 2023

k8s-ci-robot added area/kube-proxy sig/network and removed do-not-merge/needs-sig labels Jun 6, 2023

k8s-ci-robot requested review from bowei and MrHohn June 6, 2023 10:25

k8s-ci-robot added the approved label Jun 6, 2023

aojea mentioned this pull request Jun 6, 2023

kube-proxy detect nodePodCIDR changes in LocalModeNodeCIDR #118458

Closed

k8s-ci-robot added do-not-merge/hold lgtm labels Jun 6, 2023

aojea force-pushed the kproxy_podcidr_alt branch from 7cf5803 to 47b0186 Compare June 6, 2023 13:20

k8s-ci-robot added size/M and removed size/S lgtm labels Jun 6, 2023

k8s-ci-robot requested review from danwinship and thockin June 6, 2023 13:20

k8s-ci-robot added the lgtm label Jun 6, 2023

aojea force-pushed the kproxy_podcidr_alt branch from 47b0186 to 26801d6 Compare June 6, 2023 15:03

k8s-ci-robot removed the lgtm label Jun 6, 2023

thockin reviewed Jun 6, 2023


k8s-ci-robot added the lgtm label Jun 6, 2023

k8s-ci-robot removed the do-not-merge/hold label Jun 6, 2023

skmatti reviewed Jun 6, 2023


k8s-ci-robot merged commit 5a5ebfd into kubernetes:master Jun 6, 2023
12 checks passed

k8s-ci-robot added this to the v1.28 milestone Jun 6, 2023

aojea mentioned this pull request Jun 6, 2023

Automated cherry pick of #118499: kube-proxy avoid race condition using LocalModeNodeCIDR #118515

Merged

k8s-ci-robot added a commit that referenced this pull request Jun 7, 2023

Merge pull request #118515 from aojea/automated-cherry-pick-of-#11849…

e2cc1a3

…9-upstream-release-1.27 Automated cherry pick of #118499: kube-proxy avoid race condition using LocalModeNodeCIDR

aojea mentioned this pull request Sep 13, 2023

service controller: update node UID has changed #120630

Open

aojea mentioned this pull request Feb 15, 2024

kube-proxy: query node from apiserver cache #123286

Open
