Skip to content

Kube proxy node manager #130837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

aroradaman
Copy link
Member

@aroradaman aroradaman commented Mar 15, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This carries the work done in #125382 consolidates and simplifies how kube-proxy deals and watches over Node objects.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kube-proxy considers timeouts to get the node objects fatal.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 15, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/ipvs area/kube-proxy sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 15, 2025
@aroradaman
Copy link
Member Author

/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-conformance-kind-ga-only-parallel

@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch from be7da13 to a0c5cb5 Compare March 15, 2025 12:26
@aroradaman
Copy link
Member Author

/test pull-kubernetes-unit
/test pull-kubernetes-typecheck
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-conformance-kind-ga-only-parallel

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is WIP, but some initial comments...

@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch from a0c5cb5 to 2ccd845 Compare March 18, 2025 15:12
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 18, 2025
@aroradaman aroradaman changed the title [WIP] Kube proxy node manager Kube proxy node manager Mar 18, 2025
@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/windows Categorizes an issue or PR as relevant to SIG Windows. labels Mar 18, 2025
@aroradaman aroradaman marked this pull request as ready for review March 18, 2025 15:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2025
@aroradaman aroradaman requested a review from danwinship March 18, 2025 15:12
@k8s-ci-robot k8s-ci-robot requested a review from aojea March 18, 2025 15:12
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2025
@aroradaman
Copy link
Member Author

/cc @aojea

@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch from b28b193 to 7854882 Compare May 11, 2025 18:15
@aroradaman aroradaman requested a review from danwinship May 11, 2025 19:05
@@ -191,15 +191,3 @@ func filterEndpoints(endpoints []Endpoint, predicate func(Endpoint) bool) []Endp

return filteredEndpoints
}

// ExtractTopologyLabels consumes node labels and extract topology related labels if present.
func ExtractTopologyLabels(nodeLabels map[string]string) map[string]string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but only notifying when there is change in proxy related labels made more sense to me.

Right, but can't NodeTopologyConfig call ExtractTopologyLabels?

How about master/pkg/proxy/apis/well_known_labels.go for a common place?

Hm... maybe rather than having it have to be in the same file as CategorizeEndpoints, you could just add a comment to CategorizeEndpoints pointing out that if you change the code to look at any other endpoints, you need to change ExtractTopologyLabels (wherever that is) as well...

(Also, it would be best to have that code end up in the right place in the initial commit, rather than being added in one commit and then moved in the next commit. Or maybe that means swapping the order of the two commits?)

@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch 2 times, most recently from e5a7b94 to 39d7e6f Compare June 8, 2025 09:40
@aroradaman
Copy link
Member Author

Hm... maybe rather than having it have to be in the same file as CategorizeEndpoints, you could just add a comment to CategorizeEndpoints pointing out that if you change the code to look at any other endpoints, you need to change ExtractTopologyLabels (wherever that is) as well...

(Also, it would be best to have that code end up in the right place in the initial commit, rather than being added in one commit and then moved in the next commit. Or maybe that means swapping the order of the two commits?)

@danwinship I have squashed the two commits together, and added a comment in CategorizeEndpoints.

@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch from 39d7e6f to 000d9d2 Compare June 8, 2025 17:02
Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay. very close now.

}

rawNodeIPs := s.NodeManager.NodeIPs()
logger.Info("Successfully retrieved NodeIPs", "NodeIPs", rawNodeIPs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawNodeIPs can be nil/empty here if it timed out, right? We need to log an error somewhere if that happens; even if it's not fatal, it's unexpected and possibly bad, so we should warn the user.

One possibility would be to log it from within NewNodeManager. Another would be to have s.NodeManager.NodeIPs() return ([]net.IP, error), and we can log the error, but then keep going.

Also (and this is a problem with the existing code), it's weird that we log rawNodeIPs but then we don't log anything if s.NodeIPs ends up being different from that...

So:

  • if we got no rawNodeIPs, log an error/warning
  • otherwise, log the rawNodeIPs (as above).
  • if s.NodeIPs[s.PrimaryIPFamily].IsLoopback(), log a warning about that (we do this in detectNodeIPs)
  • otherwise, if we used bindAddress to override rawNodeIPs, log the resulting NodeIPs

and probably that should either all be done here, or else all be done in detectNodeIPs, rather than splitting it up and making it hard to see for sure what gets logged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawNodeIPs can be nil/empty here if it timed out, right? We need to log an error somewhere if that happens;

rawNodeIPs will never be nil, assuming kubelet always sets some node IPs on the node when it first creates the Node object.
If the node manager times out waiting for existence of node object then it will throw an error that will cause kube-proxy to fail.

In the current code s.NodeManager.NodeIPs() will never be called when node manager times out.

@@ -208,3 +191,10 @@ func (n *NodeManager) OnNodeDelete(node *v1.Node) {

// OnNodeSynced is called after the cache is synced and all pre-existing Nodes have been reported
func (n *NodeManager) OnNodeSynced() {}

// Node returns the latest copy of node object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"You must not modify it"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine if I skip this comment and return a deep copy instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could, but there are other places where we document that you're not supposed to modify a returned object (eg, Listers) so it's not bad to document that here.

@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch from 000d9d2 to 7841a3e Compare June 22, 2025 10:22
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 22, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aroradaman
Once this PR has been reviewed and has the lgtm label, please assign aojea for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 22, 2025
@aroradaman aroradaman requested a review from danwinship June 22, 2025 13:30
@danwinship
Copy link
Contributor

you dropped half of the branch on re-push...

This simplifies how the proxier receives update for change in node
labels. Instead of passing the complete Node object we just pass
the proxy relevant topology labels extracted from the complete list
of labels, and the downstream event handlers will only be notified
when there are changes in topology labels.

Signed-off-by: Daman Arora <[email protected]>
NodeManager initialises node informers, waits for cache sync and polls for
node object to retrieve NodeIPs, handle node events and crashes kube-proxy
when change in NodeIPs is detected.

Signed-off-by: Daman Arora <[email protected]>
NodeManager, if configured with to watch for PodCIDR watch, watches
for changes in PodCIDRs and crashes kube-proxy if a change is
detected in PodCIDRs.

Signed-off-by: Daman Arora <[email protected]>
ProxyHealthServer now consumes NodeManager to get the latest
updated node object for determining node eligibility.

Signed-off-by: Daman Arora <[email protected]>
For kube-proxy, node addition and node update is semantically
considered as similar event, we have exactly same handler
logic for these two events resulting in duplicate code and
unit tests.
This merges the `NodeHandler` interface methods OnNodeAdd and
OnNodeUpdate into OnNodeChange along with the implementation
of the interface.

Signed-off-by: Daman Arora <[email protected]>
@aroradaman aroradaman force-pushed the kube-proxy-node-manager branch from 7841a3e to 26a42d6 Compare June 23, 2025 12:27
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 23, 2025
@aroradaman
Copy link
Member Author

you dropped half of the branch on re-push...

damn I messed up my local branch 😓
reflog to the rescue

@aroradaman
Copy link
Member Author

/test pull-kubernetes-e2e-gce

ctr: failed to dial "/run/containerd/containerd.sock": context deadline exceeded: connection error: desc = "transport: error while dialing: dial unix:///run/containerd/containerd.sock: timeout"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ipvs area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants