-
Notifications
You must be signed in to change notification settings - Fork 38.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KCCM] service controller: add providerID
Node predicate
#117388
[KCCM] service controller: add providerID
Node predicate
#117388
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
803b628
to
9c0153a
Compare
Changelog suggestion Changed the cloud controller manager to make `providerID` a predicate when synchronizing nodes.
This change allows load balancer integrations to ensure that the `providerID` is set when configuring
load balancers and targets. |
How might we account for this? |
We should document the contract between cloud provider and core of Kubernetes; I worry that right now we can't work out which side isn't keeping to the bargain. |
/assign @andrewsykim @nckturner |
I am fine with that. I updated the release-note comment.
I don't have a good answer unfortunately. Tracking/finding cloud provider implementations which are depending on the cloud-controller, currently requires finding which cloud provider is implementing the interface (usually: by grokking GitHub), reading the implementation and interpreting it. There is no contract defined, for which Node fields the cloud provider implementations requires as to successfully sync - at least to best of my knowledge. I would however argue that the |
/lgtm |
LGTM label has been added. Git tree hash: 056acafd4374ed71d7e6664258e56b18541bbbc8
|
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alexanderConstantinescu, dims, mmerkes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -970,6 +975,10 @@ func nodeIncludedPredicate(node *v1.Node) bool { | |||
return !hasExcludeBalancerLabel | |||
} | |||
|
|||
func nodeHasProviderIDPredicate(node *v1.Node) bool { | |||
return node.Spec.ProviderID != "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no case where someone runs without this set? It is strictly optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question...I assumed that to be the case, but I can't tell for sure :/
The safest might be to just trigger a resync in shouldSyncUpdatedNode
due to changes to the field, but not actually require that nodes have the field set?
I don't know why it would ever go from having a value to no value, but I guess it's allowed. If we don't demand a value, we are back to the orginal bug report, no? We will try to sync nodes that are not ready to be synced. I just don't know if there's a use-case for no value... |
No, the original bug report was about the fact that we weren't even reacting to the provider ID being set. I am saying we could react to the change, but not filter out nodes based on it. The corollary is as you say: that we will sync nodes that both have and don't have a provider ID set, which will cause some cloud providers to error for the ones that don't have it, but I think that's fine since it will only be temporary. |
Ahh, subtle, yes. I guess they are already tryoing to sync nodes that have no value here, so this would not be worse than before, right? SGTM |
Right, eventually the node will get the provider ID assigned, we'll sync on that change and the cloud provider code will be happy. Are we fine with filing a PR on master, but holding any backport until someone actually reports this as a bug? We don't really know if some clouds run without setting the provider ID but still expect to provision LBs, so not sure a backport is warranted at this point. |
SGTM |
…rry-pick-of-#117388-upstream-release-1.26 Automated cherry pick of #117388: Re-work logic in shouldSyncUpdatedNode
…rry-pick-of-#117388-upstream-release-1.27 Automated cherry pick of #117388: Re-work logic in shouldSyncUpdatedNode
it looks like the change that caused the problem went into 1.25, but the picks of this fix only made it back to 1.26. Does this need to be picked to 1.25 as well? |
No. 1.25 doesn't have the changes which required this fix to begin with. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The
providerID
is used by the cloud-provider code to correlate the Node to the underlying cloud instance. The service controller in the KCCM does not define the provider ID as a predicate for syncing nodes, hence nodes can be passed to the cloud-provider code with a missing provider ID. Should the cloud-provider not handle the missing provider ID as an error, and instead silently fail: then the load balancers won't be re-synced with the node in question. Moreover: the service controller does not sync the assignment of the providerID on the Node objectThis PR adds the provider ID as a predicate for syncing the node. As such we are sure the cloud provider won't error at any time because of this missing attribute, and we always re-sync the node once it gets the provider ID set.
This was overlooked in #109706 because the providerID was never explicitly defined in the service controller for syncing nodes, it was indirectly covered by the fact that the service controller always synced nodes on the Ready condition...which comes after the setting of the providerID on the Node object. Also, others clouds might have other mechanisms for retrieving the instance besides solely relying on the providerID.
Which issue(s) this PR fixes:
Fixes #117375
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig network
/sig cloud-provider
/cc @aojea
/assign @thockin