Load Balancer Node membership not being updated for externalTrafficPolicy Local #4230
Closed
JoelSpeed opened this issue Jul 5, 2023 · 1 comment · Fixed by #4234 or kubernetes/kubernetes#119128
Comments
Thanks @JoelSpeed for reporting the issues and adding the KEP links. Fully agreed with you that health probes should be improved and aligned with KEP-3836 and KEP-3458. Added to v1.28 milestone for tracking.
What happened:
When creating a cluster and creating the first service with `externalTrafficPolicy: Local`, we are observing that new nodes are not added to the load balancer in a timely manner, as they should be. It takes one of three things to trigger the nodes to be added to the load balancer:
What you expected to happen:
Nodes should be added to the load balancer as soon as they are added to the cluster.
How to reproduce it (as minimally and precisely as possible):
Create a cluster with the standard load balancer and create a single service with `externalTrafficPolicy: Local` (eTP: Local). This doesn't always happen, but we are observing with good frequency that the worker nodes are not added to the load balancer when the load balancer is created and the nodes are freshly added.
Anything else we need to know?:
I believe this to be an incompatibility between the predicates used in the cloud-provider core service controller and the logic in the Azure provider that excludes nodes from being added to the load balancer while they are unready.
In kubernetes/kubernetes#109706, the service controller was updated so that it monitors the set of nodes that it passes into `UpdateLoadBalancer`. The PR aimed to reduce the number of load balancer membership resyncs. `UpdateLoadBalancer` is now only called when the set of nodes changes, which means a node is added or removed, or a node is excluded via a label or taint, or has a transition of its provider ID.
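The "only resync on membership change" behaviour can be sketched as below. This is an illustrative simplification, not the real service controller API: `syncIfChanged` and its parameters are hypothetical names standing in for the controller's internal node-set comparison.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// syncIfChanged is a sketch of the behaviour introduced in
// kubernetes/kubernetes#109706: the controller remembers the last set of
// candidate node names and only invokes the cloud provider's update hook
// when that set differs. Names here are illustrative, not the real API.
func syncIfChanged(lastSynced []string, current []string, update func([]string)) []string {
	sort.Strings(current)
	if reflect.DeepEqual(lastSynced, current) {
		return lastSynced // same membership: provider is never called
	}
	update(current)
	return current
}

func main() {
	calls := 0
	update := func([]string) { calls++ }

	state := syncIfChanged(nil, []string{"node-1"}, update)
	state = syncIfChanged(state, []string{"node-1"}, update) // same set: skipped
	_ = state
	fmt.Println(calls) // only the first call reaches the provider
}
```

The key property for this issue: a change in a node's Ready condition does not change the name set, so it never triggers a provider call under this scheme.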
The call to update the load balancers eventually calls down to `EnsureHostsInPool`, which in turn calls `ShouldNodeExcludedFromLoadBalancer` to determine whether each node should be included.
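The provider-side exclusion check can be sketched as follows. This is a minimal stand-in, not the real `ShouldNodeExcludedFromLoadBalancer` implementation; only the `node.kubernetes.io/exclude-from-external-load-balancers` label is a real well-known label, and the readiness branch mirrors the Azure behaviour this issue argues against.

```go
package main

import "fmt"

// Node is a minimal stand-in for the Kubernetes Node object.
type Node struct {
	Name   string
	Labels map[string]string
	Ready  bool
}

// excludeLabel is the well-known exclusion label honoured by cloud providers.
const excludeLabel = "node.kubernetes.io/exclude-from-external-load-balancers"

// shouldExclude is a simplified sketch of the provider-side check: a node is
// dropped from the backend pool if it carries the exclusion label or, as in
// the behaviour described above, if it is not yet Ready.
func shouldExclude(n Node) bool {
	if _, ok := n.Labels[excludeLabel]; ok {
		return true
	}
	return !n.Ready // the readiness check KEP-3458 proposes removing
}

func main() {
	fresh := Node{Name: "node-1", Labels: map[string]string{}, Ready: false}
	fmt.Println(shouldExclude(fresh)) // a freshly added, unready node is excluded
}
```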
When a node is not ready, it is excluded, which means that when a node is first added there is a race. If the CCM observes the load balancer update before the node becomes Ready, the node is excluded from the load balancer. With the new reduction of calls to `UpdateLoadBalancer`, when the node flips to Ready it no longer gets added to the load balancers, because nothing calls `UpdateLoadBalancer`: according to the core CCM code, nothing has changed.

In my opinion, and that of KEP-3458, we need to remove the readiness check from the Azure CCM and allow nodes to be added independent of their readiness.
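The race can be made concrete with a small simulation. This is a sketch under the assumptions above: `candidateSet` stands in for the controller's membership predicate (which ignores readiness) and `providerPool` for the Azure-side filtering (which does not); neither is the real function name.

```go
package main

import "fmt"

// Node is a minimal stand-in for a Kubernetes node.
type Node struct {
	Name  string
	Ready bool
}

// candidateSet sketches the controller-side predicate: membership depends
// only on which nodes exist, so a readiness flip is invisible to it.
func candidateSet(nodes []Node) map[string]bool {
	set := map[string]bool{}
	for _, n := range nodes {
		set[n.Name] = true
	}
	return set
}

// providerPool sketches the provider-side filtering: it additionally drops
// unready nodes, mimicking the Azure readiness exclusion described above.
func providerPool(nodes []Node) []string {
	var pool []string
	for _, n := range nodes {
		if n.Ready {
			pool = append(pool, n.Name)
		}
	}
	return pool
}

func main() {
	nodes := []Node{{Name: "node-1", Ready: false}}

	before := candidateSet(nodes)
	pool := providerPool(nodes) // first sync: node-1 excluded while unready

	nodes[0].Ready = true // node later becomes Ready
	after := candidateSet(nodes)

	// The candidate set is unchanged, so no resync fires and the pool stays
	// empty even though node-1 is now Ready.
	fmt.Println(len(before) == len(after), len(pool))
}
```

Removing the readiness branch from the provider-side filter makes the two predicates agree, which is exactly what dropping the check from the Azure CCM achieves.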
To do this, though, we must first fix the broken health checks (#3499), which already has a PR up (#3887). This is also related to #3500, which adds another reason not to manage load balancer membership based on whether a node is ready or not.
Environment:
- Kubernetes version (use `kubectl version`): 1.27
- OS (e.g: `cat /etc/os-release`):
- Kernel (e.g. `uname -a`):