Delay in processing HNS LB policies on kube-proxy start on Windows nodes results in unreachable services #109162
Comments
/sig windows network
/triage accepted
There are more details on timing in the fix that @daschott opened: #109124. When syncing Services on a new node joining the cluster, HNS is queried for state on every endpoint in a service, which is expensive. When iterating over thousands of services, this can take hours (!). The fix proposed by @daschott gets the HNS state once per sync instead of once for every endpoint. With this change plus a fix in the Windows OS, the sync is reduced to minutes on WS 2019 and ~1 min on WS 2022.
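To illustrate the difference between querying per endpoint and querying once per sync, here is a minimal Go sketch. It is not the code from #109124: `hnsClient`, `fakeHNS`, `endpointInfo`, `syncPerEndpoint`, and `syncWithSnapshot` are hypothetical names invented for this example, and the fake client only counts how often the expensive listing call would be made.

```go
package main

import "fmt"

// endpointInfo is a hypothetical stand-in for the state HNS returns per endpoint.
type endpointInfo struct {
	ID string
	IP string
}

// hnsClient is a hypothetical interface, not the real kube-proxy or hcsshim API.
type hnsClient interface {
	ListEndpoints() ([]endpointInfo, error)
}

// fakeHNS counts how many times the (expensive) listing call is made.
type fakeHNS struct {
	endpoints []endpointInfo
	calls     int
}

func (f *fakeHNS) ListEndpoints() ([]endpointInfo, error) {
	f.calls++
	return f.endpoints, nil
}

// syncPerEndpoint queries HNS again for every wanted endpoint IP,
// which is the costly pattern described in the comment above.
func syncPerEndpoint(c hnsClient, wanted []string) (int, error) {
	found := 0
	for _, ip := range wanted {
		eps, err := c.ListEndpoints()
		if err != nil {
			return 0, err
		}
		for _, ep := range eps {
			if ep.IP == ip {
				found++
				break
			}
		}
	}
	return found, nil
}

// syncWithSnapshot queries HNS once per sync and answers every lookup
// from an in-memory index keyed by IP.
func syncWithSnapshot(c hnsClient, wanted []string) (int, error) {
	eps, err := c.ListEndpoints()
	if err != nil {
		return 0, err
	}
	byIP := make(map[string]endpointInfo, len(eps))
	for _, ep := range eps {
		byIP[ep.IP] = ep
	}
	found := 0
	for _, ip := range wanted {
		if _, ok := byIP[ip]; ok {
			found++
		}
	}
	return found, nil
}

func main() {
	hns := &fakeHNS{endpoints: []endpointInfo{
		{ID: "ep-1", IP: "10.0.0.1"},
		{ID: "ep-2", IP: "10.0.0.2"},
	}}
	wanted := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}

	if _, err := syncPerEndpoint(hns, wanted); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println("per-endpoint queries:", hns.calls)

	hns.calls = 0
	if _, err := syncWithSnapshot(hns, wanted); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println("snapshot queries:", hns.calls)
}
```

In this sketch the number of listing calls grows with the number of endpoints in the first variant and stays at one in the second, which is the general shape of the saving described above.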
@daschott @jsturtevant
What happened?
When starting Windows nodes with a high number of HNS LB policies/rules on the cluster, there is a delay in processing them. This leaves services unreachable during the delay, which takes about half a minute per policy. This can be substantial given enough rules.
This occurs both when restarting kube-proxy and when rebooting the host. Once the system reaches a state where all the policylists are processed, incremental updates to the services (i.e. endpoint changes) are handled fine.
What did you expect to happen?
HNS policies should not cause a large delay for Windows nodes.
How can we reproduce it (as minimally and precisely as possible)?
With a large number of HNS policies in place, restart kube-proxy on a Windows node.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)