-
Notifications
You must be signed in to change notification settings - Fork 40.9k
Promote two EndpointSlice e2e tests to Conformance #132019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Promote two EndpointSlice e2e tests to Conformance #132019
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
um... does anyone understand why gofmt wants an extra level of indentation there? |
it seems there are two tabs in the other tests I checked |
as in the requirements https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#conformance-test-requirements /lgtm |
Specifically, these ensure that the service proxy works with a service that has only EndpointSlices (no Endpoints).
bf4d9a6
to
5420dce
Compare
/test pull-kubernetes-e2e-aks-engine-azure-windows |
@danwinship: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/lgtm /assign @thockin @dims @johnbelamaric |
The new tests fail on Windows:
kube-proxy logs show that it is doing something that looks at least almost right... |
/sig window @princepereira we need help to understand why the kube-proxy in windows fail these tests, this is an important behavior that all proxies should meet, can you PTAL? |
@aojea: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
hmm, this job is confusing
how come is building from the proxy seems to get the endpoints but I do not know how to debug this proxy
|
kubernetes/pkg/proxy/winkernel/proxier.go Line 1352 in ab3e83f
The code is also somewhat confused about using |
/cc @princepereira @sbangari |
@marosset: GitHub didn't allow me to request PR reviews from the following users: princepereira. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@aojea - Setup is a bit confusing because there is not a kube-proxy image we can target for CI test passes. To work around this the CI jobs for windows
We hope that this can be simplified in the future |
I'm not sure if that's the case here. If it were, we should at least see the below logs in the kube-proxy output - but they're missing. kubernetes/pkg/proxy/winkernel/proxier.go Line 1444 in ab3e83f
I strongly suspect that proxier.endpointsMap isn't being updated with the correct set of endpoint information, so the looping through the map is not happening. kubernetes/pkg/proxy/winkernel/proxier.go Line 1321 in ab3e83f
|
/lgtm cancel (for now! since we have active discussion + hold) |
The test passes on the Linux backends, which use the same svcPortMap/endpointsMap. |
@princepereira can we add some instrumentation to the code in a PR with this commit to debug this a bit more and test it in the CI? |
Yes, I was planning to do something similar - adding some logs to inspect the contents of Also, I noticed that the namespace information is missing from Apologies for the delayed responses - I'm working from the IST time zone. |
we rely on you and sig-windows to maintain the windows proxy, this is not something I can help, sorry, it will be important to get this fixed since this is blocking the work in - [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4974-deprecate-endpoints |
@dims @marosset Are we allowed to add new conformance tests that fail on Windows? The conformance docs aren't clear about this; they point out that Windows doesn't pass all existing conformance tests, but they seem to assume that the only reason a new conformance test would fail on Windows would be if it depended on Linux-specific functionality, in which case it just needs to be marked |
@danwinship we have precedent where we have (if we can avoid, it would be better, but if there is no choice and we can document that fact, then we can go ahead) |
Right, but this isn't |
I would be OK with promoting the test and adding a skip in the SIG-Windows test passes if the kube-proxy bug is not addressed by 1.34 code-freeze |
Sure. @aojea . I am working on it. I ran the same code in a separate draft PR with some additional debug logs and discovered that the namespace in the proxier.svcMap is different from the one in the endpointMap. As a result, kube-proxy was attempting to fetch endpoint information from Draft PR: #132073 Kube-Proxy Logs: https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/132073/pull-kubernetes-e2e-capz-windows-master/1929995912881377280/artifacts/clusters/capz-conf-i4w1vd/kube-system/kube-proxy-windows-8cqj2/kube-proxy.log Relevant Logs :
The service name present in proxier.endpointMap is |
I think those are two different e2e tests running in parallel; all the tests in |
you can get the details of each failure from the report If you click
|
The issue is that the ClusterIP load balancer policy was created with an incorrect internal port - 80 (ServicePort) instead of the intended 8080 (ContainerPort).
What caused the internal port to be set incorrectly? This happened because the Service object was defined without explicitly setting the targetPort. According to the Kubernetes specification, when targetPort is omitted, it defaults to the value of port. As a result, the targetPort was automatically set to 80 here.
Since the target port was automatically assigned by the test framework (not hardcoded, but derived from the port field based on the service spec), the Windows Proxier constructed the ServiceInfo object using the targetPort from the servicePort kubernetes/pkg/proxy/winkernel/proxier.go Line 597 in b2f27c0
Since targetPort was already set in the ServiceInfo object, this code had no effect. kubernetes/pkg/proxy/winkernel/proxier.go Line 1360 in b2f27c0
This target port was ultimately used as the internal port when creating the load balancer policy. kubernetes/pkg/proxy/winkernel/proxier.go Line 1534 in b2f27c0
Fix
If you agree with my findings, please let me know how you'd like to proceed. Would you prefer me to update the existing PR, or should I open a new one? |
@princepereira can you send a new PR with the fix and tag Dan and me, during the PR review we can evaluate the next step ... but I'm curious about what is the solution, from your explanation it looks like is a windows proxy only thing? |
@aojea , In Windows kube-proxy, the load balancer policy creation uses the internal port for configuration, whereas the iptables implementation in the Linux kube-proxy directly utilizes EndpointInfo, which already contains the correct target port. kubernetes/pkg/proxy/iptables/proxier.go Line 1416 in 5a45088
Therefore, this issue won’t appear in the Linux case. I haven’t reviewed the IPVS or nftables implementations yet. I’ll be submitting a PR soon with a proposed fix. Thanks! |
Yes, I pointed that out here. The job of the service proxy is to intercept traffic whose source is described by the Service, and deliver it to the destinations described by the EndpointSlices. Yes, the Service also has some information about the destination of the traffic, but that information is not there for the service proxy to use, it's there for the EndpointSlice controller to use. It's certainly weird that the EndpointSlices don't quite match the ServicePorts in this case, but it shoudn't matter, because the service proxy shouldn't be looking at targetPort anyway, any more than it should try to resolve the destination pod IPs by itself. |
@danwinship , @aojea , Please review: #132647 |
This promotes "
[sig-network] EndpointSlice should support a Service with multiple ports specified in multiple EndpointSlices
" and "[sig-network] EndpointSlice should support a Service with multiple endpoint IPs specified in multiple EndpointSlices
" to conformance.AFAICT, as of k8s 1.33, a service proxy that is based on Endpoints rather than EndpointSlices can still pass conformance. While it is unlikely that any currently-maintained service proxies actually do this (since that would imply not supporting dual-stack, topology, or terminating endpoints), we should explicitly require that they don't (since we plan to eventually allow disabling the Endpoints controller, KEP-4974).
These tests (added in #114144 in v1.27) weren't explicitly intended to test "service proxies use EndpointSlices rather than Endpoints", but they do test that as a side effect (while also ensuring that the service proxy doesn't fall victim to some easy EndpointSlice-implementing bugs).
The tests don't appear to have ever been flaky. (There are no issues reference those test names.)
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/kind cleanup
/area conformance
/sig network
/cc @aojea @thockin
@kubernetes/sig-architecture-pr-reviews @kubernetes/cncf-conformance-wg