
Promote two EndpointSlice e2e tests to Conformance #132019


Open
wants to merge 1 commit into master from endpointslice-only-conformance

Conversation

danwinship
Contributor

This promotes "[sig-network] EndpointSlice should support a Service with multiple ports specified in multiple EndpointSlices" and "[sig-network] EndpointSlice should support a Service with multiple endpoint IPs specified in multiple EndpointSlices" to conformance.

AFAICT, as of k8s 1.33, a service proxy that is based on Endpoints rather than EndpointSlices can still pass conformance. While it is unlikely that any currently-maintained service proxies actually do this (since that would imply not supporting dual-stack, topology, or terminating endpoints), we should explicitly require that they don't (since we plan to eventually allow disabling the Endpoints controller, KEP-4974).

These tests (added in #114144 in v1.27) weren't explicitly intended to test "service proxies use EndpointSlices rather than Endpoints", but they do test that as a side effect (while also ensuring that the service proxy doesn't fall victim to some easy-to-make EndpointSlice-handling bugs).
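
For context, the promoted tests build a fixture of roughly this shape: a selectorless Service plus hand-written EndpointSlices, so no Endpoints object is ever involved. A minimal sketch (names, addresses, and ports are illustrative, not copied from the test code):

package fixture

import (
	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// customEndpoints returns a selectorless Service plus a hand-built
// EndpointSlice for it. Because the Service has no selector, the
// EndpointSlice controller ignores it, and the service proxy can only
// route traffic correctly if it reads the hand-built slice.
func customEndpoints(ns string) (*corev1.Service, *discoveryv1.EndpointSlice) {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "example-custom-endpoints", Namespace: ns},
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{
				{Name: "port80", Port: 80, Protocol: corev1.ProtocolTCP},
				{Name: "port81", Port: 81, Protocol: corev1.ProtocolTCP},
			},
		},
	}
	portName := "port80"
	port := int32(8090) // backend port, deliberately different from the service port
	proto := corev1.ProtocolTCP
	slice := &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "e2e-custom-slice",
			Namespace:    ns,
			// This label is what associates the slice with the Service.
			Labels: map[string]string{discoveryv1.LabelServiceName: svc.Name},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
		Endpoints:   []discoveryv1.Endpoint{{Addresses: []string{"10.0.0.1"}}},
		Ports:       []discoveryv1.EndpointPort{{Name: &portName, Port: &port, Protocol: &proto}},
	}
	return svc, slice
}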

The tests don't appear to have ever been flaky. (There are no issues referencing those test names.)

Does this PR introduce a user-facing change?

Promoted two EndpointSlice tests to conformance, to require that service
proxy implementations are based on EndpointSlices rather than Endpoints.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4974-deprecate-endpoints

/kind cleanup
/area conformance
/sig network

/cc @aojea @thockin
@kubernetes/sig-architecture-pr-reviews @kubernetes/cncf-conformance-wg

@k8s-ci-robot k8s-ci-robot requested review from aojea and thockin May 29, 2025 14:46
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. area/conformance Issues or PRs related to kubernetes conformance tests sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 29, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label May 29, 2025
@danwinship danwinship moved this to Issues To Triage in conformance-definition May 29, 2025
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 29, 2025
@danwinship
Contributor Author

um... does anyone understand why gofmt wants an extra level of indentation there?

@aojea
Member

aojea commented May 29, 2025

um... does anyone understand why gofmt wants an extra level of indentation there?

it seems there are two tabs in the other tests I checked

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2025
Specifically, these ensure that the service proxy works with
a service that has only EndpointSlices (no Endpoints).
@danwinship danwinship force-pushed the endpointslice-only-conformance branch from bf4d9a6 to 5420dce May 30, 2025 00:22
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2025
@danwinship
Contributor Author

/test pull-kubernetes-e2e-aks-engine-azure-windows

@k8s-ci-robot
Contributor

@danwinship: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test pull-cos-containerd-e2e-ubuntu-gce
/test pull-kubernetes-cmd
/test pull-kubernetes-cmd-canary
/test pull-kubernetes-cmd-go-canary
/test pull-kubernetes-conformance-kind-ga-only-parallel
/test pull-kubernetes-coverage-unit
/test pull-kubernetes-dependencies
/test pull-kubernetes-dependencies-go-canary
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-e2e-gce-100-performance
/test pull-kubernetes-e2e-gce-cos
/test pull-kubernetes-e2e-gce-cos-canary
/test pull-kubernetes-e2e-gce-cos-no-stage
/test pull-kubernetes-e2e-gce-network-proxy-http-connect
/test pull-kubernetes-e2e-gce-pull-through-cache
/test pull-kubernetes-e2e-gce-scale-performance-manual
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-storage-kind-alpha-beta-features-slow
/test pull-kubernetes-integration
/test pull-kubernetes-integration-canary
/test pull-kubernetes-integration-go-canary
/test pull-kubernetes-kubemark-e2e-gce-scale
/test pull-kubernetes-node-e2e-containerd
/test pull-kubernetes-typecheck
/test pull-kubernetes-unit
/test pull-kubernetes-unit-go-canary
/test pull-kubernetes-update
/test pull-kubernetes-verify
/test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

/test check-dependency-stats
/test pull-crio-cgroupv1-node-e2e-eviction
/test pull-crio-cgroupv1-node-e2e-features
/test pull-crio-cgroupv1-node-e2e-hugepages
/test pull-crio-cgroupv1-node-e2e-resource-managers
/test pull-crio-cgroupv2-imagefs-separatedisktest
/test pull-crio-cgroupv2-node-e2e-eviction
/test pull-crio-cgroupv2-node-e2e-hugepages
/test pull-crio-cgroupv2-node-e2e-resource-managers
/test pull-crio-cgroupv2-splitfs-separate-disk
/test pull-e2e-gce-cloud-provider-disabled
/test pull-e2e-gci-gce-alpha-enabled-default
/test pull-kubernetes-apidiff
/test pull-kubernetes-apidiff-client-go
/test pull-kubernetes-conformance-image-test
/test pull-kubernetes-conformance-kind-ga-only
/test pull-kubernetes-conformance-kind-ipv6-parallel
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
/test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-crio-node-memoryqos-cgrpv2
/test pull-kubernetes-cross
/test pull-kubernetes-e2e-autoscaling-hpa-cm
/test pull-kubernetes-e2e-autoscaling-hpa-cpu
/test pull-kubernetes-e2e-autoscaling-hpa-cpu-alpha-beta
/test pull-kubernetes-e2e-capz-azure-disk
/test pull-kubernetes-e2e-capz-azure-disk-vmss
/test pull-kubernetes-e2e-capz-azure-disk-windows
/test pull-kubernetes-e2e-capz-azure-file
/test pull-kubernetes-e2e-capz-azure-file-vmss
/test pull-kubernetes-e2e-capz-azure-file-windows
/test pull-kubernetes-e2e-capz-conformance
/test pull-kubernetes-e2e-capz-master-windows-nodelogquery
/test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
/test pull-kubernetes-e2e-capz-windows-alpha-features
/test pull-kubernetes-e2e-capz-windows-master
/test pull-kubernetes-e2e-capz-windows-serial-slow
/test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
/test pull-kubernetes-e2e-containerd-gce
/test pull-kubernetes-e2e-ec2
/test pull-kubernetes-e2e-ec2-arm64
/test pull-kubernetes-e2e-ec2-conformance
/test pull-kubernetes-e2e-ec2-conformance-arm64
/test pull-kubernetes-e2e-ec2-device-plugin-gpu
/test pull-kubernetes-e2e-gce-canary
/test pull-kubernetes-e2e-gce-correctness
/test pull-kubernetes-e2e-gce-cos-alpha-features
/test pull-kubernetes-e2e-gce-csi-serial
/test pull-kubernetes-e2e-gce-device-plugin-gpu
/test pull-kubernetes-e2e-gce-disruptive-canary
/test pull-kubernetes-e2e-gce-kubelet-credential-provider
/test pull-kubernetes-e2e-gce-network-policies
/test pull-kubernetes-e2e-gce-network-proxy-grpc
/test pull-kubernetes-e2e-gce-serial
/test pull-kubernetes-e2e-gce-serial-canary
/test pull-kubernetes-e2e-gce-storage-disruptive
/test pull-kubernetes-e2e-gce-storage-selinux
/test pull-kubernetes-e2e-gce-storage-slow
/test pull-kubernetes-e2e-gce-storage-snapshot
/test pull-kubernetes-e2e-gci-gce-autoscaling
/test pull-kubernetes-e2e-gci-gce-ingress
/test pull-kubernetes-e2e-gci-gce-ipvs
/test pull-kubernetes-e2e-gci-gce-kube-dns-nodecache
/test pull-kubernetes-e2e-gci-gce-nftables
/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features
/test pull-kubernetes-e2e-kind-beta-features
/test pull-kubernetes-e2e-kind-canary
/test pull-kubernetes-e2e-kind-cloud-provider-loadbalancer
/test pull-kubernetes-e2e-kind-dependencies
/test pull-kubernetes-e2e-kind-dual-canary
/test pull-kubernetes-e2e-kind-evented-pleg
/test pull-kubernetes-e2e-kind-ipv6-canary
/test pull-kubernetes-e2e-kind-ipvs
/test pull-kubernetes-e2e-kind-kms
/test pull-kubernetes-e2e-kind-multizone
/test pull-kubernetes-e2e-kind-nftables
/test pull-kubernetes-e2e-relaxed-environment-variable-validation
/test pull-kubernetes-e2e-storage-kind-disruptive
/test pull-kubernetes-e2e-storage-kind-volume-group-snapshots
/test pull-kubernetes-kind-dra
/test pull-kubernetes-kind-dra-all
/test pull-kubernetes-kind-dra-all-canary
/test pull-kubernetes-kind-dra-canary
/test pull-kubernetes-kind-json-logging
/test pull-kubernetes-kind-text-logging
/test pull-kubernetes-kubemark-e2e-gce-big
/test pull-kubernetes-linter-hints
/test pull-kubernetes-local-e2e
/test pull-kubernetes-node-arm64-e2e-containerd-ec2
/test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
/test pull-kubernetes-node-arm64-ubuntu-serial-gce
/test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
/test pull-kubernetes-node-crio-cgrpv2-e2e
/test pull-kubernetes-node-crio-cgrpv2-imagefs-e2e
/test pull-kubernetes-node-crio-cgrpv2-imagevolume-e2e
/test pull-kubernetes-node-crio-cgrpv2-splitfs-e2e
/test pull-kubernetes-node-crio-cgrpv2-userns-e2e-serial
/test pull-kubernetes-node-crio-e2e
/test pull-kubernetes-node-e2e-alpha-ec2
/test pull-kubernetes-node-e2e-containerd-1-7-dra
/test pull-kubernetes-node-e2e-containerd-1-7-dra-canary
/test pull-kubernetes-node-e2e-containerd-2-0-dra
/test pull-kubernetes-node-e2e-containerd-2-0-dra-canary
/test pull-kubernetes-node-e2e-containerd-alpha-features
/test pull-kubernetes-node-e2e-containerd-ec2
/test pull-kubernetes-node-e2e-containerd-features
/test pull-kubernetes-node-e2e-containerd-features-kubetest2
/test pull-kubernetes-node-e2e-containerd-kubelet-psi
/test pull-kubernetes-node-e2e-containerd-kubetest2
/test pull-kubernetes-node-e2e-containerd-serial-ec2
/test pull-kubernetes-node-e2e-containerd-serial-ec2-eks
/test pull-kubernetes-node-e2e-containerd-standalone-mode
/test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
/test pull-kubernetes-node-e2e-cri-proxy-serial
/test pull-kubernetes-node-e2e-crio-cgrpv1-dra
/test pull-kubernetes-node-e2e-crio-cgrpv1-dra-canary
/test pull-kubernetes-node-e2e-crio-cgrpv2-dra
/test pull-kubernetes-node-e2e-crio-cgrpv2-dra-canary
/test pull-kubernetes-node-e2e-resource-health-status
/test pull-kubernetes-node-kubelet-containerd-flaky
/test pull-kubernetes-node-kubelet-credential-provider
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-containerd-alpha-features
/test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
/test pull-kubernetes-node-kubelet-serial-hugepages
/test pull-kubernetes-node-kubelet-serial-memory-manager
/test pull-kubernetes-node-kubelet-serial-podresources
/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-conformance-ubuntu-serial
/test pull-kubernetes-node-swap-fedora
/test pull-kubernetes-node-swap-fedora-serial
/test pull-kubernetes-node-swap-ubuntu-serial
/test pull-kubernetes-scheduler-perf
/test pull-kubernetes-unit-windows-master
/test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

pull-kubernetes-cmd
pull-kubernetes-conformance-image-test
pull-kubernetes-conformance-kind-ga-only-parallel
pull-kubernetes-dependencies
pull-kubernetes-e2e-ec2
pull-kubernetes-e2e-gce
pull-kubernetes-e2e-gce-network-policies
pull-kubernetes-e2e-gci-gce-ingress
pull-kubernetes-e2e-kind
pull-kubernetes-e2e-kind-ipv6
pull-kubernetes-e2e-kind-nftables
pull-kubernetes-integration
pull-kubernetes-linter-hints
pull-kubernetes-node-e2e-containerd
pull-kubernetes-typecheck
pull-kubernetes-unit
pull-kubernetes-verify

In response to this:

/test pull-kubernetes-e2e-aks-engine-azure-windows


@aojea
Member

aojea commented May 30, 2025

/lgtm
/approve

/assign @thockin @dims @johnbelamaric

@danwinship
Contributor Author

The new tests fail on Windows:

I0530 13:36:08.236904 88636 util.go:161] Waiting up to 2m0s to get response from 10.108.165.206:80
...
I0530 13:38:08.811227 88636 util.go:181] Unexpected error: 
    <exec.CodeExitError>: 
    error running /usr/local/bin/kubectl --kubeconfig=/home/prow/go/src/k8s.io/windows-testing/capz/capz-conf-5xrwb0.kubeconfig --namespace=endpointslice-4965 exec pause-pod-0 -- /bin/sh -x -c curl -q -s --max-time 30 10.108.165.206:80/hostname:
    Command stdout:
    
    stderr:
    + curl -q -s --max-time 30 10.108.165.206:80/hostname
    command terminated with exit code 7

kube-proxy logs show that it is doing something that looks at least almost right...

@aojea
Member

aojea commented May 30, 2025

/sig window

@princepereira we need help understanding why the Windows kube-proxy fails these tests; this is an important behavior that all proxies should meet. Can you PTAL?

@k8s-ci-robot
Contributor

@aojea: The label(s) sig/window cannot be applied, because the repository doesn't have them.

In response to this:

/sig window

@princepereira we need help understanding why the Windows kube-proxy fails these tests; this is an important behavior that all proxies should meet. Can you PTAL?


@aojea
Member

aojea commented May 30, 2025

hmm, this job is confusing

https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/132019/pull-kubernetes-e2e-capz-windows-master/1928434416195997696/artifacts/clusters/capz-conf-5xrwb0/kube-system/kube-proxy-windows-b4b5k/pod-describe.txt

Normal Scheduled 54m default-scheduler Successfully assigned kube-system/kube-proxy-windows-b4b5k to capz-conf-vt4pr
Normal Pulling 54m kubelet Pulling image "sigwindowstools/kube-proxy:v1.30.1-calico-hostprocess"
Normal Pulled 54m kubelet Successfully pulled image "sigwindowstools/kube-proxy:v1.30.1-calico-hostprocess" in 3.942s (3.942s including waiting). Image size: 17553604 bytes.
Normal Created 54m kubelet Created container: kube-proxy
Normal Started 54m kubelet Started container kube-proxy

however, logs https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/132019/pull-kubernetes-e2e-capz-windows-master/1928434416195997696/artifacts/clusters/capz-conf-5xrwb0/kube-system/kube-proxy-windows-b4b5k/kube-proxy.log

kubeproxy version Kubernetes v1.34.0-alpha.0.842+f1e9766d104a72-dirty

how come it is building from dirty?

the proxy seems to get the endpoints but I do not know how to debug this proxy

I0530 13:35:50.463848 3144 proxier.go:606] "Flags enabled for service" service="example-custom-endpoints" localTrafficDSR=false internalTrafficLocal=false preserveDIP=false winProxyOptimization=true
I0530 13:35:50.463848 3144 servicechangetracker.go:103] "Service updated ports" service="endpointslice-4965/example-custom-endpoints" portCount=2
I0530 13:35:50.465838 3144 servicechangetracker.go:202] "Adding new service port" portName="endpointslice-4965/example-custom-endpoints:port80" servicePort="HnsID:, TargetPort:80"
I0530 13:35:50.465838 3144 servicechangetracker.go:202] "Adding new service port" portName="endpointslice-4965/example-custom-endpoints:port81" servicePort="HnsID:, TargetPort:81"
I0530 13:35:50.478720 3144 hns.go:174] "Queried endpoints from network" network="Calico" count=34
I0530 13:35:50.481410 3144 hns.go:345] "Queried load balancers" count=7
I0530 13:35:50.481410 3144 proxier.go:1267] "Syncing Policies"
I0530 13:35:50.481410 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:da3bf23f-0736-4183-9b47-d88fe5d1abfe, TargetPort:5473"
I0530 13:35:50.481410 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:192f4384-ca05-4d44-a153-3cd9775ee5a9, TargetPort:9153"
I0530 13:35:50.481410 3144 proxier.go:1285] "No existing remote endpoint" IP="10.108.165.206"
I0530 13:35:50.483260 3144 proxier.go:1308] "Applying Policy" serviceInfo="endpointslice-4965/example-custom-endpoints:port80"
I0530 13:35:50.483260 3144 proxier.go:1323] "Skipped terminating status check for all endpoints" svcClusterIP="10.108.165.206" ingressLBCount=0
I0530 13:35:50.483260 3144 proxier.go:1465] "Associated endpoints for service" endpointInfo="[]" serviceName="endpointslice-4965/example-custom-endpoints:port80"
I0530 13:35:50.483260 3144 proxier.go:1477] "Cleanup existing " endpointInfo=null serviceName="endpointslice-4965/example-custom-endpoints:port80"
E0530 13:35:50.483260 3144 proxier.go:1480] "Endpoint information not available for service, not applying any policy" serviceName="endpointslice-4965/example-custom-endpoints:port80"
I0530 13:35:50.483783 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:59d398f3-ef19-4260-9819-91f5d81150e7, TargetPort:6443"
I0530 13:35:50.483783 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:067931c0-4e42-4a82-a2d0-01e230e505a2, TargetPort:53"
I0530 13:35:50.483783 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:68e1f7ee-7a50-431f-b5af-17e8d97ec040, TargetPort:53"
I0530 13:35:50.483829 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:0beb4332-a6d7-4176-bfb1-116aa1feed05, TargetPort:4443"
I0530 13:35:50.483829 3144 proxier.go:1308] "Applying Policy" serviceInfo="endpointslice-4965/example-custom-endpoints:port81"
I0530 13:35:50.483829 3144 proxier.go:1323] "Skipped terminating status check for all endpoints" svcClusterIP="10.108.165.206" ingressLBCount=0
I0530 13:35:50.483912 3144 proxier.go:1465] "Associated endpoints for service" endpointInfo="[]" serviceName="endpointslice-4965/example-custom-endpoints:port81"
I0530 13:35:50.483912 3144 proxier.go:1477] "Cleanup existing " endpointInfo=null serviceName="endpointslice-4965/example-custom-endpoints:port81"
E0530 13:35:50.484151 3144 proxier.go:1480] "Endpoint information not available for service, not applying any policy" serviceName="endpointslice-4965/example-custom-endpoints:port81"
I0530 13:35:50.484151 3144 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:b0c0c1de-bbad-459a-aae9-aaeccd4f1c7d, TargetPort:5443"
I0530 13:35:50.484151 3144 proxier.go:1210] "Syncing proxy rules complete" elapsed="20.3033ms"
I0530 13:35:50.484151 3144 bounded_frequency_runner.go:296] sync-runner: ran, next possible in 1s, periodic in 30s
I0530 13:36:03.656148 3144 config.go:124] "Calling handler.OnEndpointSliceAdd" endpoints="endpointslice-4965/e2e-custom-slicekwq5p"
I0530 13:36:03.658377 3144 endpointslicecache.go:296] "Setting endpoints for service port name" portName="endpointslice-4965/example-custom-endpoints:port80" endpoints=["HnsID:, Address:192.168.210.201:8090"]
I0530 13:36:03.658377 3144 proxier.go:446] "Endpoints are modified. Service is stale" servicePortName="endpointslice-4965/example-custom

@danwinship
Contributor Author

// Note that hnslib.AddLoadBalancer() doesn't support endpoints with different ports, so only port from first endpoint is used.

The code is also somewhat confused about using TargetPort from the service; TargetPort is an input to the EndpointSlice controller that affects the generated Ports in the EndpointSlices. kube-proxy should only be looking at the EndpointSlice Ports, not at the Service TargetPort, because (as in this test case) it's not even required to set TargetPort if you're constructing the EndpointSlices by hand.
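
Concretely, a proxy that follows this rule resolves each backend port purely from the slice. A minimal sketch of that lookup (hand-rolled for illustration; not the actual proxier code):

package sketch

import discoveryv1 "k8s.io/api/discovery/v1"

// backendPort returns the backend port for the named service port, taken
// only from the EndpointSlice's Ports, never from Service.spec.targetPort.
func backendPort(svcPortName string, slice *discoveryv1.EndpointSlice) (int32, bool) {
	for _, p := range slice.Ports {
		// EndpointSlice port names correspond to Service port names; the
		// Port value is the real backend port, which may differ from both
		// the service port and any (possibly unset) targetPort.
		if p.Name != nil && *p.Name == svcPortName && p.Port != nil {
			return *p.Port, true
		}
	}
	return 0, false // no matching port in this slice
}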

@marosset
Contributor

/cc @princepereira @sbangari
Can either of you chime in here?

@k8s-ci-robot k8s-ci-robot requested a review from sbangari May 30, 2025 16:23
@k8s-ci-robot
Contributor

@marosset: GitHub didn't allow me to request PR reviews from the following users: princepereira.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @princepereira @sbangari
Can either of you chime in here?


@marosset
Contributor

hmm, this job is confusing

[…]

how come it is building from dirty?

the proxy seems to get the endpoints but I do not know how to debug this proxy

@aojea - Setup is a bit confusing because there is no kube-proxy image we can target for CI test passes.

To work around this, the CI jobs for Windows:

  1. Have a pre-kubeadm task to download a Windows kube-proxy binary from the k8s-release-dev/ci storage bucket for the specified $CI_VERSION:

https://github.com/kubernetes-sigs/windows-testing/blob/458825d6fd361a55c6f4a4dd0a0c2c85af73fc46/capz/templates/windows-ci.yaml#L96-L106

  2. Use a special kube-proxy container image that will start a kube-proxy binary if one is present on the node at a given location:

https://github.com/kubernetes-sigs/sig-windows-tools/blob/dc4b838507a66dfa0ed10f9b7088eaa443030886/hostprocess/calico/kube-proxy/start.ps1#L17-L20

We hope that this can be simplified in the future.

@princepereira
Contributor

princepereira commented Jun 2, 2025

// Note that hnslib.AddLoadBalancer() doesn't support endpoints with different ports, so only port from first endpoint is used.

The code is also somewhat confused about using TargetPort from the service; TargetPort is an input to the EndpointSlice controller that affects the generated Ports in the EndpointSlices; kube-proxy should only be looking at the EndpointSlice Ports, not at the Service TargetPort, because (as in this test case), it's not even required to set TargetPort if you're constructing the EndpointSlices by hand.

I'm not sure if that's the case here. If it were, we should at least see the logs below in the kube-proxy output, but they're missing.

klog.V(1).InfoS("Hns endpoint resource", "endpointInfo", newHnsEndpoint)

I strongly suspect that proxier.endpointsMap isn't being updated with the correct set of endpoint information, so the loop over the map never runs.

for _, epInfo := range proxier.endpointsMap[svcName] {

@dims
Member

dims commented Jun 2, 2025

/lgtm cancel

(for now! since we have active discussion + hold)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2025
@danwinship
Contributor Author

I strongly suspect that proxier.svcPortMap isn't being updated with the correct set of endpoint information, so the looping through the map is not happening.

The test passes on the Linux backends, which use the same svcPortMap/endpointsMap.

@aojea
Member

aojea commented Jun 2, 2025

@princepereira can we add some instrumentation to the code in a PR with this commit to debug this a bit more and test it in the CI?

@princepereira
Contributor

@princepereira can we add some instrumentation to the code in a PR with this commit to debug this a bit more and test it in the CI?

Yes, I was planning to do something similar - adding some logs to inspect the contents of proxier.endpointsMap. Please feel free to go ahead with your proposal.

Also, I noticed that the namespace information is missing from ObjectMeta: metav1.ObjectMeta{ in the svc object, even though the namespace is being passed to the createServiceReportErr function. Not sure if that's contributing to the issue, but I thought I'd point it out.

Apologies for the delayed responses - I'm working from the IST time zone.

@aojea
Member

aojea commented Jun 3, 2025

Yes, I was planning to do something similar - adding some logs to inspect the contents of proxier.endpointsMap. Please feel free to go ahead with your proposal.

we rely on you and sig-windows to maintain the Windows proxy; this is not something I can help with, sorry. It will be important to get this fixed, since it is blocking the work in KEP-4974: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4974-deprecate-endpoints

@danwinship
Contributor Author

it will be important to get this fixed since this is blocking the work in KEP-4974

@dims @marosset Are we allowed to add new conformance tests that fail on Windows? The conformance docs aren't clear about this; they point out that Windows doesn't pass all existing conformance tests, but they seem to assume that the only reason a new conformance test would fail on Windows would be if it depended on Linux-specific functionality, in which case it just needs to be marked [LinuxOnly].

@dims
Member

dims commented Jun 3, 2025

@danwinship we have precedent for LinuxOnly + Conformance, so yes, it's possible to have new ones marked as such:
https://testgrid.k8s.io/sig-release-master-blocking#conformance-ga-only&width=20&include-filter-by-regex=LinuxOnly

(if we can avoid, it would be better, but if there is no choice and we can document that fact, then we can go ahead)
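
For reference, the tagging convention puts the marker in the spec name itself, which Windows test passes then filter out. A hypothetical example of the pattern (not the actual tests from this PR):

package network

import (
	"context"

	"github.com/onsi/ginkgo/v2"

	"k8s.io/kubernetes/test/e2e/framework"
)

// Hypothetical illustration of the LinuxOnly + Conformance pattern: the
// [LinuxOnly] tag lives in the test name, and the ConformanceIt wrapper
// appends [Conformance].
var _ = ginkgo.Describe("[sig-network] Example", func() {
	framework.ConformanceIt("should do the Linux-specific thing [LinuxOnly]", func(ctx context.Context) {
		// ... test body ...
	})
})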

@danwinship
Contributor Author

Right, but this isn't [LinuxOnly]; it's not that the test is "using Linux-specific features" or "unable to run on Windows nodes", it's just that there's currently a bug in the Windows kube-proxy implementation. If the bug doesn't get fixed by code freeze, can we still promote to conformance for 1.34 anyway and just force them to skip the test?

@marosset
Contributor

marosset commented Jun 3, 2025

Right, but this isn't [LinuxOnly]; it's not that the test is "using Linux-specific features" or "unable to run on Windows nodes", it's just that there's currently a bug in the Windows kube-proxy implementation. If the bug doesn't get fixed by code freeze, can we still promote to conformance for 1.34 anyway and just force them to skip the test?

I would be OK with promoting the test and adding a skip in the SIG-Windows test passes if the kube-proxy bug is not addressed by 1.34 code-freeze.

@princepereira
Contributor

princepereira commented Jun 4, 2025

Yes, I was planning to do something similar - adding some logs to inspect the contents of proxier.endpointsMap. Please feel free to go ahead with your proposal.

we rely on you and sig-windows to maintain the Windows proxy; this is not something I can help with, sorry. It will be important to get this fixed, since it is blocking the work in KEP-4974: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4974-deprecate-endpoints

Sure. @aojea . I am working on it.

I ran the same code in a separate draft PR with some additional debug logs and discovered that the namespace in the proxier.svcMap is different from the one in the endpointMap. As a result, kube-proxy was attempting to fetch endpoint information from proxier.endpointsMap using an incorrect key. I'm currently investigating why there's a mismatch in namespaces. If anyone has any insights, please let me know.

Draft PR: #132073

Kube-Proxy Logs: https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/132073/pull-kubernetes-e2e-capz-windows-master/1929995912881377280/artifacts/clusters/capz-conf-i4w1vd/kube-system/kube-proxy-windows-8cqj2/kube-proxy.log

Relevant Logs :

I0603 21:43:30.968055    5840 servicechangetracker.go:103] "Service updated ports" service="endpointslice-2021/example-custom-endpoints" portCount=2
I0603 21:43:30.970419    5840 servicechangetracker.go:202] "Adding new service port" portName="endpointslice-2021/example-custom-endpoints:port80" servicePort="HnsID:, TargetPort:80"
I0603 21:43:30.970419    5840 servicechangetracker.go:202] "Adding new service port" portName="endpointslice-2021/example-custom-endpoints:port81" servicePort="HnsID:, TargetPort:81"
I0603 21:43:30.990622    5840 hns.go:174] "Queried endpoints from network" network="Calico" count=43
I0603 21:43:30.994633    5840 hns.go:345] "Queried load balancers" count=9
I0603 21:43:30.995466    5840 proxier.go:1267] "Syncing Policies"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:03f6e1a7-375d-4714-ae67-662020b7bb81, TargetPort:53"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:be3ca265-126e-45a1-962c-73753536dd3a, TargetPort:4443"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:0a875478-bc4f-42ed-bb48-c3406b16be3c, TargetPort:5443"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:dbbed1bb-8218-40e3-886a-a11544398767, TargetPort:5473"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:a31a02b6-1932-4bd7-9240-fd1bf7fdc489, TargetPort:6443"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:1db5eca5-6705-47f1-9256-a52de4d59c38, TargetPort:53"
I0603 21:43:30.995499    5840 proxier.go:1278] "Policy already applied" serviceInfo="HnsID:a3e113c1-3eab-4531-9289-2a585fb53551, TargetPort:80"
I0603 21:43:30.995499    5840 proxier.go:1285] "No existing remote endpoint" IP="10.111.212.62"
I0603 21:43:31.001075    5840 proxier.go:1308] "Applying Policy" serviceInfo="endpointslice-2021/example-custom-endpoints:port80"
I0603 21:43:31.001075    5840 proxier.go:1323] "Skipped terminating status check for all endpoints" svcClusterIP="10.111.212.62" ingressLBCount=0
I0603 21:43:31.001075    5840 proxier.go:1327] "TEST: Endpoints map for service" svcName="endpointslice-2021/example-custom-endpoints:port80" endpointsCount=0
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/kube-dns:metrics" endpointsCount=2
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="calico-system/calico-typha:calico-typha" endpointsCount=1
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="endpointslice-8664/example-custom-endpoints:port81" endpointsCount=1
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/kube-dns:dns" endpointsCount=2
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="calico-apiserver/calico-api:apiserver" endpointsCount=2
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/metrics-server:https" endpointsCount=1
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/kube-dns:dns-tcp" endpointsCount=2
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="endpointslice-8664/example-custom-endpoints:port80" endpointsCount=1
I0603 21:43:31.001075    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="default/kubernetes:https" endpointsCount=1
I0603 21:43:31.001075    5840 proxier.go:1471] "Associated endpoints for service" endpointInfo="[]" serviceName="endpointslice-2021/example-custom-endpoints:port80"
I0603 21:43:31.001075    5840 proxier.go:1483] "Cleanup existing " endpointInfo=null serviceName="endpointslice-2021/example-custom-endpoints:port80"
E0603 21:43:31.001773    5840 proxier.go:1486] "Endpoint information not available for service, not applying any policy" serviceName="endpointslice-2021/example-custom-endpoints:port80"
I0603 21:43:31.001773    5840 proxier.go:1308] "Applying Policy" serviceInfo="endpointslice-2021/example-custom-endpoints:port81"
I0603 21:43:31.001773    5840 proxier.go:1323] "Skipped terminating status check for all endpoints" svcClusterIP="10.111.212.62" ingressLBCount=0
I0603 21:43:31.001773    5840 proxier.go:1327] "TEST: Endpoints map for service" svcName="endpointslice-2021/example-custom-endpoints:port81" endpointsCount=0
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/kube-dns:metrics" endpointsCount=2
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="calico-system/calico-typha:calico-typha" endpointsCount=1
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="endpointslice-8664/example-custom-endpoints:port81" endpointsCount=1
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/kube-dns:dns" endpointsCount=2
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="calico-apiserver/calico-api:apiserver" endpointsCount=2
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/metrics-server:https" endpointsCount=1
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="kube-system/kube-dns:dns-tcp" endpointsCount=2
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="endpointslice-8664/example-custom-endpoints:port80" endpointsCount=1
I0603 21:43:31.001773    5840 proxier.go:1329] "TEST: Available services in endpointmap" svcName="default/kubernetes:https" endpointsCount=1
I0603 21:43:31.001773    5840 proxier.go:1471] "Associated endpoints for service" endpointInfo="[]" serviceName="endpointslice-2021/example-custom-endpoints:port81"

The service name present in proxier.endpointMap is endpointslice-8664/example-custom-endpoints:port80, whereas the one added in the test framework is endpointslice-2021/example-custom-endpoints:port80.

@danwinship
Contributor Author

danwinship commented Jun 5, 2025

I think those are two different e2e tests running in parallel; all the tests in test/e2e/network/endpointslice.go would end up with a namespace that starts with "endpointslice-". (And the two tests being promoted here both use a service name of "example-custom-endpoints".)

@aojea
Member

aojea commented Jun 5, 2025

checking https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/132073/pull-kubernetes-e2e-capz-windows-master/1930259596274831360

you can get the details of each failure from the report


If you click "open stderr" you can identify the node the requests are sent from, based on the pod name pause-pod-0, so you can get the kube-proxy logs that should be relevant; in this case it should be capz-conf-smzm6.

I0604 14:25:39.796145 4118 dump.go:53] At 2025-06-04 14:23:31 +0000 UTC - event for pause-pod-0: {default-scheduler } Scheduled: Successfully assigned endpointslice-9695/pause-pod-0 to capz-conf-smzm6

@princepereira
Contributor

The issue is that the ClusterIP load balancer policy was created with an incorrect internal port - 80 (ServicePort) instead of the intended 8090 (ContainerPort).

I0530 13:36:03.677982    3144 proxier.go:1465] "Associated endpoints for service" endpointInfo="[HnsID:569257cf-a695-4300-80e8-5e7069016fbb, Address:192.168.210.201:0]" serviceName="endpointslice-4965/example-custom-endpoints:port80"
I0530 13:36:03.678017    3144 proxier.go:1484] "Trying to apply Policies for service" serviceInfo="HnsID:, TargetPort:80"
I0530 13:36:03.679385    3144 hns.go:431] "Created Hns loadbalancer policy resource" loadBalancer={"ID":"9d3006f7-b3b5-4f5e-9ccd-329f04cc8a3b","HostComputeEndpoints":["569257cf-a695-4300-80e8-5e7069016fbb"],"SourceVIP":"10.1.0.4","FrontendVIPs":["10.108.165.206"],"PortMappings":[{"Protocol":6,"InternalPort":80,"ExternalPort":80}],"SchemaVersion":{"Major":2,"Minor":0},"Flags":1}

What caused the internal port to be set incorrectly?

This happened because the Service object was defined without explicitly setting the targetPort.


According to the Kubernetes specification, when targetPort is omitted, it defaults to the value of port.

https://github.com/kubernetes/kubernetes/blob/b2f27c0649fc0f3d2a4a6dd29135ecc81781f7e4/staging/src/k8s.io/api/core/v1/types.go#L5865C1-L5877C100

As a result, the targetPort was automatically set to 80 here.

I0628 18:24:21.106588 92359 endpointslice.go:561] Prince new Svc created: &Service{ObjectMeta:{example-custom-endpoints endpointslice-9016 090628cf-d8b4-4e85-b1d4-0bbe7f1ef892 27252 0 2025-06-28 18:24:21 +0000 UTC <nil> <nil> map[] map[] [] [] [{e2e.test Update v1 2025-06-28 18:24:21 +0000 UTC FieldsV1 {"f:spec":{"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":81,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:sessionAffinity":{},"f:type":{}}} }]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:port80prince,Protocol:TCP,Port:80,TargetPort:{0 80 },NodePort:0,AppProtocol:nil,},ServicePort{Name:port81prince,Protocol:TCP,Port:81,TargetPort:{0 81 },NodePort:0,AppProtocol:nil,},},Selector:map[string]string{},ClusterIP:10.110.74.242,Type:ClusterIP,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:*SingleStack,ClusterIPs:[10.110.74.242],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:nil,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,TrafficDistribution:nil,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},}

Since the target port was automatically assigned by the test framework (not hardcoded, but derived from the port field based on the service spec), the Windows Proxier constructed the ServiceInfo object using the targetPort from the servicePort.

targetPort = port.TargetPort.IntValue()

Since targetPort was already set in the ServiceInfo object, this code had no effect.

svcInfo.targetPort = int(ep.port)

This target port was ultimately used as the internal port when creating the load balancer policy.

uint16(svcInfo.targetPort),
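
Putting the pieces together, this is a precedence bug: the Service-derived targetPort (defaulted to 80) always wins over the per-endpoint port (8090) from the EndpointSlice. A simplified reconstruction of that flow (type and field names are my approximations of the winkernel proxier, not the real code):

package sketch

// serviceInfo stands in for the winkernel proxier's per-port state; the
// real type has many more fields.
type serviceInfo struct{ targetPort int }

// resolveInternalPort mirrors the precedence described above: the endpoint's
// port is only consulted when the Service-derived targetPort is unset, which
// never happens once the API server has defaulted targetPort to port.
func resolveInternalPort(svcTargetPort int, epPort int32) int {
	info := serviceInfo{targetPort: svcTargetPort} // seeded from the Service spec
	if info.targetPort == 0 {
		info.targetPort = int(epPort) // fallback that never fires in this test
	}
	return info.targetPort
}

With the values from this test, resolveInternalPort(80, 8090) returns 80, which is the InternalPort that ends up in the HNS load balancer policy instead of 8090.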

Fix

@danwinship

If you agree with my findings, please let me know how you'd like to proceed. Would you prefer me to update the existing PR, or should I open a new one?

@aojea
Member

aojea commented Jun 29, 2025

@princepereira can you send a new PR with the fix and tag Dan and me? During the PR review we can evaluate the next step ... but I'm curious about the solution; from your explanation it looks like it is a Windows-proxy-only thing?

@princepereira
Contributor

princepereira commented Jun 30, 2025

@aojea ,

In Windows kube-proxy, the load balancer policy creation uses the internal port for configuration, whereas the iptables implementation in the Linux kube-proxy directly utilizes EndpointInfo, which already contains the correct target port.

args = append(args, "-m", protocol, "-p", protocol, "-j", "DNAT", "--to-destination", epInfo.String())

Therefore, this issue won’t appear in the Linux case. I haven’t reviewed the IPVS or nftables implementations yet. I’ll be submitting a PR soon with a proposed fix.
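
To make the contrast concrete, the iptables DNAT destination is rendered straight from the endpoint's own data, so the EndpointSlice port is used as-is. A tiny sketch of what epInfo.String() amounts to (illustrative; not the real implementation):

package sketch

import (
	"net"
	"strconv"
)

// dnatTarget shows the shape of the "--to-destination" argument: the
// endpoint IP and the EndpointSlice port, with no reference to the
// Service's targetPort at any point.
func dnatTarget(ip string, port int) string {
	return net.JoinHostPort(ip, strconv.Itoa(port)) // e.g. "192.168.210.201:8090"
}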

Thanks!

@danwinship
Contributor Author

the Windows Proxier constructed the ServiceInfo object using the targetPort from the servicePort

Yes, I pointed that out here.

The job of the service proxy is to intercept traffic whose source is described by the Service, and deliver it to the destinations described by the EndpointSlices. Yes, the Service also has some information about the destination of the traffic, but that information is not there for the service proxy to use; it's there for the EndpointSlice controller to use. It's certainly weird that the EndpointSlices don't quite match the ServicePorts in this case, but it shouldn't matter, because the service proxy shouldn't be looking at targetPort anyway, any more than it should try to resolve the destination pod IPs by itself.

@princepereira
Contributor

@danwinship , @aojea ,

Please review: #132647

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance Issues or PRs related to kubernetes conformance tests area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Issues To Triage
9 participants