[Kube-proxy]: Implement KEP-3836 #116470

alexanderConstantinescu · 2023-03-10T13:50:23Z

I am filing this PR, but I am definitely not convinced that it is safe to go in as-is. I am filing it because I want to track the limitation which currently blocks it from being safe, so that if the KEP does miss its deadline, we know why. I am therefore putting:

/hold

until we've resolved the issues mentioned below.

This patch implements kubernetes/enhancements#3836

This PR implements exactly what was agreed on the KEP, that is to say:

for eTP:Cluster services we start failing the HC when the node is unschedulable or marked as deleted by means of having the deletionTimestamp set

The goal is to allow connection draining of terminating nodes to happen.

The current problem: the unschedulable field is not a good indicator for "the node is terminating". It is true that cordoning a node (making it unschedulable) is usually followed by a drain and then a delete, but there is no guarantee for that. In fact: there are cases where I believe this would completely break cluster ingress, specifically for this case which was discussed on the KEP:

I think my company actually does that when we upgrade Kube at one point. We manage the node pools where our workloads run and we do this in manual mode: so when we need to upgrade them we create a new node pool with version N + 1, then we cordon all existing Nodes in the cluster, but don't evict them, we call some service which will evict them later. But we expect ingress connectivity to work on the unschedulable nodes until that eviction service has kicked in and decided to trigger a restart (which might be minutes / hours)... killing ingress to these workloads for that time would be bad.

In the case mentioned above, this PR would drain ingress connections for all eTP:Cluster services on the cluster.

/cc @thockin @danwinship @aojea

I believe this can't really be implemented until Kube has another (and more clear-cut) way for expressing "the node is terminating and about to be deleted"

What type of PR is this?

/kind feature

What this PR does / why we need it:

To implement connection draining for terminating nodes.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

[Kube-proxy]: implement connection draining for terminating nodes, KEP-3836

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2023-03-10T13:50:32Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alexanderConstantinescu · 2023-03-10T13:50:51Z

/kind feature
/sig network

alexanderConstantinescu · 2023-03-10T14:22:49Z

/retitle [Kube-proxy]: Implement KEP-3836

thockin

Will you add "/livez" (or "/currentz" or something else more obvious) in this or a different PR?

I believe this can't really be implemented until Kube has another (and more clear-cut) way for expressing "the node is terminating and about to be deleted"

@bobbypage xref https://github.com/kubernetes/kubernetes/issues/115139

thockin · 2023-03-10T23:28:09Z

pkg/proxy/healthcheck/proxier_health.go

@@ -62,6 +68,7 @@ type proxierHealthServer struct {

 lastUpdated atomic.Value
 oldestPendingQueued atomic.Value
+ nodeHealthy atomic.Value


atomic.Bool ?

Yes, will change
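For context, a minimal sketch of what the suggestion buys. `healthState` and the field name `nodeEligible` are hypothetical; only `sync/atomic` itself is real. `atomic.Bool` requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// healthState is a hypothetical reduction of proxierHealthServer to the one
// field under discussion.
type healthState struct {
	nodeEligible atomic.Bool // zero value reads as false, no initial Store needed
}

func main() {
	var hs healthState
	fmt.Println(hs.nodeEligible.Load()) // false
	hs.nodeEligible.Store(true)
	fmt.Println(hs.nodeEligible.Load()) // true

	// With atomic.Value the same read needs a type assertion, and that
	// assertion panics if nothing has been stored yet.
	var v atomic.Value
	v.Store(true)
	fmt.Println(v.Load().(bool)) // true
}
```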

thockin · 2023-03-10T23:30:42Z

pkg/proxy/healthcheck/proxier_health.go

@@ -156,7 +170,14 @@ type healthzHandler struct {
 }

 func (h healthzHandler) ServeHTTP(resp http.ResponseWriter, req *http.Request) {
+ var nodeHealthy bool


I think we should not talk about the node being "healthy" here but "eligible" or "viable" or something?

Yeah, I didn't like "healthy" either. I'd like something which goes well with the path: /livez - nodeLive?

It's not about being alive either though - I find "eligible" to be the least awkward so far ?

aojea · 2023-03-13T14:00:33Z

pkg/proxy/iptables/proxier.go

@@ -656,6 +660,9 @@ func (proxier *Proxier) OnNodeDelete(node *v1.Node) {
 "eventNode", node.Name, "currentNode", proxier.hostname)
 return
 }
+
+ proxier.healthzServer.SyncNode(node)


It is better if you add your own handler, for example NodeHealthzHandler, and register it during kube-proxy initialization, checking the feature gate at registration. That way you don't have to plumb it through all the proxies, and the code doesn't get executed when the feature gate is off.

See #111344 for reference

But that doesn't respect the criteria of having the feature gate enabled/disabled and immediately experiencing changing behavior as a consequence, right? I mean: if the watcher handler is added depending on if the feature gate is on/off, then we'd need to restart/re-initialize kube-proxy if the feature gate is flipped. In this case we always want to compute/react to the Node event, but consider/not consider it depending on the feature gate.

I can't follow, but I'm a bit slow these days, so I may be missing something

In this case we always want to compute/react to the Node event, but consider/not consider it depending on the feature gate.

Why do you want to compute it if you are never going to use it? You always have to restart. The feature gate does just that: it keeps code that is under the gate from executing.

Feature gates can't (today) change live - they always require a restart. That said, this plumbing is minor - I can go either way. Antonio has more context on current best-practice :)
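The registration-time approach discussed in this thread could look roughly like the sketch below. Everything here is hypothetical and simplified: `nodeHandler`, `nodeEligibleHandler` and `buildHandlers` are illustrative names, not kube-proxy's actual types, and the gate is passed in as a plain boolean.

```go
package main

import "fmt"

// nodeHandler is a hypothetical stand-in for a node event handler interface.
type nodeHandler interface {
	OnNodeUpdate(name string, unschedulable bool)
}

// nodeEligibleHandler is a hypothetical dedicated handler that only tracks
// whether the node is still eligible for load balancer traffic.
type nodeEligibleHandler struct{ eligible bool }

func (h *nodeEligibleHandler) OnNodeUpdate(_ string, unschedulable bool) {
	h.eligible = !unschedulable
}

// buildHandlers decides once, at startup, which handlers exist. Because
// feature gates cannot change without a restart, a disabled gate means the
// handler is never registered and its code never runs.
func buildHandlers(gateEnabled bool) []nodeHandler {
	var handlers []nodeHandler
	if gateEnabled {
		handlers = append(handlers, &nodeEligibleHandler{eligible: true})
	}
	return handlers
}

func main() {
	fmt.Println(len(buildHandlers(false))) // 0
	fmt.Println(len(buildHandlers(true)))  // 1
}
```

The alternative in the PR as first proposed calls into the health server from each proxier's node callbacks and would check the gate at use time instead.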

alexanderConstantinescu · 2023-03-13T15:57:29Z

Will you add "/livez" (or "/currentz" or something else more obvious) in this or a different PR?

Sorry, forgot about that. I will update

thockin · 2023-03-14T17:03:09Z

code-freeze in about 6 hours - are we kicking this to next release?

alexanderConstantinescu · 2023-03-14T20:05:54Z

code-freeze in about 6 hours - are we kicking this to next release?

Yeah, unfortunately I doubt we will have the time to update this PR + implement the metrics we agreed on the KEP + review it + possibly address the review. I had urgent non-upstream things I needed to focus on today, sorry about that

alexanderConstantinescu · 2023-07-07T16:59:51Z

@thockin / @aojea : just wanted to remind + ask for a review given the code freeze deadline which is approaching. FYI: I am on PTO until the 19th of July - but I don't think there's anything left to address for what concerns this PR.

TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler. This signal might need refinement, but it is currently deemed our best way of understanding whether a node is about to get deleted. We want to do this only for eTP:Cluster services. The goal is to drain connections from terminating nodes.
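A minimal sketch of the taint-based check summarized above, assuming only the taint key matters. The `taint` struct and `nodeEligible` are hypothetical simplifications of `v1.Taint` and the real logic, which this sketch does not claim to reproduce.

```go
package main

import "fmt"

// taint is a hypothetical stand-in for v1.Taint, reduced to its key.
type taint struct{ Key string }

// toBeDeletedTaint is the taint key named in the comment above.
const toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// nodeEligible reports false once the node carries the deletion taint.
func nodeEligible(taints []taint) bool {
	for _, t := range taints {
		if t.Key == toBeDeletedTaint {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(nodeEligible(nil))                              // true
	fmt.Println(nodeEligible([]taint{{Key: toBeDeletedTaint}})) // false
	// A node that is only cordoned keeps passing the health check.
	fmt.Println(nodeEligible([]taint{{Key: "node.kubernetes.io/unschedulable"}})) // true
}
```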

k8s-ci-robot · 2023-07-10T09:10:08Z

@alexanderConstantinescu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gci-gce-ipvs | `08dd657` | link | false | `/test pull-kubernetes-e2e-gci-gce-ipvs` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


aroradaman · 2023-07-10T12:59:35Z

/retest-required

thockin

We need a good place to document these URLs and semantics. https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/ is auto-generated. Where should such docs go? Ideally every command would have something like "kube-proxy --man" which would show flags and more.

@sftim ideas?

thockin · 2023-07-11T21:19:58Z

pkg/proxy/healthcheck/proxier_health.go

 healthy, lastUpdated, currentTime := h.hs.isHealthy()
 resp.Header().Set("Content-Type", "application/json")
 resp.Header().Set("X-Content-Type-Options", "nosniff")
 if !healthy {
+ metrics.ProxyLivez503Total.Inc()


Do we want 2 metrics or just 2 labels on one metric?

two labels is much better IMHO

https://github.com/kubernetes/kubernetes/pull/116470/files#r1262282769

thockin · 2023-07-11T21:29:53Z

I'll be OOO for code-freeze, so I am going to approve this and hold, and we can either merge as-is, or fixup, or decline changes.

/approve
/lgtm
/hold

k8s-ci-robot · 2023-07-11T21:30:00Z

LGTM label has been added.

Git tree hash: 64db7ca76d9ee0c7f820ad33a14f5f26eff3c4b1

k8s-ci-robot · 2023-07-11T21:30:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderConstantinescu, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kube-proxy/OWNERS~~ [thockin]
~~pkg/features/OWNERS~~ [thockin]
~~pkg/proxy/OWNERS~~ [thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sftim · 2023-07-11T22:07:37Z

We need a good place to document these URLs and semantics. https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/ is auto-generated. Where should such docs go? Ideally every command would have something like "kube-proxy --man" which would show flags and more.

If we can make an artefact at build time (eg: JSON with some embedded Markdown), then SIG Docs can consume it.

If we mean URL paths like /healthz, maybe OpenAPI format could work?

thockin · 2023-07-11T22:33:42Z

Where do we publish it in a way that people who care can find it? We don't have "man pages" for our binaries, but probably should.


aojea · 2023-07-13T09:25:50Z

pkg/proxy/metrics/metrics.go

+ // ProxyLivez200Total is the number of returned HTTP Status 200 for each
+ // livez probe.
+ ProxyLivez200Total = metrics.NewCounter(
+ &metrics.CounterOpts{
+ Subsystem: kubeProxySubsystem,
+ Name: "proxy_livez_200_total",
+ Help: "Cumulative proxy livez HTTP status 200",
+ StabilityLevel: metrics.ALPHA,
+ },
+ )
+
+ // ProxyLivez503Total is the number of returned HTTP Status 503 for each
+ // livez probe.
+ ProxyLivez503Total = metrics.NewCounter(
+ &metrics.CounterOpts{
+ Subsystem: kubeProxySubsystem,
+ Name: "proxy_livez_503_total",
+ Help: "Cumulative proxy livez HTTP status 503",
+ StabilityLevel: metrics.ALPHA,
+ },
+ )
+


Suggested change

- // ProxyLivez200Total is the number of returned HTTP Status 200 for each
- // livez probe.
- ProxyLivez200Total = metrics.NewCounter(
- &metrics.CounterOpts{
- Subsystem: kubeProxySubsystem,
- Name: "proxy_livez_200_total",
- Help: "Cumulative proxy livez HTTP status 200",
- StabilityLevel: metrics.ALPHA,
- },
- )
- // ProxyLivez503Total is the number of returned HTTP Status 503 for each
- // livez probe.
- ProxyLivez503Total = metrics.NewCounter(
- &metrics.CounterOpts{
- Subsystem: kubeProxySubsystem,
- Name: "proxy_livez_503_total",
- Help: "Cumulative proxy livez HTTP status 503",
- StabilityLevel: metrics.ALPHA,
- },
- )
+ // ProxyLivezTotal is the number of returned HTTP Status for each
+ // livez probe.
+ ProxyLivezTotal = metrics.NewCounterVec(
+ &metrics.CounterOpts{
+ Subsystem: kubeProxySubsystem,
+ Name: "proxy_livez_total",
+ Help: "Cumulative proxy livez HTTP status",
+ StabilityLevel: metrics.ALPHA,
+ },
+ []string{"code"},
+ )

aojea · 2023-07-13T09:28:52Z

Just the metrics question: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3836-kube-proxy-improved-ingress-connectivity-reliability @dgrisonnet @logicalhan can you please advise us here?

aojea · 2023-07-14T11:28:43Z

@thockin if @alexanderConstantinescu is on vacation and can't get to this before code freeze and since this is feature gated, I think we can merge and iterate

thockin · 2023-07-14T15:58:12Z

ACK - clear the hold if you are happy!

aojea · 2023-07-15T11:25:51Z

/hold cancel

k8s-ci-robot requested a review from aojea March 10, 2023 13:50

k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 10, 2023

k8s-ci-robot requested review from danwinship and thockin March 10, 2023 13:50

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 10, 2023

alexanderConstantinescu force-pushed the kep-3836-impl branch from dfe102f to c5a53c9 Compare March 10, 2023 14:19

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 10, 2023

alexanderConstantinescu force-pushed the kep-3836-impl branch from c5a53c9 to 6afdcf3 Compare March 10, 2023 14:22

k8s-ci-robot changed the title ~~Implement KEP-3836~~ [Kube-proxy]: Implement KEP-3836 Mar 10, 2023

thockin reviewed Mar 10, 2023


thockin self-assigned this Mar 10, 2023

aojea reviewed Mar 13, 2023


k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2023

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2023

alexanderConstantinescu added 2 commits July 10, 2023 10:30

Implement metrics agreed on the KEP

08dd657

alexanderConstantinescu force-pushed the kep-3836-impl branch from 3c1bbd9 to 08dd657 Compare July 10, 2023 08:32

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2023

thockin reviewed Jul 11, 2023


k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 11, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2023

sftim mentioned this pull request Jul 11, 2023

auto-generate man pages for commands (or remove the man pages) #3924

Closed

aojea reviewed Jul 13, 2023


k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 15, 2023

k8s-ci-robot merged commit f343657 into kubernetes:master Jul 15, 2023
13 of 14 checks passed

k8s-ci-robot added this to the v1.28 milestone Jul 15, 2023

aojea mentioned this pull request Jul 16, 2023

aggregate kube-proxy metrics #119353

Merged

alexanderConstantinescu mentioned this pull request Feb 22, 2024

KEP-3836 documentation for 1.30 kubernetes/website#45290

Closed

alexanderConstantinescu mentioned this pull request Mar 26, 2024

KEP-3836 documentation for 1.30 kubernetes/website#45678

Merged
