[Kube-proxy]: Implement KEP-3836 #116470
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind feature
Force-pushed from dfe102f to c5a53c9, then from c5a53c9 to 6afdcf3.
/retitle [Kube-proxy]: Implement KEP-3836
- Will you add "/livez" (or "/currentz" or something else more obvious) in this or a different PR?
I believe this can't really be implemented until Kube has another (and more clear-cut) way for expressing "the node is terminating and about to be deleted"
@bobbypage xref https://github.com/kubernetes/kubernetes/issues/115139
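For reference, a rough self-contained sketch of what splitting the two paths could look like; the proxierHealthy/nodeEligible helpers, the wiring, and the handler bodies are illustrative assumptions, not the code in this PR:

    package main

    import "net/http"

    // Illustrative stand-ins for the real health-server state.
    func proxierHealthy() bool { return true } // e.g. last iptables sync is recent enough
    func nodeEligible() bool   { return true } // e.g. node is not being deleted

    func main() {
        mux := http.NewServeMux()
        // /healthz: fail when the proxier is stale OR the node is going away,
        // so load balancers stop sending new eTP:Cluster traffic to this node.
        mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            if proxierHealthy() && nodeEligible() {
                w.WriteHeader(http.StatusOK)
                return
            }
            w.WriteHeader(http.StatusServiceUnavailable)
        })
        // /livez: keep today's semantics (proxier sync freshness only), so node
        // deletion does not make the process look dead to a liveness probe.
        mux.HandleFunc("/livez", func(w http.ResponseWriter, r *http.Request) {
            if proxierHealthy() {
                w.WriteHeader(http.StatusOK)
                return
            }
            w.WriteHeader(http.StatusServiceUnavailable)
        })
        _ = http.ListenAndServe("127.0.0.1:10256", mux)
    }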
@@ -62,6 +68,7 @@ type proxierHealthServer struct {

    lastUpdated         atomic.Value
    oldestPendingQueued atomic.Value
    nodeHealthy         atomic.Value
atomic.Bool ?
Yes, will change
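For illustration, a minimal sketch of the suggested change with a simplified SyncNode signature (the real one takes a *v1.Node): atomic.Bool (Go 1.19+) gives a typed flag without the boxing and type assertion that atomic.Value needs.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    type proxierHealthServer struct {
        // Before: nodeHealthy atomic.Value, used via Store(bool) / Load().(bool).
        // After: a typed atomic flag; its zero value is ready to use and reads as false.
        nodeHealthy atomic.Bool
    }

    func (hs *proxierHealthServer) SyncNode(healthy bool) {
        hs.nodeHealthy.Store(healthy)
    }

    func main() {
        hs := &proxierHealthServer{}
        hs.SyncNode(true)
        fmt.Println(hs.nodeHealthy.Load()) // true
    }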
@@ -156,7 +170,14 @@ type healthzHandler struct {
}

func (h healthzHandler) ServeHTTP(resp http.ResponseWriter, req *http.Request) {
    var nodeHealthy bool
I think we should not talk about the node being "healthy" here but "eligible" or "viable" or something?
Yeah, I didn't like "healthy" either. I'd like something which goes well with the path /livez:
- nodeLive?
It's not about being alive either though - I find "eligible" to be the least awkward so far?
pkg/proxy/iptables/proxier.go
@@ -656,6 +660,9 @@ func (proxier *Proxier) OnNodeDelete(node *v1.Node) {
        "eventNode", node.Name, "currentNode", proxier.hostname)
    return
}

proxier.healthzServer.SyncNode(node)
It is better if you add your own handler, for example NodeHealthzHandler, and register it during kube-proxy initialization, adding the feature-gate check at registration time. That way you don't have to plumb it through all the proxies, and the gated code is never executed when the feature is off.
See #111344 for reference
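A rough sketch of the pattern being suggested (the handler name, the feature-gate constant, and the registration call are assumptions for illustration, not the final code): the node watcher is only wired up when the gate is on, so none of the gated logic runs and nothing has to be plumbed through the individual proxiers.

    import v1 "k8s.io/api/core/v1"

    // NodeEligibleHandler reacts to Node events and forwards them to the healthz
    // server; it is a standalone handler rather than code inside each proxier.
    type NodeEligibleHandler struct {
        HealthzServer interface{ SyncNode(*v1.Node) }
    }

    func (h *NodeEligibleHandler) OnNodeAdd(node *v1.Node)    { h.HealthzServer.SyncNode(node) }
    func (h *NodeEligibleHandler) OnNodeUpdate(node *v1.Node) { h.HealthzServer.SyncNode(node) }
    func (h *NodeEligibleHandler) OnNodeDelete(node *v1.Node) { h.HealthzServer.SyncNode(node) }
    func (h *NodeEligibleHandler) OnNodeSynced()              {}

    // During kube-proxy initialization, register it only when the gate is on
    // (gate and config names here are hypothetical):
    //
    //  if utilfeature.DefaultFeatureGate.Enabled(features.KubeProxyDrainingTerminatingNodes) {
    //      nodeConfig.RegisterEventHandler(&NodeEligibleHandler{HealthzServer: healthzServer})
    //  }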
But that doesn't respect the criteria of having the feature gate enabled/disabled and immediately experiencing changing behavior as a consequence, right? I mean: if the watcher handler is added depending on if the feature gate is on/off, then we'd need to restart/re-initialize kube-proxy if the feature gate is flipped. In this case we always want to compute/react to the Node event, but consider/not consider it depending on the feature gate.
I can't follow, but I'm a bit slow these days, so I may be missing something.

> In this case we always want to compute/react to the Node event, but consider/not consider it depending on the feature gate.

Why do you want to compute it if you are never going to use it? You always have to restart anyway ... the feature gate does exactly that: it avoids executing code that is behind the gate.
Feature gates can't (today) change live - they always require a restart. That said, this plumbing is minor - I can go either way. Antonio has more context on current best-practice :)
Sorry, forgot about that. I will update
code-freeze in about 6 hours - are we kicking this to next release?
Yeah, unfortunately I doubt we will have the time to update this PR + implement the metrics we agreed on in the KEP + review it + possibly address the review. I had urgent non-upstream things I needed to focus on today, sorry about that
TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler. This taint might need refinement, but it is currently deemed our best way of understanding if a node is about to get deleted. We want to do this only for eTP:Cluster services. The goal is to allow connection draining of terminating nodes.
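For concreteness, a minimal sketch (not the merged implementation) of detecting that taint; the helper name is made up:

    import v1 "k8s.io/api/core/v1"

    // ToBeDeletedTaint is the taint key set by the cluster autoscaler on nodes
    // it is about to delete.
    const ToBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

    // isNodeTerminating reports whether the node carries the autoscaler's deletion
    // taint; for eTP:Cluster services the LB health check would then start failing
    // so the load balancer drains connections away from this node.
    func isNodeTerminating(node *v1.Node) bool {
        for _, taint := range node.Spec.Taints {
            if taint.Key == ToBeDeletedTaint {
                return true
            }
        }
        return false
    }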
Force-pushed from 3c1bbd9 to 08dd657.
@alexanderConstantinescu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest-required
We need a good place to document these URLs and semantics. https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/ is auto-generated. Where should such docs go? Ideally every command would have something like "kube-proxy --man" which would show flags and more.
@sftim ideas?
    healthy, lastUpdated, currentTime := h.hs.isHealthy()
    resp.Header().Set("Content-Type", "application/json")
    resp.Header().Set("X-Content-Type-Options", "nosniff")
    if !healthy {
        metrics.ProxyLivez503Total.Inc()
Do we want 2 metrics or just 2 labels on one metric?
two labels is much better IMHO
I'll be OOO for code-freeze, so I am going to approve this and hold, and we can either merge as-is, or fixup, or decline changes. /approve
LGTM label has been added. Git tree hash: 64db7ca76d9ee0c7f820ad33a14f5f26eff3c4b1
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alexanderConstantinescu, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
If we can make an artefact at build time (eg: JSON with some embedded Markdown), then SIG Docs can consume it. If we mean URL paths like /healthz, maybe OpenAPI format could work?
Where do we publish it in a way that people who care can find it? We don't
have "man pages" for our binaries, but probably should.
    // ProxyLivez200Total is the number of returned HTTP Status 200 for each
    // livez probe.
    ProxyLivez200Total = metrics.NewCounter(
        &metrics.CounterOpts{
            Subsystem:      kubeProxySubsystem,
            Name:           "proxy_livez_200_total",
            Help:           "Cumulative proxy livez HTTP status 200",
            StabilityLevel: metrics.ALPHA,
        },
    )

    // ProxyLivez503Total is the number of returned HTTP Status 503 for each
    // livez probe.
    ProxyLivez503Total = metrics.NewCounter(
        &metrics.CounterOpts{
            Subsystem:      kubeProxySubsystem,
            Name:           "proxy_livez_503_total",
            Help:           "Cumulative proxy livez HTTP status 503",
            StabilityLevel: metrics.ALPHA,
        },
    )
Suggested change:

    -	// ProxyLivez200Total is the number of returned HTTP Status 200 for each
    -	// livez probe.
    -	ProxyLivez200Total = metrics.NewCounter(
    -		&metrics.CounterOpts{
    -			Subsystem:      kubeProxySubsystem,
    -			Name:           "proxy_livez_200_total",
    -			Help:           "Cumulative proxy livez HTTP status 200",
    -			StabilityLevel: metrics.ALPHA,
    -		},
    -	)
    -	// ProxyLivez503Total is the number of returned HTTP Status 503 for each
    -	// livez probe.
    -	ProxyLivez503Total = metrics.NewCounter(
    -		&metrics.CounterOpts{
    -			Subsystem:      kubeProxySubsystem,
    -			Name:           "proxy_livez_503_total",
    -			Help:           "Cumulative proxy livez HTTP status 503",
    -			StabilityLevel: metrics.ALPHA,
    -		},
    -	)
    +	// ProxyLivezTotal is the number of returned HTTP Status for each
    +	// livez probe.
    +	ProxyLivezTotal = metrics.NewCounterVec(
    +		&metrics.CounterOpts{
    +			Subsystem:      kubeProxySubsystem,
    +			Name:           "proxy_livez_total",
    +			Help:           "Cumulative proxy livez HTTP status",
    +			StabilityLevel: metrics.ALPHA,
    +		},
    +		[]string{"code"},
    +	)
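If the single labeled counter from the suggestion above is adopted, the handler side would increment one metric with the status code as the label value instead of choosing between two counters; a hedged sketch of that call site:

    // Inside the livez handler, replacing ProxyLivez200Total / ProxyLivez503Total:
    if !healthy {
        metrics.ProxyLivezTotal.WithLabelValues("503").Inc()
    } else {
        metrics.ProxyLivezTotal.WithLabelValues("200").Inc()
    }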
just the metrics question https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3836-kube-proxy-improved-ingress-connectivity-reliability @dgrisonnet @logicalhan can you please advise us here?
@thockin if @alexanderConstantinescu is on vacation and can't get to this before code freeze, and since this is feature gated, I think we can merge and iterate
ACK - clear the hold if you are happy!
/hold cancel
I am filing this PR, but I am definitely not convinced that this is safe to go in as-is. I am filing it because I want to track the limitation which currently blocks it from being safe, so that if the KEP does miss its deadline, we know why. I am therefore putting:
/hold
until we've resolved the issues mentioned below.
This patch implements kubernetes/enhancements#3836
This PR implements exactly what was agreed on in the KEP, that is to say:
- failing the health check when the node is marked unschedulable / has its deletionTimestamp set

The goal is to allow connection draining of terminating nodes to happen.
The current problem: the unschedulable field is not a good indicator for "the node is terminating". It is true that cordoning a node (making it unschedulable) is usually followed by a drain and then a delete, but there is no guarantee of that. In fact, there are cases where I believe this would completely break cluster ingress, specifically for this case which was discussed on the KEP:
This PR would connection drain ingress for all eTP:Cluster services on the cluster in the case mentioned above.
/cc @thockin @danwinship @aojea
I believe this can't really be implemented until Kube has another (and more clear-cut) way for expressing "the node is terminating and about to be deleted"
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: