-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AddUpdateAPIService for openapiv2 #120814
Fix AddUpdateAPIService for openapiv2 #120814
Conversation
/assign @apelisse |
/triage accepted |
70c2e14
to
eb867b9
Compare
@Jefftree will use this PR to try some CI jobs. hope that's ok :) |
/test pull-kubernetes-e2e-containerd-gce |
Please add the exact patch version where the regression was introduced into the release note |
staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/controller.go
Show resolved
Hide resolved
d.mutex.Lock() | ||
defer d.mutex.Unlock() | ||
d.handler = handler | ||
} | ||
|
||
func (d *cacheableDownloader) Get() (*spec.Swagger, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get is long-running, right? the mutex addition makes that block UpdateHandler, which now blocks specAggregator#AddUpdateAPIService
That means a hang fetching one service's openapi can block processing of other services, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm that's true. What if we have the mutex only be required when we fetch the handler instead of the entire get operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the only thing we need to guard the handler, or is the etag / spec also concurrently modifiable? (can Get be called from multiple threads?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently don't call Get() from multiple threads but the defacto cached.Value
interface that we follow is supposed to be thread safe (https://github.com/kubernetes/kube-openapi/blob/master/pkg/cached/cache.go#L17-L22).
@apelisse I think this ties into what you mentioned earlier that since we don't evaluate aggregated apiservices lazily, maybe we shouldn't follow cached library interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving this open before merge just to get @apelisse's feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't then you'll have to deal with the etag manually, I'm not sure it would better.
staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/aggregator.go
Show resolved
Hide resolved
staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/downloader.go
Outdated
Show resolved
Hide resolved
@@ -218,16 +219,21 @@ func (s *specAggregator) AddUpdateAPIService(apiService *v1.APIService, handler | |||
s.mutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if an API Service changes from remote to local (service != nil → service == nil), will the short-circuit return above prevent cleanup from happening? should we call RemoveAPIService in that case?
just noticed this, it's pre-existing in 1.27, so probably best to handle separately in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will do that in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, link the issue here when you open it to close the loop
/retest |
lgtm but will defer to @apelisse on #120814 (comment) before merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are nits and don't have to be fixed here since they mostly relate to style and have no behavior impact. LGTM for me, thanks.
return spec, etag, err | ||
} | ||
|
||
func (d *cacheableDownloader) getWithHandler(h http.Handler) (*spec.Swagger, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a get
without handler? The name sounds like there would be. But wait, I think I see where this is coming from. You shouldn't have to receive the handler here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated with the atomic pointer, but kept this function. Separating the two functions lets us decorate the error with the apiservice name. lmk if you have thoughts, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I still think it's weird to pass to getWithHandler
one of it's own member though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just do ...
spec, etag, err := d.get()
...
func (d *cacheableDownloader) get() (*spec.Swagger, string, error) {
h := *d.handler.Load()
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/downloader.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/downloader.go
Outdated
Show resolved
Hide resolved
d.mutex.Lock() | ||
defer d.mutex.Unlock() | ||
d.handler = handler | ||
} | ||
|
||
func (d *cacheableDownloader) Get() (*spec.Swagger, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't then you'll have to deal with the etag manually, I'm not sure it would better.
staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/aggregator.go
Show resolved
Hide resolved
/retest |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: apelisse, Jefftree, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold in case you want to incorporate @apelisse's comments here before backport |
3e23beb
to
14e3a3a
Compare
…nd fix AddUpdateAPIService to update handler
14e3a3a
to
89adbb4
Compare
Thanks |
LGTM label has been added. Git tree hash: 4720303ed0593197c1a1e5ab658781aacb568cfc
|
/hold cancel |
…0814-upstream-release-1.28 Automated cherry pick of #120814: Fix 120758 - prevent cache Load on uninitialized spec
What type of PR is this?
/kind bug
/kind regression
What this PR does / why we need it:
A couple of issues were discovered and they've been combined into this PR since they all need a backport.
For #120758: Triggered by a race when a GET to /openapi/v2 is called between https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/aggregator.go#L227 and https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kube-aggregator/pkg/controllers/openapi/aggregator/aggregator.go#L230
Fix is to initialize the cache with empty data so aggregation won't have problems.
For #120739: AddUpdateAPIService was adding and also updating the apiservice spec. The apiservice spec update should be done async by the controller.
Fix is to not update the spec in the addupdateapiservice flow during startup. This is handled async by the openapi controller.
I've discovered #120878 while working on the above two issues: The code path when an apiservice exists and needs to be updated was not handled properly. I've refactored the CacheableDownloader to have an new function
UpdateHandler()
and call that inAddUpdateAPIService()
if the apiservice already exists. I've also removed thedecorateError
function and integrated the better logging directly in the downloader.I've fixed the
AddUpdateAPIService
flow and added integration tests to simulate the cases described in #120758 and #120739. Added additional unit tests to catch #120878. Tried to keep the changes self contained since we need to backport.Which issue(s) this PR fixes:
Fixes #120758
Fixes #120739
Fixes #120878
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: