WIP: DRA: automated upgrade/downgrade testing #132295
A special test (for example, one which manages its own cluster) could almost construct a Framework instance because most fields are exported. The `clientConfig` field isn't because REST configs often need to be deep-copied to avoid accidentally updating some shared copy, so for this special case a SetClientConfig is needed.
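The reasoning behind an unexported field with a copying setter can be sketched as follows. This is a minimal illustration, not the E2E framework's code: `Config`, `DeepCopy`, and the reduced `Framework` struct are hypothetical stand-ins so that the example runs with the standard library only. For real REST configs, client-go offers `rest.CopyConfig`.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for a REST client configuration.
type Config struct {
	Host    string
	Headers map[string]string
}

// DeepCopy returns a copy that shares no mutable state with c.
func (c *Config) DeepCopy() *Config {
	if c == nil {
		return nil
	}
	out := &Config{Host: c.Host}
	if c.Headers != nil {
		out.Headers = make(map[string]string, len(c.Headers))
		for k, v := range c.Headers {
			out.Headers[k] = v
		}
	}
	return out
}

// Framework is a hypothetical, reduced version of a test framework struct.
type Framework struct {
	BaseName     string  // exported: a special test can set it directly
	clientConfig *Config // unexported: only reachable through copying accessors
}

// SetClientConfig stores a private copy, so later changes by the caller
// do not leak into the framework.
func (f *Framework) SetClientConfig(c *Config) { f.clientConfig = c.DeepCopy() }

// ClientConfig hands out a copy, so callers cannot modify the stored config.
func (f *Framework) ClientConfig() *Config { return f.clientConfig.DeepCopy() }

func main() {
	shared := &Config{
		Host:    "https://127.0.0.1:6443",
		Headers: map[string]string{"User-Agent": "e2e"},
	}
	f := &Framework{BaseName: "upgrade"}
	f.SetClientConfig(shared)

	shared.Headers["User-Agent"] = "changed" // modifies only the caller's copy
	fmt.Println(f.ClientConfig().Headers["User-Agent"]) // prints "e2e"
}
```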
EOF occurs after restarting the API server and, despite a retry loop in client-go/rest/request.go, is sometimes returned to the application.
Getting slices can be done with the helper.
Can be done via -vmodule, albeit not precisely because other controllers also have a controller.go file.
During upgrade/downgrade testing, errors are encountered while the apiserver is down. This is normal and handled via retrying, so we don't need to be verbose.
Hiding the error in WithError is the right choice, for example, when it is used inside ktesting.Eventually. Most callers probably want to deal with the unexpected error themselves. For those who don't, WithErrorLogging continues to log it.
This allows declaring a code region as one step without having to use an anonymous callback function, which has the advantage that variables set during the step are visible afterwards. In Python, this would be done as

    with ktesting.Step(tctx) as tctx:
        # some code inside step
    # code not in the same step

But Go has no such construct. In contrast to WithStep, the start and end of the step are logged, including timing information.
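In Go, a step region without a callback can be approximated with a begin function that returns the matching end function. The sketch below only illustrates that pattern: `beginStep` and its signature are hypothetical and are not the ktesting API added by this commit.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// beginStep logs the start of a step through logf and returns a function
// which logs the end of the step together with its duration.
func beginStep(logf func(format string, args ...any), name string) (end func()) {
	start := time.Now()
	logf("step %q: started", name)
	return func() {
		logf("step %q: finished after %s", name, time.Since(start).Round(time.Millisecond))
	}
}

func main() {
	end := beginStep(log.Printf, "create claim")
	claimName := "my-claim" // set inside the step
	end()

	// No closure was involved, so the variable is still in scope here.
	fmt.Println(claimName)
}
```

Because the step body is ordinary inline code, nothing has to be passed out of a closure through captured variables or return values.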
This is a DRA-specific stop-gap solution for using the E2E framework together with ktesting. Long-term this should better land in the E2E framework itself.
We can recover from exec failing; the portproxy code already retries port forwarding.
That WithCancel added a deferred cleanup which cancels on test termination was unexpected. This automatic cancellation makes sense only for the initial root TContext.
This closes a gap compared to the context package. It's useful when combined with Ginkgo to keep something running beyond the end of the Ginkgo BeforeEach or It node.
Showing the configuration (= variable assignments) without going all the way to KUBE_VERBOSE > 4 is useful.
Some ports (apiserver, one kubelet port) were already configurable. Several others were not. Primarily this is done to document the ports which are in use by the different components.
This may be useful during manual invocations to see what commands would be executed and with which parameters, without actually running them. But the main purpose is to use this mode in automated upgrade/downgrade testing where the caller parses the output to execute those commands under its own control. Such a caller can then replace individual component binaries with those from other releases.
If we know that the test binary shares the filesystem with the cluster (for example, when using local-up-cluster.sh), then we can avoid the whole complicated portproxy solution and work directly with the paths on the host. Only works with suitable permissions! /var/lib/kubelet/plugins, /var/lib/kubelet/plugin_registry, and /var/run/cdi must be writable. portproxy remains the default because it automatically gains sufficient permissions also when combined with local-up-cluster.sh.
In upgrade/downgrade testing, sometimes kube-proxy doesn't get started. It's not clear why; perhaps this additional output will show the reason.
The helper code is useful for a separate Ginkgo suite for upgrade/downgrade testing. We don't want to import test/e2e/dra there because that would also define additional tests.
The test brings up the cluster and uses that power to run through an upgrade/downgrade scenario. Version skew testing (running tests while the cluster is partially up- or downgraded) could be added. The new helper code for managing the cluster is written so that it could be used both in an integration test and an E2E test. kubernetes#122481 could make that a bit easier in an E2E test, but is not absolutely required.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Promotion of any feature, in this case core DRA to GA, depends on (so far) manually testing the upgrade/downgrade path. It's hard to document what exactly was tested in a way that others can verify the procedure and/or replicate it.
This PR adds helper packages for upgrading/downgrading a kind cluster and running E2E tests against it, and uses that to test some DRA scenarios. It runs as part of the normal e2e.test invocation in pull/ci-kubernetes-kind-dra.
Which issue(s) this PR is related to:
#128965
KEP: kubernetes/enhancements#4381
Special notes for your reviewer:
It is debatable whether this should be an E2E test at all. Technically this could also be an integration test. It's currently done as an E2E test mostly for pragmatic reasons:
The new helper code for managing a kind cluster is written so that it could be used both in an integration test and an E2E test. #122481 could make that a bit easier in an E2E test, but is not absolutely required.
Does this PR introduce a user-facing change?