
E2E tests: avoid generic poll functions #106575

Open
@pohly

Description

What would you like to be added?

Tests that use wait.PollImmediate (directly or indirectly, as in

// WaitForPodNotFoundInNamespace returns an error if it takes too long for the pod to fully terminate.
// Unlike `waitForPodTerminatedInNamespace`, the pod's Phase and Reason are ignored. If the pod Get
// api returns IsNotFound then the wait stops and nil is returned. If the Get api returns an error other
// than "not found" then that error is returned and the wait stops.
func WaitForPodNotFoundInNamespace(c clientset.Interface, podName, ns string, timeout time.Duration) error {
	return wait.PollImmediate(poll, timeout, func() (bool, error) {
		_, err := c.CoreV1().Pods(ns).Get(context.TODO(), podName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // done
		}
		if err != nil {
			return true, err // stop wait with error
		}
		return false, nil
	})
}

// WaitForPodToDisappear waits the given timeout duration for the specified pod to disappear.
func WaitForPodToDisappear(c clientset.Interface, ns, podName string, label labels.Selector, interval, timeout time.Duration) error {
	return wait.PollImmediate(interval, timeout, func() (bool, error) {
		e2elog.Logf("Waiting for pod %s to disappear", podName)
		options := metav1.ListOptions{LabelSelector: label.String()}
		pods, err := c.CoreV1().Pods(ns).List(context.TODO(), options)
		if err != nil {
			return false, err
		}
		found := false
		for _, pod := range pods.Items {
			if pod.Name == podName {
				e2elog.Logf("Pod %s still exists", podName)
				found = true
				break
			}
		}
		if !found {
			e2elog.Logf("Pod %s no longer exists", podName)
			return true, nil
		}
		return false, nil
	})
}
) with a custom test function fail poorly when polling times out. Example:

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/pvc_protection.go:72
Nov 13 14:20:10.683: While creating pod that uses the PVC or waiting for the Pod to become Running
Unexpected error:
    <*errors.errorString | 0xc003d86070>: {
        s: "pod \"pvc-tester-w2jjx\" is not Running: timed out waiting for the condition",
    }
    pod "pvc-tester-w2jjx" is not Running: timed out waiting for the condition
occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/pvc_protection.go:96

This is actually one of the better examples. In other cases, the only failure message is "timed out waiting for the condition", without any indication of what condition was expected. Even in this example it is not clear what state the pod ended up in. Were there events for the pod? What was its status?

Example from 2025, after conversion to wait.PollUntilContextTimeout:

[FAILED] client rate limiter Wait returned an error: context deadline exceeded
In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/resource_quota.go:544

A better way to write such tests is with gomega.Eventually and a custom matcher. Eventually handles the polling. In case of a failure, the matcher is called to produce a failure report, which can include arbitrary additional information that developers currently have to hunt for in log files, if it can be found there at all.

Metrics testing is another example where some data is checked against certain expectations. Because the data is never logged, it is impossible to tell after a failure what went wrong.

Why is this needed?

Easier debugging after an E2E test failure.

Labels

area/e2e-test-framework (Issues or PRs related to refactoring the kubernetes e2e test framework)
kind/feature (Categorizes issue or PR as related to a new feature.)
sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
triage/accepted (Indicates an issue or PR is ready to be actively worked on.)
