
testing.StartTestServer doesn't tear down cleanly #49489

Open
@ironcladlou


What happened:

The new k8s.io/kubernetes/cmd/kube-apiserver/app/testing.StartTestServerOrDie function (introduced in #46865) returns a teardown function which doesn't cleanly shut down the test server. This results in the accumulation of goroutines and log spam which prevents effective use of the test server across multiple test functions within the same process.

What you expected to happen:

Calling the teardown function should gracefully terminate everything that started up when StartTestServerOrDie was called.

How to reproduce it (as minimally and precisely as possible):

Using the following sample integration test code:

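// The package name below is illustrative.
package apiserver

import (
	"fmt"
	"runtime"
	"testing"

	apitesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
)

// TestTeardown starts the test server, tears it down, and then dumps the
// stacks of all goroutines that are still alive.
func TestTeardown(t *testing.T) {
	_, tearDown := apitesting.StartTestServerOrDie(t)
	tearDown()

	// runtime.Stack truncates its output when the buffer is too small,
	// so keep doubling the buffer until the full dump fits.
	stack := make([]byte, 8192)
	size := 0
	for {
		size = runtime.Stack(stack, true)
		if size < len(stack) {
			break
		}
		stack = make([]byte, len(stack)*2)
	}
	fmt.Printf("%s\n", string(stack[0:size]))
}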

After tearDown() returns, there are several lingering goroutines which forever attempt to maintain etcd connections to the terminated etcd instance. Here's an example stack dump:

https://gist.github.com/ironcladlou/52b3e3306948db3943b426c70ce7f85b

Among the etcd connection goroutines, you'll notice lingering Cacher instances (created because the EnableWatchCache storage setting is on by default) that appear to be trying to hold watches, and configuration_manager goroutines (which may or may not hold connections; I'm not sure yet). This suggests that various components started during apiserver setup are never actually shut down.

Anything else we need to know?:

This is important for enabling integration testing of custom resource garbage collection (#47665).

Environment:

  • Kubernetes version (use kubectl version): master (as of 088141c)
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): darwin/amd64
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

/cc @sttts @caesarxuchao @deads2k @liggitt @kubernetes/sig-api-machinery-bugs
/kind bug


Labels: area/apiserver, kind/bug, lifecycle/frozen, priority/backlog, sig/api-machinery
