PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: timed out waiting for the condition #123089

Open
@lancuixian

Description

What happened?

PostStartHook failed
E0122 06:54:22.018964 10 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
W0122 06:54:22.457480 10 storage_scheduling.go:106] unable to get PriorityClass system-node-critical: Get "https://*.*.*.*:8443/apis/scheduling.k8s.io/v1/priorityclasses/system-node-critical": net/http: TLS handshake timeout. Retrying...
F0122 06:54:22.457615 10 hooks.go:203] PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: timed out waiting for the condition

What did you expect to happen?

The PostStartHook should not fail.

How can we reproduce it (as minimally and precisely as possible)?

1. Deploy the apiserver as a static Pod.
2. Have other services access the apiserver through its Service IP.
3. Limit the apiserver's CPU as tightly as possible.
4. Send as many requests as possible to overload the apiserver's CPU.

Anything else we need to know?

All PostStartHooks are invoked asynchronously, each in its own goroutine.
From reading the code, I found that the apiserver's Service and Endpoints reconciliation also runs in a goroutine (pkg/controlplane/instance.go:508).
This means other PostStartHooks may still be running after the apiserver has already been added as an Endpoints backend and started serving traffic. If many concurrent requests arrive at that point, the apiserver becomes overloaded, the PostStartHook's own request eventually times out, and the apiserver kills itself.

Kubernetes version

[root@master1 ~]# kubectl version
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.1

Cloud provider

nil

OS version

nil

Install tools

See "How can we reproduce it (as minimally and precisely as possible)?" above.

Container runtime (CRI) and version (if applicable)

containerd

Related plugins (CNI, CSI, ...) and versions (if applicable)

CNI: Calico

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
    sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
