Description
What happened?
Hi everyone,
I encountered an issue when restarting one of our API servers (v1.29.10). After the restart it never becomes ready again, and its post-start hook fails with the following error:
F0223 14:49:57.253137 1 hooks.go:203] PostStartHook "start-service-ip-repair-controllers" failed: unable to perform initial IP and Port allocation check
Below is the output of its liveness endpoint (https://127.0.0.1:6443/livez):
curl -k https://127.0.0.1:6443/livez
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[-]poststarthook/start-service-ip-repair-controllers failed: reason withheld
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]poststarthook/apiservice-discovery-controller ok
livez check failed
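(Side note: the aggregate /livez output always prints "reason withheld" for failed checks. If I understand the health endpoints correctly, each check is also exposed as its own subpath, so querying it directly should return the underlying error, e.g.:

curl -k https://127.0.0.1:6443/livez/poststarthook/start-service-ip-repair-controllers

In my case the detailed reason also shows up in the apiserver log, as in the hooks.go line above.)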
I also see the following errors in the logs of my currently running API server instances:
E0223 14:48:58.900100 1 repair.go:85] Operation cannot be fulfilled on servicenodeportallocations: the provided resource version does not match
E0223 14:48:59.153223 1 repair.go:127] Operation cannot be fulfilled on serviceipallocations: the provided resource version does not match
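For what it's worth, these look like ordinary optimistic-concurrency conflicts: as far as I can tell, the repair loop only writes the allocation snapshot back if the stored object's resource version still matches the version it last read, and a concurrent write from another apiserver instance produces exactly this message. Below is a minimal sketch of that check, paraphrased from my reading of the allocator storage code (types and names simplified, not the actual source):

package main

import (
	"errors"
	"fmt"
)

// rangeAllocation stands in for api.RangeAllocation: the persisted
// snapshot of the service IP / node port bitmap.
type rangeAllocation struct {
	ResourceVersion string
}

// allocatorStore stands in for the etcd-backed allocator; `last` is the
// resource version of the snapshot the in-memory allocator was built from.
type allocatorStore struct {
	resource string
	last     string
}

// tryUpdate mimics the guarded write: if the stored object has moved past
// the version our snapshot was taken at, refuse to write and surface a
// conflict. The repair loop is expected to retry on its next pass.
func (s *allocatorStore) tryUpdate(existing *rangeAllocation) error {
	if existing.ResourceVersion == "" {
		return errors.New("cannot update an allocation that has never been stored")
	}
	if existing.ResourceVersion != s.last {
		return fmt.Errorf("Operation cannot be fulfilled on %s: the provided resource version does not match", s.resource)
	}
	// ...write the updated bitmap and record the new resource version...
	return nil
}

func main() {
	store := &allocatorStore{resource: "serviceipallocations", last: "41"}
	// Another apiserver instance wrote in the meantime, so our snapshot
	// at "41" is stale and the update is rejected with the error above.
	fmt.Println(store.tryUpdate(&rangeAllocation{ResourceVersion: "42"}))
}

Normally such a conflict should be transient and the next pass should succeed; what puzzles me is that the restarted instance never gets past its initial check.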
I also checked my etcd cluster, and everything is OK: there is no unusual latency or I/O wait, and read/write times are under 1 millisecond.
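(For reference, one way to sanity-check etcd here, assuming etcdctl v3 and the appropriate cert flags for your setup:

etcdctl endpoint status --cluster -w table
etcdctl check perf

The first shows per-member status and DB size; the second runs a quick latency benchmark.)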
What did you expect to happen?
I expected the API server to become ready again after a restart.
How can we reproduce it (as minimally and precisely as possible)?
I don't know how to reproduce this situation. I tried restarting an API server in my staging environment, and everything was fine.
Anything else we need to know?
I also checked the kube-apiserver code and realized that this error may relate to this part of the code:
https://github.com/kubernetes/kubernetes/blob/v1.29.10/pkg/registry/core/rest/storage_core.go#L466
or this part:
https://github.com/kubernetes/kubernetes/blob/v1.29.10/pkg/registry/core/service/allocator/storage/storage.go#L203
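My rough reading of that hook (a paraphrase with hypothetical names, not the actual implementation) is that it starts both repair loops and blocks until each loop's first pass succeeds, with a timeout, and returns the exact error above if the initial pass never completes, which in turn keeps livez red:

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// runRepairLoops stands in for the hook body: each repair loop keeps
// retrying until its first reconciliation pass succeeds, and the hook
// fails if every loop hasn't succeeded once within the timeout.
func runRepairLoops(timeout time.Duration, firstPass ...func() error) error {
	var wg sync.WaitGroup
	wg.Add(len(firstPass))
	for _, pass := range firstPass {
		go func(pass func() error) {
			defer wg.Done()
			for pass() != nil {
				time.Sleep(100 * time.Millisecond) // retry until the first clean pass
			}
		}(pass)
	}

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		// The error from my apiserver log above.
		return errors.New("unable to perform initial IP and Port allocation check")
	}
}

func main() {
	repairClusterIPs := func() error { return nil }                   // first pass succeeds
	repairNodePorts := func() error { return errors.New("conflict") } // never succeeds
	fmt.Println(runRepairLoops(2*time.Second, repairClusterIPs, repairNodePorts))
}

If that reading is right, the persistent resource-version conflicts above could be what keeps the initial pass from ever completing on the restarted instance.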
Kubernetes version
$ kubectl version
Client Version: v1.29.10
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.10
Cloud provider
OS version
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux controlplane1 5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:33:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux