Solved: LeaderElection within a GKE cluster Unauthorized error

wojpol

I run a very basic app in a GKE cluster. The app uses the `leaderelection.RunOrDie` function. I noticed some strange behavior: sometimes, when I perform a rolling update, the logs from the pod that is being terminated contain errors like these:

error retrieving resource lock demo/app: Unauthorized
Failed to release lock: Unauthorized

When a new pod is spawned, all is good, and the election works perfectly fine.
It does not happen every time, and I can't find a reliable way to reproduce it. I use the following settings for the election:

ReleaseOnCancel: true,
LeaseDuration: 60 * time.Second, //nolint: gomnd
RenewDeadline: 20 * time.Second, //nolint: gomnd
RetryPeriod: 10 * time.Second, //nolint: gomnd

Once I receive the SIGTERM signal from Kubernetes, I immediately cancel the context used for the election. Afterward I wait 30s and restart my pod.

Do you have any idea what may be wrong? It is worth adding that I run the same app on another cloud provider and have never seen this error there.

garisingh

Does your pod use a persistent volume?

wojpol

No, I don't use persistent volumes in that app at all.

wojpol

I owe you an explanation. The root cause of my issue was my Helm deployment, which was recreating the service account. When I deployed the app, a new service account was created while the old pod was still running with credentials for the old one. I fixed my Helm chart, and the problem was solved.
