I run a very basic app in a GKE cluster. The app uses the `leaderelection.RunOrDie` function. I noticed some strange behavior: sometimes, when I perform a rolling update, the logs from the pod that is being terminated contain errors like these:
error retrieving resource lock demo/app: Unauthorized
Failed to release lock: Unauthorized
When a new pod is spawned, all is good, and the election works perfectly fine.
This does not happen on every rollout, and I can't find a reliable way to reproduce it. I use the following settings for the election:
ReleaseOnCancel: true,
LeaseDuration: 60 * time.Second, //nolint: gomnd
RenewDeadline: 20 * time.Second, //nolint: gomnd
RetryPeriod: 10 * time.Second, //nolint: gomnd
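For reference, the three durations above can be sanity-checked with a minimal standard-library sketch. It assumes the two ordering rules that client-go applies when an elector is created (LeaseDuration greater than RenewDeadline, and RenewDeadline greater than RetryPeriod multiplied by a jitter factor of 1.2). The `electionTimings` type and the `postTimings` and `validate` helpers are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// jitterFactor is assumed to match the JitterFactor constant (1.2)
// in client-go's leaderelection package.
const jitterFactor = 1.2

// electionTimings is a hypothetical helper type holding the three durations.
type electionTimings struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// postTimings returns the values quoted in the post.
func postTimings() electionTimings {
	return electionTimings{
		LeaseDuration: 60 * time.Second,
		RenewDeadline: 20 * time.Second,
		RetryPeriod:   10 * time.Second,
	}
}

// validate checks the two ordering rules described in the lead-in.
func (t electionTimings) validate() error {
	if t.LeaseDuration <= t.RenewDeadline {
		return fmt.Errorf("LeaseDuration (%s) must be greater than RenewDeadline (%s)",
			t.LeaseDuration, t.RenewDeadline)
	}
	minRenew := time.Duration(jitterFactor * float64(t.RetryPeriod))
	if t.RenewDeadline <= minRenew {
		return fmt.Errorf("RenewDeadline (%s) must be greater than RetryPeriod*%.1f (%s)",
			t.RenewDeadline, jitterFactor, minRenew)
	}
	return nil
}

func main() {
	if err := postTimings().validate(); err != nil {
		fmt.Println("invalid timings:", err)
		return
	}
	fmt.Println("timings are consistent")
}
```

The values from the post pass both checks, which suggests the timing configuration itself is not what produces the Unauthorized errors.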
Once I get the SIGTERM signal from K8S, I immediately cancel the context used for the election. Afterward I wait 30s and restart my pod.
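The shutdown sequence described above could look roughly like the following sketch, which uses only the standard library. `runElection` is a hypothetical stand-in for the call to `leaderelection.RunOrDie`, and `waitWithin` is an illustrative helper. The sketch waits up to 30s for the election loop to return rather than sleeping unconditionally, which is an assumption about the intent.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

// runElection is a hypothetical placeholder for leaderelection.RunOrDie:
// it blocks until ctx is cancelled, then performs its release step.
func runElection(ctx context.Context) {
	<-ctx.Done()
	fmt.Println("context cancelled, releasing lock")
}

// waitWithin reports whether done is closed before the grace period elapses.
func waitWithin(done <-chan struct{}, grace time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(grace):
		return false
	}
}

func main() {
	// Cancel ctx as soon as Kubernetes sends SIGTERM to the process.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	done := make(chan struct{})
	go func() {
		defer close(done)
		runElection(ctx)
	}()

	// Block until SIGTERM arrives, then give the election loop time to exit.
	<-ctx.Done()
	if waitWithin(done, 30*time.Second) {
		fmt.Println("election loop exited within the grace period")
	} else {
		fmt.Println("grace period elapsed before the election loop exited")
	}
}
```

With `ReleaseOnCancel: true`, the release attempt happens after the context is cancelled, so it still needs valid credentials at that moment.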
Do you have any ideas about what may be wrong? It is worth adding that I run the same app in another cloud provider and have never seen such an error.
Does your pod use a persistent volume?
No, I don't use persistent volumes in that app at all.
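I owe you an explanation. The root cause of my issue was my Helm deployment, which was recreating the service account. When I deployed the app, a new service account was created while the old pod was still running, so the old pod's token, issued for the deleted service account, was presumably no longer accepted by the API server, hence the Unauthorized errors. I fixed my Helm chart, and the problem was solved.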