Resource Quota race condition between resourcequota-controller and kube-apiserver. #132248

Open
@carlory

What happened?

The test [It] [sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a service. [Conformance] is flaking, as seen in this link

Timeline from the kube-apiserver-audit.log

  1. The e2e test creates a resource quota.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/resourcequotas",
  "verb": "create",
  ...
  "userAgent": "e2e.test/v1.34.0 (linux/amd64) kubernetes/8264af2 -- [sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a service. [Conformance]",
  "requestObject": {
    "kind": "ResourceQuota",
    "apiVersion": "v1",
    "metadata": {
      "name": "test-quota"
    },
    "spec": {
      "hard": {
        ...
        "services": "10",
        "services.loadbalancers": "1",
        "services.nodeports": "1"
        ...
      }
    },
    "status": {}
  },
  ...
  "requestReceivedTimestamp": "2025-06-12T03:44:59.743945Z",
  "stageTimestamp": "2025-06-12T03:44:59.852738Z",
}
  1. The resourcequota-controller updates the resource quota status.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/resourcequotas/test-quota/status",
  "verb": "update",
  ...
  "userAgent": "kube-controller-manager/v1.34.0 (linux/amd64) kubernetes/8264af2/system:serviceaccount:kube-system:resourcequota-controller",
  ...
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  "requestObject": {
    ...
    "status": {
      "hard": {
        ...
        "services": "10",
        "services.loadbalancers": "1",
        "services.nodeports": "1"
      },
      "used": {
        ...
        "services": "0",
        "services.loadbalancers": "0",
        "services.nodeports": "0"
      }
    }
  },
  ...
  "requestReceivedTimestamp": "2025-06-12T03:45:05.135122Z",
  "stageTimestamp": "2025-06-12T03:45:05.321646Z",
}
  1. kube-apiserver updates the quota when it receives a create request for a service of type ClusterIP.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/resourcequotas/test-quota/status",
  "verb": "update",
  ...
  "userAgent": "kube-apiserver/v1.34.0 (linux/amd64) kubernetes/8264af2",
  ...
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  ...
  "requestObject": {
    ...
    "status": {
      "hard": {
        ...
        "services": "10",
        "services.loadbalancers": "1",
        "services.nodeports": "1"
      },
      "used": {
        ...
        "services": "1",
        "services.loadbalancers": "0",
        "services.nodeports": "0"
      }
    }
  },
  ...
  "requestReceivedTimestamp": "2025-06-12T03:45:07.512752Z",
  "stageTimestamp": "2025-06-12T03:45:07.549885Z",
  ...
}
  1. A ClusterIP service is created.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/services",
  "verb": "create",
  ...
  "userAgent": "e2e.test/v1.34.0 (linux/amd64) kubernetes/8264af2 -- [sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a service. [Conformance]",
  ...
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  ...
  "requestReceivedTimestamp": "2025-06-12T03:45:07.399390Z",
  "stageTimestamp": "2025-06-12T03:45:07.579933Z",
}
  1. kube-apiserver updates the quota when it receives a create request for a service of type NodePort.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/resourcequotas/test-quota/status",
  "verb": "update",
  ...
  "userAgent": "kube-apiserver/v1.34.0 (linux/amd64) kubernetes/8264af2",
  ...
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  ...
  "requestObject": {
    ...
    "status": {
      "hard": {
        ...
        "services": "10",
        "services.loadbalancers": "1",
        "services.nodeports": "1"
      },
      "used": {
        ...
        "services": "2",
        "services.loadbalancers": "0",
        "services.nodeports": "1"
      }
    }
  },
  ...
  "requestReceivedTimestamp": "2025-06-12T03:45:07.727076Z",
  "stageTimestamp": "2025-06-12T03:45:07.755933Z",
  ...
}
  1. A NodePort service is created.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/services",
  "verb": "create",
  ...
  "userAgent": "e2e.test/v1.34.0 (linux/amd64) kubernetes/8264af2 -- [sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a service. [Conformance]",
  ...
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  "requestObject": {
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
      "name": "test-service-np"
    },
    "spec": {
      ...
      "type": "NodePort",
      ...
    },
  },
  ...
  "requestReceivedTimestamp": "2025-06-12T03:45:07.585975Z",
  "stageTimestamp": "2025-06-12T03:45:07.802275Z",
  ...
}
  1. ⚠️ The resourcequota-controller updates the quota status again, but its informer has not yet observed the NodePort service, so the update carries stale usage (services: 1, services.nodeports: 0) and overwrites the values written by kube-apiserver.
{
  ...
  "requestURI": "/api/v1/namespaces/resourcequota-9291/resourcequotas/test-quota/status",
  "verb": "update",
  ...
  "userAgent": "kube-controller-manager/v1.34.0 (linux/amd64) kubernetes/8264af2/system:serviceaccount:kube-system:resourcequota-controller",
  ...
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  ...
  "requestObject": {
    ...
    "status": {
      ...
      "used": {
        ...
        "services": "1",
        "services.loadbalancers": "0",
        "services.nodeports": "0"
      }
    }
  },
  "requestReceivedTimestamp": "2025-06-12T03:45:07.820111Z",
  "stageTimestamp": "2025-06-12T03:45:07.857385Z",
}

As a result, the LoadBalancer service that the quota should have rejected is allowed to be created, and the e2e test fails.
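
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions, not the actual admission plugin or controller code: the `Quota`, `Usage`, `Admit`, `Recalculate`, and `Simulate` names are hypothetical, and it assumes that the LoadBalancer service also consumes one node port. It shows how a recalculation from a stale cache can overwrite the usage that the admission path had just recorded.

```go
package main

import "fmt"

// Usage maps a quota resource name to a count.
type Usage map[string]int

// Quota is a toy stand-in for the hard and used fields of a ResourceQuota status.
type Quota struct {
	Hard Usage
	Used Usage
}

// Admit models the admission-time check: the request is allowed only if
// used+delta stays within hard for every tracked resource, and used is
// bumped on success.
func (q *Quota) Admit(delta Usage) bool {
	for name, n := range delta {
		if hard, tracked := q.Hard[name]; tracked && q.Used[name]+n > hard {
			return false
		}
	}
	for name, n := range delta {
		if _, tracked := q.Hard[name]; tracked {
			q.Used[name] += n
		}
	}
	return true
}

// Recalculate models the controller's resync: it recomputes usage from the
// objects in its cache and overwrites Used, even when the cache is stale.
func (q *Quota) Recalculate(cache []Usage) {
	used := Usage{}
	for name := range q.Hard {
		used[name] = 0
	}
	for _, obj := range cache {
		for name, n := range obj {
			if _, tracked := q.Hard[name]; tracked {
				used[name] += n
			}
		}
	}
	q.Used = used
}

// Simulate replays the timeline and returns the usage written by the stale
// resync plus whether the LoadBalancer request was admitted afterwards.
func Simulate() (Usage, bool) {
	q := &Quota{
		Hard: Usage{"services": 10, "services.loadbalancers": 1, "services.nodeports": 1},
		Used: Usage{"services": 0, "services.loadbalancers": 0, "services.nodeports": 0},
	}
	clusterIP := Usage{"services": 1}
	nodePort := Usage{"services": 1, "services.nodeports": 1}
	// Assumption: the LoadBalancer service also consumes one node port.
	loadBalancer := Usage{"services": 1, "services.loadbalancers": 1, "services.nodeports": 1}

	q.Admit(clusterIP)
	cache := []Usage{clusterIP} // the informer has seen the ClusterIP service
	q.Admit(nodePort)           // used is now services=2, services.nodeports=1
	q.Recalculate(cache)        // stale cache: the NodePort service is missing

	stale := Usage{}
	for name, n := range q.Used {
		stale[name] = n
	}
	return stale, q.Admit(loadBalancer)
}

func main() {
	stale, admitted := Simulate()
	fmt.Println("used after stale resync:", stale)
	fmt.Println("LoadBalancer admitted:", admitted)
}
```

In this model the LoadBalancer request is rejected if `Recalculate` runs only after the cache contains the NodePort service, which matches the behaviour the e2e test expects.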

What did you expect to happen?

No failure.

How can we reproduce it (as minimally and precisely as possible)?

Hard to reproduce. Please see the "What happened?" section above.
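
Since the race is hard to trigger on demand, one option is to look for its signature in the audit log of a failed run. The sketch below is a hypothetical helper, not part of any Kubernetes tooling. It assumes the log is a stream of JSON audit events carrying the fields quoted in the timeline (`requestURI`, `verb`, `userAgent`, `requestObject.status.used`, `requestReceivedTimestamp`), and it only handles plain integer counts. `QuotaStatusUpdates` orders the status updates for one quota by receive time, and `Drops` lists the updates that lowered a given counter.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
	"strconv"
	"time"
)

// auditEvent holds only the audit fields quoted in the timeline.
type auditEvent struct {
	RequestURI    string          `json:"requestURI"`
	Verb          string          `json:"verb"`
	UserAgent     string          `json:"userAgent"`
	Received      time.Time       `json:"requestReceivedTimestamp"`
	RequestObject json.RawMessage `json:"requestObject"`
}

// StatusUpdate is one write to a ResourceQuota status subresource.
type StatusUpdate struct {
	Received  time.Time
	UserAgent string
	Used      map[string]int64
}

// QuotaStatusUpdates returns the status updates for quotaURI, ordered by the
// time the apiserver received them.
func QuotaStatusUpdates(log []byte, quotaURI string) ([]StatusUpdate, error) {
	var updates []StatusUpdate
	dec := json.NewDecoder(bytes.NewReader(log))
	for {
		var ev auditEvent
		if err := dec.Decode(&ev); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		if ev.Verb != "update" || ev.RequestURI != quotaURI+"/status" {
			continue
		}
		var obj struct {
			Status struct {
				Used map[string]string `json:"used"`
			} `json:"status"`
		}
		if json.Unmarshal(ev.RequestObject, &obj) != nil || len(obj.Status.Used) == 0 {
			continue // no usable request body logged for this event
		}
		used := make(map[string]int64)
		for name, v := range obj.Status.Used {
			// Only plain integer counts are handled; values such as "1Gi" are skipped.
			if n, err := strconv.ParseInt(v, 10, 64); err == nil {
				used[name] = n
			}
		}
		updates = append(updates, StatusUpdate{Received: ev.Received, UserAgent: ev.UserAgent, Used: used})
	}
	sort.SliceStable(updates, func(i, j int) bool {
		return updates[i].Received.Before(updates[j].Received)
	})
	return updates, nil
}

// Drops returns the updates that lowered the used count of resource compared
// with the preceding update. A drop is only a hint, because deleting an
// object lowers usage legitimately.
func Drops(updates []StatusUpdate, resource string) []StatusUpdate {
	var out []StatusUpdate
	for i := 1; i < len(updates); i++ {
		if updates[i].Used[resource] < updates[i-1].Used[resource] {
			out = append(out, updates[i])
		}
	}
	return out
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: quotatimeline <audit.log> <quota URI>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	updates, err := QuotaStatusUpdates(data, os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, u := range Drops(updates, "services.nodeports") {
		fmt.Printf("%s %s lowered services.nodeports to %d\n",
			u.Received.Format(time.RFC3339Nano), u.UserAgent, u.Used["services.nodeports"])
	}
}
```

For the run above, the quota URI would be `/api/v1/namespaces/resourcequota-9291/resourcequotas/test-quota`. A drop written by the resourcequota-controller shortly after an increase written by kube-apiserver is the pattern to look for.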

Anything else we need to know?

Similar issues:

Kubernetes version

all

Cloud provider

N/A

OS version

N/A

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels

kind/bug, sig/api-machinery, triage/accepted
