reference ContainerCluster is not ready #578

Closed
travisrandolph-bestbuy opened this issue Dec 13, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@travisrandolph-bestbuy

Hi,

I need help understanding ContainerCluster. It seems to be updating itself every 10 minutes, which causes our ContainerNodePools to emit warnings:


reference ContainerCluster {{REDACTED}} is not ready

Additional Diagnostic Information

Kubernetes Cluster Version

Client Version: v1.19.2
Server Version: v1.20.10-gke.1600

Config Connector Version

1.60.0

Config Connector Mode

namespaced

@travisrandolph-bestbuy added the bug (Something isn't working) label Dec 13, 2021

@mbzomowski

Hi @travisrandolph-bestbuy, the periodic updating is normal behavior for Config Connector and can't be changed; all of our resources attempt to reconcile roughly every 10 minutes.

As for the warnings, can you provide some more detail? Is that the entirety of the warning message?

@mbzomowski

Also, can you check whether label leasing is turned on for your resources? If it is, you'll need to turn it off to avoid the API calls for releasing the leases.

@travisrandolph-bestbuy
Author

@mbzomowski We are not doing anything to manage conflict. Below is an example of the update in the logs.

{
  "insertId": "{{REDACTED}}",
  "jsonPayload": {
    "reason": "Updating",
    "metadata": {
      "resourceVersion": "{{REDACTED}}",
      "managedFields": [
        {
          "fieldsV1": {
            "f:count": {},
            "f:reason": {},
            "f:involvedObject": {
              "f:apiVersion": {},
              "f:uid": {},
              "f:namespace": {},
              "f:name": {},
              "f:resourceVersion": {},
              "f:kind": {}
            },
            "f:message": {},
            "f:source": {
              "f:component": {}
            },
            "f:lastTimestamp": {},
            "f:firstTimestamp": {},
            "f:type": {}
          },
          "apiVersion": "v1",
          "time": "2021-11-10T08:07:25Z",
          "manager": "manager",
          "fieldsType": "FieldsV1",
          "operation": "Update"
        }
      ],
      "name": "{{REDACTED}}",
      "namespace": "{{REDACTED}}",
      "uid": "{{REDACTED}}",
      "creationTimestamp": "2021-11-10T08:07:25Z"
    },
    "eventTime": null,
    "reportingComponent": "",
    "kind": "Event",
    "type": "Normal",
    "reportingInstance": "",
    "apiVersion": "v1",
    "message": "Update in progress",
    "involvedObject": {
      "kind": "ContainerCluster",
      "name": "{{REDACTED}}",
      "apiVersion": "container.cnrm.cloud.google.com/v1beta1",
      "resourceVersion": "{{REDACTED}}",
      "uid": "{{REDACTED}}",
      "namespace": "{{REDACTED}}"
    },
    "source": {
      "component": "containercluster-controller"
    }
  },
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "cluster_name": "{{REDACTED}}",
      "project_id": "{{REDACTED}}",
      "location": "us-central1"
    }
  },
  "timestamp": "2021-12-22T20:32:24Z",
  "severity": "INFO",
  "logName": "projects/{{REDACTED}}/logs/events",
  "receiveTimestamp": "2021-12-22T20:32:29.085056647Z"
}

{
  "insertId": "{{REDACTED}}",
  "jsonPayload": {
    "source": {
      "component": "containernodepool-controller"
    },
    "eventTime": null,
    "reportingComponent": "",
    "message": "reference ContainerCluster {{REDACTED}}/{{REDACTED}} is not ready",
    "apiVersion": "v1",
    "reportingInstance": "",
    "type": "Warning",
    "involvedObject": {
      "name": "{{REDACTED}}",
      "apiVersion": "container.cnrm.cloud.google.com/v1beta1",
      "namespace": "{{REDACTED}}",
      "resourceVersion": "{{REDACTED}}",
      "uid": "{{REDACTED}}",
      "kind": "ContainerNodePool"
    },
    "metadata": {
      "namespace": "{{REDACTED}}",
      "uid": "{{REDACTED}}",
      "managedFields": [
        {
          "operation": "Update",
          "fieldsType": "FieldsV1",
          "apiVersion": "v1",
          "time": "2021-12-22T17:11:33Z",
          "manager": "manager",
          "fieldsV1": {
            "f:type": {},
            "f:involvedObject": {
              "f:apiVersion": {},
              "f:resourceVersion": {},
              "f:uid": {},
              "f:kind": {},
              "f:name": {},
              "f:namespace": {}
            },
            "f:count": {},
            "f:firstTimestamp": {},
            "f:source": {
              "f:component": {}
            },
            "f:message": {},
            "f:reason": {},
            "f:lastTimestamp": {}
          }
        }
      ],
      "name": "{{REDACTED}}",
      "creationTimestamp": "2021-12-22T17:11:33Z",
      "resourceVersion": "{{REDACTED}}"
    },
    "kind": "Event",
    "reason": "DependencyNotReady"
  },
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "project_id": "{{REDACTED}}",
      "cluster_name": "{{REDACTED}}",
      "location": "us-central1"
    }
  },
  "timestamp": "2021-12-22T20:32:24Z",
  "severity": "WARNING",
  "logName": "projects/{{REDACTED}}logs/events",
  "receiveTimestamp": "2021-12-22T20:32:29.085056647Z"
}

{
  "insertId": "{{REDACTED}}",
  "jsonPayload": {
    "logger": "containercluster-controller",
    "msg": "creating/updating underlying resource",
    "timestamp": "2021-12-22T20:32:24.197Z",
    "resource": {
      "namespace": "{{REDACTED}}",
      "name": "{{REDACTED}}"
    }
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "container_name": "manager",
      "cluster_name": "{{REDACTED}}",
      "pod_name": "{{REDACTED}}",
      "location": "us-central1",
      "project_id": "{{REDACTED}}",
      "namespace_name": "cnrm-system"
    }
  },
  "timestamp": "2021-12-22T20:32:24.197560787Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/controller-revision-hash": "{{REDACTED}}",
    "k8s-pod/cnrm_cloud_google_com/scoped-namespace": "{{REDACTED}}",
    "k8s-pod/cnrm_cloud_google_com/system": "true",
    "k8s-pod/cnrm_cloud_google_com/component": "cnrm-controller-manager",
    "compute.googleapis.com/resource_name": "{{REDACTED}}",
    "k8s-pod/statefulset_kubernetes_io/pod-name": "{{REDACTED}}"
  },
  "logName": "projects/{{REDACTED}}/logs/stderr",
  "receiveTimestamp": "2021-12-22T20:32:25.106930208Z"
}

@toumorokoshi
Contributor

Hello! As a quick clarification of what @mbzomowski was saying: although it is true that there will be an API GET every 10 minutes to ensure that the resource is up to date, it is not necessarily true that the update will occur every 10 minutes: only in the case where there was a change that needed to be applied.

So update requests every 10 minutes imply that something is up with the resource: either it is not updating properly, or Config Connector is erroneously detecting a diff.
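
To make that distinction concrete, here is a minimal Python sketch of one reconcile pass. It is not Config Connector's actual implementation: `reconcile`, `fetch_actual`, `apply_update`, and the `mode` field are hypothetical names chosen for illustration. The read happens on every pass, but a write is only issued when the declared state differs from the live state, so a comparison that can never converge produces an update on every pass.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def reconcile(desired: State,
              fetch_actual: Callable[[], State],
              apply_update: Callable[[State], None]) -> bool:
    """Run one reconcile pass; return True if an update was issued."""
    actual = fetch_actual()  # the periodic GET
    # Compare only the fields that were declared in the desired state.
    diff = {k: v for k, v in desired.items() if actual.get(k) != v}
    if not diff:
        return False  # in sync: nothing is written
    apply_update(diff)  # this is the update that shows up in the logs
    return True


# Hypothetical live state: the backing API stores a normalized value.
live: State = {"mode": "FAST"}


def fake_fetch() -> State:
    return dict(live)


def fake_update(diff: State) -> None:
    # Simulate an API that upper-cases whatever it is given.
    live.update({k: str(v).upper() for k, v in diff.items()})


# A declared value that never equals the normalized form diffs on every pass.
updates = [reconcile({"mode": "fast"}, fake_fetch, fake_update) for _ in range(3)]
# A declared value that matches the normalized form needs no update.
in_sync = reconcile({"mode": "FAST"}, fake_fetch, fake_update)
```

In this sketch `updates` is `True` on every pass while `in_sync` is `False`, which is the pattern to look for: an update that repeats on each reconcile usually points to a declared value that the API stores in a different form.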

@travisrandolph-bestbuy can you send us a redacted version of the resource so we can try to reproduce this locally? The output you sent looks to be logs from the controller, which can be helpful for diagnostics but doesn't let us check things like whether label leasing is on, and doesn't help us reproduce the issue locally.

Redacted output from kubectl get -o yaml would be perfect.

@travisrandolph-bestbuy
Author

travisrandolph-bestbuy commented Jan 4, 2022

@toumorokoshi Here's the output.

apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  annotations:
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
    cnrm.cloud.google.com/mutable-but-unreadable-fields: '{}'
    cnrm.cloud.google.com/observed-secret-versions: '{}'
    cnrm.cloud.google.com/project-id: {{ REDACTED }}
    cnrm.cloud.google.com/remove-default-node-pool: "true"
    cnrm.cloud.google.com/state-into-spec: merge
    meta.helm.sh/release-name: {{ REDACTED }}
    meta.helm.sh/release-namespace: {{ REDACTED }}
  creationTimestamp: "2021-11-04T22:44:04Z"
  finalizers:
  - cnrm.cloud.google.com/finalizer
  - cnrm.cloud.google.com/deletion-defender
  generation: 15
  labels:
    app.kubernetes.io/managed-by: Helm
  managedFields:
  - apiVersion: container.cnrm.cloud.google.com/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:cnrm.cloud.google.com/remove-default-node-pool: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
      f:spec:
        .: {}
        f:addonsConfig:
          .: {}
          f:httpLoadBalancing:
            .: {}
            f:disabled: {}
        f:databaseEncryption:
          .: {}
          f:keyName: {}
          f:state: {}
        f:enableBinaryAuthorization: {}
        f:enableShieldedNodes: {}
        f:initialNodeCount: {}
        f:ipAllocationPolicy:
          .: {}
          f:clusterIpv4CidrBlock: {}
          f:clusterSecondaryRangeName: {}
          f:servicesIpv4CidrBlock: {}
          f:servicesSecondaryRangeName: {}
        f:location: {}
        f:maintenancePolicy:
          .: {}
          f:recurringWindow:
            .: {}
            f:endTime: {}
            f:recurrence: {}
            f:startTime: {}
        f:masterAuthorizedNetworksConfig:
          .: {}
          f:cidrBlocks: {}
        f:networkRef:
          .: {}
          f:name: {}
        f:nodeLocations: {}
        f:notificationConfig:
          .: {}
          f:pubsub:
            .: {}
            f:enabled: {}
            f:topicRef:
              .: {}
              f:name: {}
        f:privateClusterConfig:
          .: {}
          f:enablePrivateEndpoint: {}
          f:enablePrivateNodes: {}
          f:masterIpv4CidrBlock: {}
        f:releaseChannel:
          .: {}
          f:channel: {}
        f:subnetworkRef:
          .: {}
          f:name: {}
        f:verticalPodAutoscaling:
          .: {}
          f:enabled: {}
        f:workloadIdentityConfig:
          .: {}
          f:identityNamespace: {}
    manager: helm
    operation: Update
    time: "2021-11-04T22:44:04Z"
  - apiVersion: container.cnrm.cloud.google.com/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cnrm.cloud.google.com/mutable-but-unreadable-fields: {}
          f:cnrm.cloud.google.com/observed-secret-versions: {}
        f:finalizers:
          .: {}
          v:"cnrm.cloud.google.com/deletion-defender": {}
          v:"cnrm.cloud.google.com/finalizer": {}
      f:spec:
        f:addonsConfig:
          f:gcePersistentDiskCsiDriverConfig:
            .: {}
            f:enabled: {}
          f:networkPolicyConfig:
            .: {}
            f:disabled: {}
        f:clusterAutoscaling:
          .: {}
          f:autoscalingProfile: {}
          f:enabled: {}
        f:clusterIpv4Cidr: {}
        f:clusterTelemetry:
          .: {}
          f:type: {}
        f:defaultMaxPodsPerNode: {}
        f:defaultSnatStatus:
          .: {}
          f:disabled: {}
        f:loggingService: {}
        f:masterAuth:
          .: {}
          f:clientCertificateConfig:
            .: {}
            f:issueClientCertificate: {}
          f:clusterCaCertificate: {}
        f:monitoringService: {}
        f:networkPolicy:
          .: {}
          f:enabled: {}
          f:provider: {}
        f:networkingMode: {}
        f:nodeConfig:
          .: {}
          f:diskSizeGb: {}
          f:diskType: {}
          f:imageType: {}
          f:machineType: {}
          f:metadata:
            .: {}
            f:block-project-ssh-keys: {}
            f:disable-legacy-endpoints: {}
          f:oauthScopes: {}
          f:serviceAccountRef:
            .: {}
            f:external: {}
          f:shieldedInstanceConfig:
            .: {}
            f:enableIntegrityMonitoring: {}
            f:enableSecureBoot: {}
          f:tags: {}
          f:workloadMetadataConfig:
            .: {}
            f:nodeMetadata: {}
        f:nodeVersion: {}
        f:podSecurityPolicyConfig:
          .: {}
          f:enabled: {}
        f:privateClusterConfig:
          f:masterGlobalAccessConfig:
            .: {}
            f:enabled: {}
          f:peeringName: {}
          f:privateEndpoint: {}
          f:publicEndpoint: {}
        f:resourceID: {}
      f:status:
        .: {}
        f:conditions: {}
        f:endpoint: {}
        f:instanceGroupUrls: {}
        f:labelFingerprint: {}
        f:masterVersion: {}
        f:observedGeneration: {}
        f:selfLink: {}
        f:servicesIpv4Cidr: {}
    manager: cnrm-controller-manager
    operation: Update
    time: "2021-12-13T22:14:03Z"
  name: kubeflow
  namespace: {{ REDACTED }}
  resourceVersion: {{ REDACTED }}
  uid: {{ REDACTED }}
spec:
  addonsConfig:
    gcePersistentDiskCsiDriverConfig:
      enabled: true
    httpLoadBalancing:
      disabled: false
    networkPolicyConfig:
      disabled: true
  clusterAutoscaling:
    autoscalingProfile: BALANCED
    enabled: false
  clusterIpv4Cidr: {{ REDACTED }}
  clusterTelemetry:
    type: ENABLED
  databaseEncryption:
    keyName: {{ REDACTED }}
    state: ENCRYPTED
  defaultMaxPodsPerNode: 110
  defaultSnatStatus:
    disabled: false
  enableBinaryAuthorization: true
  enableShieldedNodes: true
  initialNodeCount: 1
  ipAllocationPolicy:
    clusterIpv4CidrBlock: {{ REDACTED }}
    clusterSecondaryRangeName: {{ REDACTED }}
    servicesIpv4CidrBlock: {{ REDACTED }}
    servicesSecondaryRangeName: {{ REDACTED }}
  location: us-central1
  loggingService: logging.googleapis.com/kubernetes
  maintenancePolicy:
    recurringWindow:
      endTime: {{ REDACTED }}
      recurrence: {{ REDACTED }}
      startTime: {{ REDACTED }}
  masterAuth:
    clientCertificateConfig:
      issueClientCertificate: false
    clusterCaCertificate: {{ REDACTED }}
  masterAuthorizedNetworksConfig:
    cidrBlocks:
    - cidrBlock: {{ REDACTED }}
    - cidrBlock: {{ REDACTED }}
  monitoringService: monitoring.googleapis.com/kubernetes
  networkPolicy:
    enabled: false
    provider: PROVIDER_UNSPECIFIED
  networkRef:
    name: {{ REDACTED }}
  networkingMode: VPC_NATIVE
  nodeConfig:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n2-standard-2
    metadata:
      block-project-ssh-keys: "true"
      disable-legacy-endpoints: "true"
    oauthScopes:
    - https://www.googleapis.com/auth/cloud-platform
    serviceAccountRef:
      external: {{ REDACTED }}
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
      enableSecureBoot: true
    tags:
    - {{ REDACTED }}
    - {{ REDACTED }}
    workloadMetadataConfig:
      nodeMetadata: GKE_METADATA_SERVER
  nodeLocations:
  - us-central1-b
  - us-central1-c
  nodeVersion: {{ REDACTED }}
  notificationConfig:
    pubsub:
      enabled: true
      topicRef:
        name: {{ REDACTED }}
  podSecurityPolicyConfig:
    enabled: false
  privateClusterConfig:
    enablePrivateEndpoint: false
    enablePrivateNodes: true
    masterGlobalAccessConfig:
      enabled: false
    masterIpv4CidrBlock: {{ REDACTED }}
    peeringName: {{ REDACTED }}
    privateEndpoint: {{ REDACTED }}
    publicEndpoint: {{ REDACTED }}
  releaseChannel:
    channel: stable
  resourceID: {{ REDACTED }}
  subnetworkRef:
    name: {{ REDACTED }}
  verticalPodAutoscaling:
    enabled: true
  workloadIdentityConfig:
    identityNamespace: {{ REDACTED }}.svc.id.goog
status:
  conditions:
  - lastTransitionTime: "2022-01-04T17:31:44Z"
    message: Update in progress
    reason: Updating
    status: "False"
    type: Ready
  endpoint: {{ REDACTED }}
  instanceGroupUrls:
  - {{ REDACTED }}
  labelFingerprint: {{ REDACTED }}
  masterVersion: {{ REDACTED }}
  observedGeneration: 15
  selfLink: {{ REDACTED }}
  servicesIpv4Cidr: {{ REDACTED }}

@toumorokoshi
Contributor

Thanks for the info! Based on this annotation:

cnrm.cloud.google.com/management-conflict-prevention-policy: none

there is no conflict prevention on, and most likely there is some field that is being detected as being modified over and over again.

Let me reach out to our current bug rotation to follow up.

@maqiuyujoyce
Collaborator

Hi @travisrandolph-bestbuy , sorry for the delayed response. I was trying to create a GKE cluster with the configuration you provided (except for enableBinaryAuthorization: true due to some issues that might require further digging), but was not able to reproduce this issue. The created cluster was not updated regularly.

On the other hand, I noticed that if there is any update to the cluster, there should be a corresponding log for the update operation. Could you check the GKE operation logs associated with the cluster to get more information about what has been updated and whether the changes in the operations match the yaml?

An example query to search for GKE operations in Logs Explorer would be:

resource.type="gke_cluster"
resource.labels.project_id="[your project ID]"
resource.labels.location="[your cluster region]"
resource.labels.cluster_name="[your cluster name]"

@travisrandolph-bestbuy
Author

@maqiuyujoyce Here is the update I'm seeing regularly.

{
  insertId: "{{REDACTED}}"
  logName: "{{REDACTED}}"
  operation: {
    first: true
    id: "{{REDACTED}}"
    producer: "container.googleapis.com"
  }
  protoPayload: {
    @type: "type.googleapis.com/google.cloud.audit.AuditLog"
    authenticationInfo: {
      principalEmail: "{{REDACTED}}"
      principalSubject: "{{REDACTED}}"
      serviceAccountDelegationInfo: [
        0: {
          principalSubject: "{{REDACTED}}"
        }]}
    authorizationInfo: [
      0: {
        granted: true
        permission: "container.clusters.update"
        resourceAttributes: {
        }}]
    methodName: "google.container.v1beta1.ClusterManager.UpdateCluster"
    request: {
      @type: "type.googleapis.com/google.container.v1alpha1.UpdateClusterRequest"
      name: "{{REDACTED}}"
      update: {
        desiredMasterAuthorizedNetworksConfig: {
          cidrBlocks: [
            0: {
              cidrBlock: "{{REDACTED}}"
            }]
          enabled: true
        }}}
    requestMetadata: {
      callerIp: "gce-internal-ip"
      callerSuppliedUserAgent: "google-api-go-client/0.5 Terraform/ (+https://www.terraform.io) Terraform-Plugin-SDK/2.5.0 terraform-provider-google-beta/kcc/controller-manager,gzip(gfe)"
      destinationAttributes: {
      }
      requestAttributes: {
        auth: {
        }
        time: "{{REDACTED}}"
      }}
    resourceLocation: {
      currentLocations: [
        0: "us-central1"
      ]}
    resourceName: "{{REDACTED}}"
    response: {
      @type: "type.googleapis.com/google.container.v1alpha1.Operation"
      name: "{{REDACTED}}"
      operationType: "UPDATE_CLUSTER"
      selfLink: "{{REDACTED}}"
      startTime: "{{REDACTED}}"
      status: "RUNNING"
      targetLink: "{{REDACTED}}"
    }
    serviceName: "container.googleapis.com"
  }
  receiveTimestamp: "{{REDACTED}}"
  resource: {
    labels: {
      cluster_name: "{{REDACTED}}"
      location: "us-central1"
      project_id: "{{REDACTED}}"
    }
    type: "gke_cluster"
  }
  severity: "NOTICE"
  timestamp: "{{REDACTED}}"
}

@jcanseco
Member

Hi @travisrandolph-bestbuy, I think this issue is probably being caused by spec.releaseChannel.channel: stable. Can you try setting the field to STABLE instead?

Officially, the field only recognizes all-caps values as per the field's description in the docs. I can file a bug to allow for all-lower-case values as well to avoid this issue in the future, but we probably won't be able to prioritize this improvement for a while if the all-caps values do work as intended.

If using STABLE is not enough to fix the issue, it's possible that the diff is coming from one of the {{REDACTED}} values. In that case, I would try comparing your K8s resource state against the output of gcloud container clusters describe and look for any differences. This is, of course, not a perfect way to debug the issue since the two object models are different, but they are similar enough that you can sometimes use them to look for unintended diffs.
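
As a rough illustration of that comparison, the Python sketch below flattens two nested structures into dotted paths and reports the paths whose values differ. It assumes both outputs have already been parsed into plain dicts (for example from YAML) and that the field names line up, which, as noted above, is only approximately true for the two object models. `flatten` and `diff_paths` are hypothetical helper names, not part of any Config Connector or gcloud tooling.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(obj: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, leaf value) pairs for nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, obj


def diff_paths(kcc_spec: Dict[str, Any],
               gcloud_out: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing path to (value in KCC spec, value in gcloud output)."""
    left = dict(flatten(kcc_spec))
    right = dict(flatten(gcloud_out))
    missing = object()  # distinguishes "absent" from an explicit None
    return {
        path: (left.get(path), right.get(path))
        for path in sorted(set(left) | set(right))
        if left.get(path, missing) != right.get(path, missing)
    }


# Illustrative inputs only; real outputs contain many more fields.
kcc = {"location": "us-central1", "releaseChannel": {"channel": "stable"}}
gcloud = {"location": "us-central1", "releaseChannel": {"channel": "STABLE"}}
result = diff_paths(kcc, gcloud)
```

The comparison is deliberately case-sensitive, so a value that differs only in case is reported as a diff. Paths that exist on one side only are reported with `None` on the other side.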

@travisrandolph-bestbuy
Author

travisrandolph-bestbuy commented Jan 21, 2022

@jcanseco Switching the release channel to upper case resolved the issue for some of our clusters. One of the logs looks like it was trying to update the maintenance window. After looking into it, we found we had been leaving the Zulu time designator (the trailing Z) off the start and end times. Once I updated my configs to '2019-09-02T15:00:00Z', it stopped trying to update. After these changes, most of our clusters are fine. The last problem seems to be with 'masterAuthorizedNetworksConfig'. I've added the mismatch in the second code box below. We can't set the enabled boolean ourselves, so I'm not sure we can do anything to fix that.

authorizationInfo: [
  0: {
    granted: true
    permission: "container.clusters.update"
    resourceAttributes: {
    }}]
methodName: "google.container.v1beta1.ClusterManager.SetMaintenancePolicy"
request: {
  @type: "type.googleapis.com/google.container.v1alpha1.SetMaintenancePolicyRequest"
  maintenancePolicy: {
    resourceVersion: "33b59b10"
    window: {
      recurringWindow: {
        recurrence: "FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR"
        window: {
          endTime: "2089-09-02T23:00:00Z"
          startTime: "2019-09-02T15:00:00Z"
        }}}}

Our config...
masterAuthorizedNetworksConfig:
  cidrBlocks:
  - cidrBlock: {{REDACTED_MATCH}}

gcloud output
masterAuthorizedNetworksConfig:
  cidrBlocks:
  - cidrBlock: {{REDACTED_MATCH}}
  enabled: true
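
The two fixes described in this comment (an upper-case release channel and a trailing Z on the maintenance window times) could be checked before a manifest is applied. The Python sketch below is one hedged way to do that. `lint_cluster_spec` is a hypothetical helper, it expects the `spec` section already parsed into a dict, and its timestamp pattern is an assumption that is stricter than what the API may actually accept (it rejects fractional seconds and numeric offsets, for example).

```python
import re
from typing import Any, Dict, List

# Assumed shape: a UTC timestamp with a trailing "Z", e.g. 2019-09-02T15:00:00Z
_UTC_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def lint_cluster_spec(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for values likely to cause a repeated diff."""
    problems: List[str] = []

    # The release channel was reported to need an all-caps value.
    channel = (spec.get("releaseChannel") or {}).get("channel")
    if isinstance(channel, str) and channel != channel.upper():
        problems.append(
            f"releaseChannel.channel is {channel!r}; try {channel.upper()!r}"
        )

    # The maintenance window times were reported to need a trailing Z.
    window = (spec.get("maintenancePolicy") or {}).get("recurringWindow") or {}
    for field in ("startTime", "endTime"):
        value = window.get(field)
        if isinstance(value, str) and not _UTC_TIMESTAMP.match(value):
            problems.append(
                f"maintenancePolicy.recurringWindow.{field} is {value!r}; "
                "expected a UTC time ending in 'Z'"
            )
    return problems


# Illustrative spec with both problems described above.
before = {
    "releaseChannel": {"channel": "stable"},
    "maintenancePolicy": {
        "recurringWindow": {
            "startTime": "2019-09-02T15:00:00",
            "endTime": "2089-09-02T23:00:00Z",
        }
    },
}
warnings_before = lint_cluster_spec(before)
```

Here `warnings_before` contains one entry for the channel and one for `startTime`; `endTime` already ends in Z and passes.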

@jcanseco
Member

Hey @travisrandolph-bestbuy, I'm glad to hear that most of your clusters are no longer seeing the issue.

Re: masterAuthorizedNetworksConfig.enabled: this is actually one case where the "diff against gcloud output" approach doesn't work perfectly since KCC hides the enabled flag from the user (KCC sets it to true as long as masterAuthorizedNetworksConfig is set, and sets it to false otherwise).
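
One way to keep this hidden flag from showing up as a false mismatch is to drop it from the gcloud side before comparing. The Python sketch below assumes both outputs are already parsed into dicts; `hide_derived_enabled` and `derived_enabled` are hypothetical helpers that only mirror the rule described above (enabled is true whenever masterAuthorizedNetworksConfig is set), not KCC's own code.

```python
import copy
from typing import Any, Dict


def derived_enabled(kcc_spec: Dict[str, Any]) -> bool:
    """Mirror the described rule: enabled is true iff the config block is set."""
    return kcc_spec.get("masterAuthorizedNetworksConfig") is not None


def hide_derived_enabled(gcloud_out: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the gcloud output without the derived enabled flag."""
    cleaned = copy.deepcopy(gcloud_out)  # leave the caller's data untouched
    config = cleaned.get("masterAuthorizedNetworksConfig")
    if isinstance(config, dict):
        config.pop("enabled", None)
    return cleaned


# Illustrative values; the CIDR block is a placeholder.
kcc = {"masterAuthorizedNetworksConfig": {"cidrBlocks": [{"cidrBlock": "10.0.0.0/8"}]}}
gcloud = {
    "masterAuthorizedNetworksConfig": {
        "cidrBlocks": [{"cidrBlock": "10.0.0.0/8"}],
        "enabled": True,
    }
}
matches = hide_derived_enabled(gcloud) == kcc
```

With the flag removed, the two sides in this example compare equal, so any remaining difference would come from another field.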

Would you be willing to share the kubectl get -o yaml and gcloud container clusters describe outputs of the last problematic cluster(s)? We can try to take a look and help check for unintended diffs.

@travisrandolph-bestbuy
Author

@jcanseco After pushing both of the prior changes to all clusters, our resources are no longer updating every 10 minutes. I'm closing this one out!

@jcanseco
Member

@travisrandolph-bestbuy, that's great to hear! Thanks for keeping us updated.
