CNI not writing service route to nodes (EKS) #132575

Closed
@pratiksk

Description

What happened?

The Amazon VPC CNI is not injecting the route to the Kubernetes service CIDR (172.20.0.0/16) into the node's route table. As a result, nodes cannot reach Kubernetes internal services, including the API server via its service IP. This breaks service discovery and authentication for workloads like Vault that rely on the TokenReview API.

Pinging the service IP from a node fails:

[ec2-user@ip-10-0-1-77 ~]$ ping -c 3 172.20.0.1
PING 172.20.0.1 (172.20.0.1) 56(84) bytes of data.

--- 172.20.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2055ms 
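
To see which entries in the node's route table actually cover the service IP, the output of `ip route show` can be checked against the destination. This is only a rough sketch: the helper name `covering_routes` is made up for illustration, and it handles only plain `default` and prefix lines, skipping anything else.

```python
import ipaddress
from typing import List


def covering_routes(route_table: str, destination: str) -> List[str]:
    """Return the prefixes in `ip route show` text that contain `destination`.

    Hypothetical helper for illustration. The result is ordered from the
    most specific prefix to the least specific one.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for line in route_table.splitlines():
        fields = line.split()
        if not fields:
            continue
        # `ip route` prints the catch-all route as the word "default".
        prefix = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            network = ipaddress.ip_network(prefix, strict=False)
        except ValueError:
            # Skip lines that do not start with a prefix (e.g. "unreachable ...").
            continue
        if dest in network:
            matches.append((network.prefixlen, prefix))
    # Longest prefix first, mirroring how the kernel picks a route.
    matches.sort(key=lambda item: item[0], reverse=True)
    return [prefix for _, prefix in matches]
```

Feeding it the text of `ip route show` from the node together with `"172.20.0.1"` would show whether anything more specific than the default route covers the service CIDR.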

The node group uses ami_type = "AL2_x86_64" (old, but it should work; I have faced the same issue on AL as well) and is deployed with Terraform.

I want to understand why the CNI is not injecting this route. Or does the route have to be added through user data?
It also does not look like a race condition: I manually restarted the aws-node pods, and they still did not inject the route.

What did you expect to happen?

I expect the CNI to add the route on the node.

How can we reproduce it (as minimally and precisely as possible)?

Apply the Terraform below (or simply create a cluster), then ping the service IP from a node:
ping -c 3 172.20.0.1

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.0"

  name = "abc-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["${var.region}a", "${var.region}b"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.3.0/24", "10.0.4.0/24"]

  enable_dns_support   = true
  enable_dns_hostnames = true
  enable_nat_gateway   = true
  single_nat_gateway   = true

  public_subnet_tags = {
    "kubernetes.io/cluster/abc-eks" = "shared"
    "kubernetes.io/role/elb"        = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/abc-eks"   = "shared"
    "kubernetes.io/role/internal-elb" = "1"
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.30.0"

  cluster_name                  = "abc-eks"
  cluster_version               = "1.31"
  authentication_mode           = "API"
  bootstrap_self_managed_addons = true

  subnet_ids = module.vpc.private_subnets
  vpc_id     = module.vpc.vpc_id

  # Enable IRSA support
  enable_irsa = true

  eks_managed_node_groups = {
    abc-ng = {
      desired_size   = 2
      max_size       = 3
      min_size       = 1
      instance_types = ["m6i.2xlarge"]
      capacity_type  = "ON_DEMAND"
      ami_type       = "AL2_x86_64"  # Explicitly specify AL2
      # Optional: add key_name if needed for debugging
      key_name = "clueterABC"
    }
  }
  # to be removed - for using helm locally
  cluster_endpoint_public_access       = true
  cluster_endpoint_public_access_cidrs = ["0.0.0.0/0"]

  # Admin access using access entries
  access_entries = {
    admin = {
      principal_arn = var.iam_user_arn
      type          = "STANDARD"

      policy_associations = {
        admin = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type       = "cluster"
            namespaces = []
          }
        }
      }
    }
  }
}
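
As a complement to the ping step in the reproduction, a TCP probe may be worth running as well. In the common kube-proxy iptables setup a ClusterIP is a virtual address matched by NAT rules for specific ports, so ICMP can go unanswered even when the service itself is reachable. Whether that applies to this cluster has not been confirmed here. The sketch below assumes the `kubernetes` service listens on port 443 at 172.20.0.1, and the helper name `tcp_reachable` is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration. It only completes the TCP
    handshake and sends no data, so it says nothing about TLS or auth.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Assumed address and port of the API server's ClusterIP in this issue.
    print(tcp_reachable("172.20.0.1", 443))
```

If this prints `True` on a node where ping still fails, the missing ICMP replies alone would not prove that the service IP is unreachable.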

Anything else we need to know?

Changing the AMI does not resolve it.

Kubernetes version

$ kubectl version
Client Version: v1.33.0
Kustomize Version: v5.6.0
Server Version: v1.31.9-eks-5d4a308
WARNING: version difference between client (1.33) and server (1.31) exceeds the supported minor version skew of +/-1


### Cloud provider

<details>
AWS
</details>


### OS version

<details>

```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

$ uname -a
Linux len 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

```

</details>

Related plugins (CNI, CSI, ...) and versions (if applicable)

amazon-k8s-cni:v1.19.0-eksbuild.1

Labels: kind/support, needs-sig, needs-triage
