Description
### What happened?
The Amazon VPC CNI is not injecting the route to the Kubernetes service CIDR (172.20.0.0/16) into the node's route table. As a result, nodes cannot reach Kubernetes internal services, including the API server via its service IP. This breaks service discovery and authentication for workloads like Vault that rely on the TokenReview API.
Ping from the node to the service IP does not work:

```console
[ec2-user@ip-10-0-1-77 ~]$ ping -c 3 172.20.0.1
PING 172.20.0.1 (172.20.0.1) 56(84) bytes of data.

--- 172.20.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2055ms
```
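To double-check on the node itself, the route table can be inspected with plain iproute2 commands (a quick sketch; 172.20.0.0/16 is the service CIDR mentioned above):

```console
# Look for any route covering the service CIDR (nothing matches on my nodes)
ip route show | grep 172.20

# Show which route the kernel would pick for the service IP
ip route get 172.20.0.1
```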
The AMI type is `AL2_x86_64` (yes, it is old, but it should work; I have faced this issue on AL as well), and the cluster is deployed using Terraform.

I want to understand why the CNI is not doing its job of injecting this route, or whether this has to be done in user data instead.

Also, it is not a race condition: I tried manually restarting the aws-node pods, but they still did not inject the route.
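For reference, restarting the CNI pods and checking their logs looked roughly like this (the `k8s-app=aws-node` label and the `aws-node` container name are assumed from the upstream VPC CNI manifests):

```console
# Restart the VPC CNI DaemonSet (this did not bring the route back)
kubectl -n kube-system rollout restart daemonset aws-node

# Confirm the pods come back up on every node
kubectl -n kube-system get pods -l k8s-app=aws-node -o wide

# Scan the CNI logs for route/IPAM errors
kubectl -n kube-system logs -l k8s-app=aws-node -c aws-node --tail=100
```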
### What did you expect to happen?

I expect the CNI to add the route to the Kubernetes service CIDR to the node's route table.
### How can we reproduce it (as minimally and precisely as possible)?

Run the following Terraform (or simply create a cluster the same way), then try pinging the service IP from a node:

```console
ping -c 3 172.20.0.1
```
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.1.0"
name = "abc-vpc"
cidr = "10.0.0.0/16"
azs = ["${var.region}a", "${var.region}b"]
public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets = ["10.0.3.0/24", "10.0.4.0/24"]
enable_dns_support = true
enable_dns_hostnames = true
enable_nat_gateway = true
single_nat_gateway = true
public_subnet_tags = {
"kubernetes.io/cluster/abc-eks" = "shared"
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/cluster/abc-eks" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.30.0"
cluster_name = "abc-eks"
cluster_version = "1.31"
authentication_mode = "API"
bootstrap_self_managed_addons = true
subnet_ids = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
# Enable IRSA support
enable_irsa = true
eks_managed_node_groups = {
abc-ng = {
desired_size = 2
max_size = 3
min_size = 1
instance_types = ["m6i.2xlarge"]
capacity_type = "ON_DEMAND"
ami_type = "AL2_x86_64" # Explicitly specify AL2
# Optional: add key_name if needed for debugging
key_name = "clueterABC"
}
}
# to be removed - for using helm locally
cluster_endpoint_public_access = true
cluster_endpoint_public_access_cidrs = ["0.0.0.0/0"]
# Admin access using access entries
access_entries = {
admin = {
principal_arn = var.iam_user_arn
type = "STANDARD"
policy_associations = {
admin = {
policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
access_scope = {
type = "cluster"
namespaces = []
}
}
}
}
}
}
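To rule out a service CIDR mismatch, the CIDR the cluster actually uses can be confirmed with the AWS CLI (a quick check; for this cluster it should come back as the 172.20.0.0/16 mentioned above):

```console
# Print the service CIDR that EKS assigned to the cluster
aws eks describe-cluster --name abc-eks \
  --query "cluster.kubernetesNetworkConfig.serviceIpv4Cidr" --output text
```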
### Anything else we need to know?

Changing the AMI does not resolve it.
### Kubernetes version

<details>

```console
$ kubectl version
Client Version: v1.33.0
Kustomize Version: v5.6.0
Server Version: v1.31.9-eks-5d4a308
WARNING: version difference between client (1.33) and server (1.31) exceeds the supported minor version skew of +/-1
```
</details>
### Cloud provider
<details>
AWS
</details>
### OS version
<details>
```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
$ uname -a
Linux len 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```

</details>