bugfix(hpa): introduce buildQuantity helper for consistent resource quantity #132351

Conversation
Before change:

➜ ~ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
test-hpa Deployment/test-deployment cpu: 25%/50%, memory: 71421952/1Gi 1 10 1 4h
➜ ~ kubectl get hpa -oyaml
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"},{"resource":{"name":"memory","target":{"averageValue":"1Gi","type":"AverageValue"}},"type":"Resource"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"test-deployment"}}}
    creationTimestamp: "2025-06-17T07:16:03Z"
    name: test-hpa
    namespace: default
    resourceVersion: "3841389"
    uid: b8b9a7d2-32e2-4b29-861c-7a489ecc5c5f
  spec:
    maxReplicas: 10
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: Resource
    - resource:
        name: memory
        target:
          averageValue: 1Gi
          type: AverageValue
      type: Resource
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: test-deployment
  status:
    conditions:
    - lastTransitionTime: "2025-06-17T10:55:21Z"
      message: recommended size matches current size
      reason: ReadyForNewScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2025-06-17T11:06:24Z"
      message: the HPA was able to successfully calculate a replica count from cpu
        resource utilization (percentage of request)
      reason: ValidMetricFound
      status: "True"
      type: ScalingActive
    - lastTransitionTime: "2025-06-17T07:16:33Z"
      message: the desired count is within the acceptable range
      reason: DesiredWithinRange
      status: "False"
      type: ScalingLimited
    currentMetrics:
    - resource:
        current:
          averageUtilization: 25
          averageValue: 38m
        name: cpu
      type: Resource
    - resource:
        current:
          averageValue: "71421952"
        name: memory
      type: Resource
    currentReplicas: 1
    desiredReplicas: 1
    lastScaleTime: "2025-06-17T10:11:17Z"
kind: List
metadata:
  resourceVersion: ""
After change:

➜ ~ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
test-hpa Deployment/test-deployment cpu: 36%/50%, memory: 108236Ki/1Gi 1 10 1 4h2m
➜ ~ kubectl get hpa -oyaml
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"},{"resource":{"name":"memory","target":{"averageValue":"1Gi","type":"AverageValue"}},"type":"Resource"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"test-deployment"}}}
    creationTimestamp: "2025-06-17T07:16:03Z"
    name: test-hpa
    namespace: default
    resourceVersion: "3841487"
    uid: b8b9a7d2-32e2-4b29-861c-7a489ecc5c5f
  spec:
    maxReplicas: 10
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: Resource
    - resource:
        name: memory
        target:
          averageValue: 1Gi
          type: AverageValue
      type: Resource
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: test-deployment
  status:
    conditions:
    - lastTransitionTime: "2025-06-17T10:55:21Z"
      message: recommended size matches current size
      reason: ReadyForNewScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2025-06-17T11:06:24Z"
      message: the HPA was able to successfully calculate a replica count from cpu
        resource utilization (percentage of request)
      reason: ValidMetricFound
      status: "True"
      type: ScalingActive
    - lastTransitionTime: "2025-06-17T07:16:33Z"
      message: the desired count is within the acceptable range
      reason: DesiredWithinRange
      status: "False"
      type: ScalingLimited
    currentMetrics:
    - resource:
        current:
          averageUtilization: 36
          averageValue: 55m
        name: cpu
      type: Resource
    - resource:
        current:
          averageValue: 108236Ki
        name: memory
      type: Resource
    currentReplicas: 1
    desiredReplicas: 1
    lastScaleTime: "2025-06-17T10:11:17Z"
kind: List
metadata:
  resourceVersion: ""
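For illustration, here is a minimal standalone sketch (my own, not part of the thread; it assumes only the apimachinery resource package) of why the same byte count can render either as a plain number or with a Ki suffix, depending on the Quantity format chosen:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// 71421952 bytes of memory, expressed as a milli-value (bytes * 1000),
	// which is the form the HPA controller receives as rawProposal.
	const memMilli = int64(71421952) * 1000

	decimal := resource.NewMilliQuantity(memMilli, resource.DecimalSI)
	binary := resource.NewMilliQuantity(memMilli, resource.BinarySI)

	fmt.Println(decimal.String()) // "71421952" - the "before" style
	fmt.Println(binary.String())  // "69748Ki"  - Ki-style, as in the "after" output (that output is a different sample, hence a different number)
}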
/test pull-kubernetes-e2e-autoscaling-hpa-cm

/retest

1 similar comment

/retest
func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
	if resourceName == v1.ResourceMemory {
		// Convert bytes to KiB
		kib := rawProposal / 1000
I think this should be divided by 1024, not 1000
This is a very interesting question. 🤔 I looked at it for some time and found that the metrics are all obtained via MilliValue(), so the divisor is the decimal 1000, not the binary 1024. In addition, after debugging locally, dividing by 1000 also produced the correct value.
kubernetes/pkg/controller/podautoscaler/horizontal.go
Lines 602 to 613 in ced19ff
	if target.AverageValue != nil {
		var rawProposal int64
		replicaCountProposal, rawProposal, timestampProposal, err := a.replicaCalc.GetRawResourceReplicas(ctx, currentReplicas, target.AverageValue.MilliValue(), resourceName, tolerances, namespace, selector, container)
		if err != nil {
			return 0, nil, time.Time{}, "", condition, fmt.Errorf("failed to get %s usage: %v", resourceName, err)
		}
		metricNameProposal = fmt.Sprintf("%s resource", resourceName.String())
		status := autoscalingv2.MetricValueStatus{
			AverageValue: resource.NewMilliQuantity(rawProposal, resource.DecimalSI),
		}
		return replicaCountProposal, &status, timestampProposal, metricNameProposal, autoscalingv2.HorizontalPodAutoscalerCondition{}, nil
	}
kubernetes/pkg/controller/podautoscaler/metrics/client.go
Lines 110 to 131 in ced19ff
func getPodMetrics(ctx context.Context, rawMetrics []metricsapi.PodMetrics, resource v1.ResourceName) PodMetricsInfo {
	res := make(PodMetricsInfo, len(rawMetrics))
	for _, m := range rawMetrics {
		podSum := int64(0)
		missing := len(m.Containers) == 0
		for _, c := range m.Containers {
			resValue, found := c.Usage[resource]
			if !found {
				missing = true
				klog.FromContext(ctx).V(2).Info("Missing resource metric", "resourceMetric", resource, "pod", klog.KRef(m.Namespace, m.Name))
				break
			}
			podSum += resValue.MilliValue()
		}
		if !missing {
			res[m.Name] = PodMetric{
				Timestamp: m.Timestamp.Time,
				Window:    m.Window.Duration,
				Value:     podSum,
			}
		}
	}
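A small sketch (illustrative only, not code from the PR) of why 1000 rather than 1024 is the factor in play here: MilliValue() scales by a decimal 1000 regardless of the binary Ki suffix used in the reported usage:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Memory usage as reported by the metrics API for the pod above.
	usage := resource.MustParse("69600Ki")

	// getPodMetrics sums MilliValue(), i.e. bytes * 1000; the scaling is
	// decimal even though the input uses a binary (Ki) suffix.
	milli := usage.MilliValue()
	fmt.Println(milli)        // 71270400000
	fmt.Println(milli / 1000) // 71270400 == 69600 * 1024 bytes
}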
➜ ~ kubectl get PodMetrics
NAME CPU MEMORY WINDOW
test-deployment-6d56d679c5-4zmtl 28395987n 69600Ki 15.023s
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"},{"resource":{"name":"memory","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"test-deployment"}}}
  creationTimestamp: "2025-06-17T14:45:26Z"
  name: test-hpa
  namespace: default
  resourceVersion: "3849312"
  uid: d3cd047c-54af-40fd-af10-2cc15011658e
spec:
  maxReplicas: 10
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  - resource:
      name: memory
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
status:
  conditions:
  - lastTransitionTime: "2025-06-17T14:46:37Z"
    message: the HPA controller was able to update the target scale to 3
    reason: SucceededRescale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2025-06-17T15:07:38Z"
    message: the HPA was able to successfully calculate a replica count from memory
      resource utilization (percentage of request)
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2025-06-17T14:46:52Z"
    message: the desired count is within the acceptable range
    reason: DesiredWithinRange
    status: "False"
    type: ScalingLimited
  currentMetrics:
  - resource:
      current:
        averageUtilization: 19
        averageValue: 29m
      name: cpu
    type: Resource
  - resource:
      current:
        averageUtilization: 106
        averageValue: 69600Ki
      name: memory
    type: Resource
  currentReplicas: 1
  desiredReplicas: 3
  lastScaleTime: "2025-06-17T15:07:38Z"
Thanks for the explanation. Yeah I think you are right :)
If the value returned from the metrics client is a MilliValue() (the above example by @googs1025 clearly points to that), I don't think that dividing by 1000 is necessary here. You should just invoke resource.NewMilliQuantity(...) just like it was done previously.
The one change I do agree with is the differentiation between memory (being BinarySI) and cpu (being DecimalSI), which matches what the metrics server returns here: https://github.com/kubernetes-sigs/metrics-server/blob/55b4961bc1eceffd0a37809dc271e9ae38de9deb/pkg/storage/types.go#L63-L64.
In other words, I believe this method should look like this:
func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
	format := resource.DecimalSI
	// to match what we return in the metrics server, see https://github.com/kubernetes-sigs/metrics-server/blob/55b4961bc1eceffd0a37809dc271e9ae38de9deb/pkg/storage/types.go#L63-L64
	if resourceName == v1.ResourceMemory {
		format = resource.BinarySI
	}
	return *resource.NewMilliQuantity(rawProposal, format)
}
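For reference, a quick sketch (my own illustration, not the merged code; the input values are taken from the outputs shown earlier) of how that variant behaves for the two resource types:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// buildQuantity as suggested above: keep the milli-value as-is and only pick
// the serialization format per resource type.
func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
	format := resource.DecimalSI
	if resourceName == v1.ResourceMemory {
		format = resource.BinarySI
	}
	return *resource.NewMilliQuantity(rawProposal, format)
}

func main() {
	cpu := buildQuantity(v1.ResourceCPU, 38)                // 38 millicores
	mem := buildQuantity(v1.ResourceMemory, 110833664*1000) // 110833664 bytes as a milli-value

	fmt.Println(cpu.String()) // "38m"
	fmt.Println(mem.String()) // "108236Ki"
}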
Thank you for the explanation. I tested it locally and it returns the correct value. 😄
Force-pushed from 59e4fc9 to 4cc1dc4
				t.Errorf("expected quantity %v, got %v", tt.expected.String(), q.String())
			}
		})
	}
This test isn't sufficient, since it passes with and without your change.
I modified it as follows:
if !q.Equal(tt.expected) || (q.Format != tt.expected.Format) {
	t.Errorf("expected quantity %v (Format: %v), got %v (Format: %v)",
		tt.expected.String(), tt.expected.Format,
		q.String(), q.Format)
}
But I'm not sure if there are other ways to test 🤔
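One other option worth noting (just a sketch, not from the thread): comparing the canonical String() output also reflects the Format, because the suffix it picks depends on it even when the numeric values are equal:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Equal() compares only the numeric value, so these two compare as equal
	// even though they serialize with different suffixes.
	got := *resource.NewMilliQuantity(1024000, resource.DecimalSI)
	want := *resource.NewMilliQuantity(1024000, resource.BinarySI)

	fmt.Println(got.Equal(want))             // true
	fmt.Println(got.String(), want.String()) // "1024" "1Ki"

	// So a test comparing String() (instead of, or in addition to, the Format
	// field) would also catch a wrong format.
}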
Yup, this is better now.
Force-pushed from 4cc1dc4 to 513d4ee
…uantity creation Signed-off-by: googs1025 <[email protected]>
Force-pushed from 513d4ee to b50d508
/test pull-kubernetes-e2e-autoscaling-hpa-cpu
/approve
@omerap12 has the final tag
/triage accepted
Thanks!
/lgtm
LGTM label has been added. Git tree hash: 208371e2590f632142d1453f9e45b61af0af6fe0
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: googs1025, omerap12, soltysh

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR introduces a new helper function, buildQuantity, in the HPA controller.
Previously, quantities were created inline using NewMilliQuantity directly, which made it harder to keep the serialization format consistent across resource types (memory should be reported in binary form, CPU in decimal form).
Now, all resource quantity creation goes through buildQuantity, which selects the format per resource; see the sketch below.
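A rough sketch of the call-site change in horizontal.go (a hedged reconstruction based on the excerpt and review discussion above, not necessarily the exact merged diff; the local variable name is mine):

// Before: the format was hard-coded to DecimalSI for every resource.
status := autoscalingv2.MetricValueStatus{
	AverageValue: resource.NewMilliQuantity(rawProposal, resource.DecimalSI),
}

// After: the helper picks the format per resource (BinarySI for memory,
// DecimalSI otherwise), and the raw milli-value is passed through unchanged.
averageValue := buildQuantity(resourceName, rawProposal)
status := autoscalingv2.MetricValueStatus{
	AverageValue: &averageValue,
}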
Which issue(s) this PR is related to:
Fix: #130584
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: