Skip to content

bugfix(hpa): introduce buildQuantity helper for consistent resource quantity #132351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 26, 2025

Conversation

googs1025
Copy link
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR introduces a new helper function, buildQuantity in the HPA controller.

Previously, quantities were created inline using NewMilliQuantity directly, which made it harder to:

  • Ensure consistency between CPU and memory handling

Now, all resource quantity creation goes through buildQuantity:

  • For memory: raw bytes are converted to KiB and use BinarySI
  • For CPU or other resources: milli-units are used with DecimalSI

Which issue(s) this PR is related to:

Fix: #130584

Special notes for your reviewer:

Does this PR introduce a user-facing change?

HPA status now displays memory metrics using Ki

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 17, 2025
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 17, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Jun 17, 2025
@googs1025
Copy link
Member Author

before change:

~ kubectl get hpa
NAME       REFERENCE                    TARGETS                              MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/test-deployment   cpu: 25%/50%, memory: 71421952/1Gi   1         10        1          4h
➜  ~ kubectl get hpa -oyaml
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"},{"resource":{"name":"memory","target":{"averageValue":"1Gi","type":"AverageValue"}},"type":"Resource"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"test-deployment"}}}
    creationTimestamp: "2025-06-17T07:16:03Z"
    name: test-hpa
    namespace: default
    resourceVersion: "3841389"
    uid: b8b9a7d2-32e2-4b29-861c-7a489ecc5c5f
  spec:
    maxReplicas: 10
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: Resource
    - resource:
        name: memory
        target:
          averageValue: 1Gi
          type: AverageValue
      type: Resource
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: test-deployment
  status:
    conditions:
    - lastTransitionTime: "2025-06-17T10:55:21Z"
      message: recommended size matches current size
      reason: ReadyForNewScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2025-06-17T11:06:24Z"
      message: the HPA was able to successfully calculate a replica count from cpu
        resource utilization (percentage of request)
      reason: ValidMetricFound
      status: "True"
      type: ScalingActive
    - lastTransitionTime: "2025-06-17T07:16:33Z"
      message: the desired count is within the acceptable range
      reason: DesiredWithinRange
      status: "False"
      type: ScalingLimited
    currentMetrics:
    - resource:
        current:
          averageUtilization: 25
          averageValue: 38m
        name: cpu
      type: Resource
    - resource:
        current:
          averageValue: "71421952"
        name: memory
      type: Resource
    currentReplicas: 1
    desiredReplicas: 1
    lastScaleTime: "2025-06-17T10:11:17Z"
kind: List
metadata:
  resourceVersion: ""

after changed:

~ kubectl get hpa
NAME       REFERENCE                    TARGETS                              MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/test-deployment   cpu: 36%/50%, memory: 108236Ki/1Gi   1         10        1          4h2m
➜  ~ kubectl get hpa -oyaml
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"},{"resource":{"name":"memory","target":{"averageValue":"1Gi","type":"AverageValue"}},"type":"Resource"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"test-deployment"}}}
    creationTimestamp: "2025-06-17T07:16:03Z"
    name: test-hpa
    namespace: default
    resourceVersion: "3841487"
    uid: b8b9a7d2-32e2-4b29-861c-7a489ecc5c5f
  spec:
    maxReplicas: 10
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: Resource
    - resource:
        name: memory
        target:
          averageValue: 1Gi
          type: AverageValue
      type: Resource
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: test-deployment
  status:
    conditions:
    - lastTransitionTime: "2025-06-17T10:55:21Z"
      message: recommended size matches current size
      reason: ReadyForNewScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2025-06-17T11:06:24Z"
      message: the HPA was able to successfully calculate a replica count from cpu
        resource utilization (percentage of request)
      reason: ValidMetricFound
      status: "True"
      type: ScalingActive
    - lastTransitionTime: "2025-06-17T07:16:33Z"
      message: the desired count is within the acceptable range
      reason: DesiredWithinRange
      status: "False"
      type: ScalingLimited
    currentMetrics:
    - resource:
        current:
          averageUtilization: 36
          averageValue: 55m
        name: cpu
      type: Resource
    - resource:
        current:
          averageValue: 108236Ki
        name: memory
      type: Resource
    currentReplicas: 1
    desiredReplicas: 1
    lastScaleTime: "2025-06-17T10:11:17Z"
kind: List
metadata:
  resourceVersion: ""

@googs1025
Copy link
Member Author

/test pull-kubernetes-e2e-autoscaling-hpa-cm

@googs1025
Copy link
Member Author

/retest

1 similar comment
@googs1025
Copy link
Member Author

/retest

func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
if resourceName == v1.ResourceMemory {
// Convert bytes to KiB
kib := rawProposal / 1000
Copy link
Member

@omerap12 omerap12 Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be divided by 1024, not 1000

Copy link
Member Author

@googs1025 googs1025 Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very interesting question. 🤔 I looked at it for some time and found that the metrics are all obtained using the MilliValue(), so 1000 is not 1024 (binary). In addition, after debugging locally, I used /1000 and it was also the correct value.

if target.AverageValue != nil {
var rawProposal int64
replicaCountProposal, rawProposal, timestampProposal, err := a.replicaCalc.GetRawResourceReplicas(ctx, currentReplicas, target.AverageValue.MilliValue(), resourceName, tolerances, namespace, selector, container)
if err != nil {
return 0, nil, time.Time{}, "", condition, fmt.Errorf("failed to get %s usage: %v", resourceName, err)
}
metricNameProposal = fmt.Sprintf("%s resource", resourceName.String())
status := autoscalingv2.MetricValueStatus{
AverageValue: resource.NewMilliQuantity(rawProposal, resource.DecimalSI),
}
return replicaCountProposal, &status, timestampProposal, metricNameProposal, autoscalingv2.HorizontalPodAutoscalerCondition{}, nil
}

func getPodMetrics(ctx context.Context, rawMetrics []metricsapi.PodMetrics, resource v1.ResourceName) PodMetricsInfo {
res := make(PodMetricsInfo, len(rawMetrics))
for _, m := range rawMetrics {
podSum := int64(0)
missing := len(m.Containers) == 0
for _, c := range m.Containers {
resValue, found := c.Usage[resource]
if !found {
missing = true
klog.FromContext(ctx).V(2).Info("Missing resource metric", "resourceMetric", resource, "pod", klog.KRef(m.Namespace, m.Name))
break
}
podSum += resValue.MilliValue()
}
if !missing {
res[m.Name] = PodMetric{
Timestamp: m.Timestamp.Time,
Window: m.Window.Duration,
Value: podSum,
}
}
}

~ kubectl get PodMetrics
NAME                               CPU         MEMORY    WINDOW
test-deployment-6d56d679c5-4zmtl   28395987n   69600Ki   15.023s
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"resource":{"name":"cpu","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"},{"resource":{"name":"memory","target":{"averageUtilization":50,"type":"Utilization"}},"type":"Resource"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"test-deployment"}}}
  creationTimestamp: "2025-06-17T14:45:26Z"
  name: test-hpa
  namespace: default
  resourceVersion: "3849312"
  uid: d3cd047c-54af-40fd-af10-2cc15011658e
spec:
  maxReplicas: 10
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  - resource:
      name: memory
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
status:
  conditions:
  - lastTransitionTime: "2025-06-17T14:46:37Z"
    message: the HPA controller was able to update the target scale to 3
    reason: SucceededRescale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2025-06-17T15:07:38Z"
    message: the HPA was able to successfully calculate a replica count from memory
      resource utilization (percentage of request)
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2025-06-17T14:46:52Z"
    message: the desired count is within the acceptable range
    reason: DesiredWithinRange
    status: "False"
    type: ScalingLimited
  currentMetrics:
  - resource:
      current:
        averageUtilization: 19
        averageValue: 29m
      name: cpu
    type: Resource
  - resource:
      current:
        averageUtilization: 106
        averageValue: 69600Ki
      name: memory
    type: Resource
  currentReplicas: 1
  desiredReplicas: 3
  lastScaleTime: "2025-06-17T15:07:38Z"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. Yeah I think you are right :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the returned value from metrics client is MiliValue() the above example by @googs1025 clearly points to that, I don't think that dividing by 1000 is necessary here. You should just invoke resource.NewMilliQuantity... just like it was done previously.

The one change I do agree with is the differentiation between memory (being BinarySI) and cpu (being DecimalSI) which matches what the metrics server returns here https://github.com/kubernetes-sigs/metrics-server/blob/55b4961bc1eceffd0a37809dc271e9ae38de9deb/pkg/storage/types.go#L63-L64.

Iow. I believe this method should look like this:

func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
    format := resource.DecimalSI
	// to match what we return in the metrics server, see https://github.com/kubernetes-sigs/metrics-server/blob/55b4961bc1eceffd0a37809dc271e9ae38de9deb/pkg/storage/types.go#L63-L64
	if resourceName == v1.ResourceMemory {
		format = resource.BinarySI
	}
	return *resource.NewMilliQuantity(rawProposal, format)
}

Copy link
Member Author

@googs1025 googs1025 Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation. I test it locally and it will return correctly. 😄

@googs1025 googs1025 changed the title bugfix(hpa): introduce buildQuantity helper for consistent resource quantity creation bugfix(hpa): introduce buildQuantity helper for consistent resource quantity Jun 19, 2025
func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
if resourceName == v1.ResourceMemory {
// Convert bytes to KiB
kib := rawProposal / 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the returned value from metrics client is MiliValue() the above example by @googs1025 clearly points to that, I don't think that dividing by 1000 is necessary here. You should just invoke resource.NewMilliQuantity... just like it was done previously.

The one change I do agree with is the differentiation between memory (being BinarySI) and cpu (being DecimalSI) which matches what the metrics server returns here https://github.com/kubernetes-sigs/metrics-server/blob/55b4961bc1eceffd0a37809dc271e9ae38de9deb/pkg/storage/types.go#L63-L64.

Iow. I believe this method should look like this:

func buildQuantity(resourceName v1.ResourceName, rawProposal int64) resource.Quantity {
    format := resource.DecimalSI
	// to match what we return in the metrics server, see https://github.com/kubernetes-sigs/metrics-server/blob/55b4961bc1eceffd0a37809dc271e9ae38de9deb/pkg/storage/types.go#L63-L64
	if resourceName == v1.ResourceMemory {
		format = resource.BinarySI
	}
	return *resource.NewMilliQuantity(rawProposal, format)
}

t.Errorf("expected quantity %v, got %v", tt.expected.String(), q.String())
}
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test isn't sufficient, since it passes with and without your change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify it as follows:

if !q.Equal(tt.expected) || (q.Format != tt.expected.Format) {
  t.Errorf("expected quantity %v (Format: %v), got %v (Format: %v)",
	  tt.expected.String(), tt.expected.Format,
	  q.String(), q.Format)
}

But I'm not sure if there are other ways to test 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is better now.

@github-project-automation github-project-automation bot moved this from Needs Triage to In Progress in SIG Apps Jun 24, 2025
@googs1025
Copy link
Member Author

/test pull-kubernetes-e2e-autoscaling-hpa-cpu

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@omerap12 has the final tag

@soltysh
Copy link
Contributor

soltysh commented Jun 26, 2025

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 26, 2025
Copy link
Member

@omerap12 omerap12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 208371e2590f632142d1453f9e45b61af0af6fe0

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: googs1025, omerap12, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit efd2a0d into kubernetes:master Jun 26, 2025
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 26, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SIG Apps Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Incorrect m Suffix in Memory Values for kubectl get hpa
4 participants