Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s.io/dynamic-resource-allocation: fix compatibility with Kubernetes 1.27 #120868

Merged
merged 1 commit into from
Oct 10, 2023

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 25, 2023

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

Kubernetes 1.28 introduced generated resource claim names, with the actual name recorded in the pod status. As part of that change, the helper library for DRA drivers was updated to match that behavior. Since then, driver deployments were either compatible with Kubernetes 1.27 when using k8s.io/dynamic-resource-allocation 0.27 or Kubernetes 1.28 when using 0.28, but never both.

This was okay because this is an alpha feature, but it makes testing of DRA drivers harder (in particular because cloud providers have not all updated to 1.28 yet) and can be fixed fairly easily. Therefore adding back support for 1.27 is worthwhile and reasonable.

Does this PR introduce a user-facing change?

k8s.io/dynamic-resource-allocation: DRA drivers updating to this release are compatible with Kubernetes 1.27 and 1.28.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3063

/cc @elezar

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 25, 2023
@k8s-ci-robot
Copy link
Contributor

@pohly: GitHub didn't allow me to request PR reviews from the following users: elezar.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

Kubernetes 1.28 introduced generated resource claim names, with the actual name recorded in the pod status. As part of that change, the helper library for DRA drivers was updated to match that behavior. Since then, driver deployments were either compatible with Kubernetes 1.27 when using k8s.io/dynamic-resource-allocation 0.27 or Kubernetes 1.28 when using 0.28, but never both.

This was okay because this is an alpha feature, but it makes testing of DRA drivers harder (in particular because cloud providers have not all updated to 1.28 yet) and can be fixed fairly easily. Therefore adding back support for 1.27 is worthwhile and reasonable.

Does this PR introduce a user-facing change?

k8s.io/dynamic-resource-allocation: DRA drivers updating to this release are compatible with Kubernetes 1.27 and 1.28.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3063

/cc @elezar

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 25, 2023
// and users can set the DRA_WITH_DETERMINISTIC_RESOURCE_CLAIM_NAMES env
// variable to an arbitrary non-empty value to use the naming from Kubernetes <
// 1.28.
func NewNameLookup(client kubernetes.Interface) *Lookup {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this function easy and not doing anything which can fail is intentional because it avoids adding additional error checks in code which uses this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: Are there other options that we would want to pass to the contructor (something like withDeterministicResourceClaimNames to allow for overriding this behaviour in tests)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any and didn't want to make it more complex (interim solution, package doesn't need a stable API).

// and users can set the DRA_WITH_DETERMINISTIC_RESOURCE_CLAIM_NAMES env
// variable to an arbitrary non-empty value to use the naming from Kubernetes <
// 1.28.
func NewNameLookup(client kubernetes.Interface) *Lookup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: Are there other options that we would want to pass to the contructor (something like withDeterministicResourceClaimNames to allow for overriding this behaviour in tests)?

// If podClaim.Template is not nil, the caller must check that the
// ResourceClaim is indeed the one that was created for the Pod by calling
// IsUsable.
// Determining the name depends on Kubernetes >= 1.28.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm to understand this correctly, this function will only work in k8s >= 1.28. What happens when this condition is not met? Is this something we want to explicitly check and return some error?

Also, is this superseded by Lookup.Name function? If so, could we delegate to that -- constructing an applicable Lookup object when this function is called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, do we want to mark Name as deprecated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm to understand this correctly, this function will only work in k8s >= 1.28.

Correct.

What happens when this condition is not met?

The DRA driver fails with a "resource claim not created yet" even though the ResourceClaim has been created - it just doesn't have a deterministic name.

Is this something we want to explicitly check and return some error?

We can't. This function has no way to detect the actual behavior of the cluster, nor can the caller.

}

if *l.usePodStatus {
return Name(pod, podClaim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ... ok, I see that the relationship is the inverse of what I was expecting. As a matter of interest, do we expect that Name will be the function used moving forward, or would we rather have clients depend on the Lookup type (or possibly an interface)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see NewNameLookup as an interim solution and want to remove it again once 1.27 is no longer supported, upstream or by this package (whatever happens sooner).

I was tempted to clarify that by adding a Deprecated: use Name instead of you don't need to support Kubernetes 1.27, but it felt odd to add something that is deprecated right out of the gate.

… 1.27

Kubernetes 1.28 introduced generated resource claim names, with the actual name
recorded in the pod status. As part of that change, the helper library for DRA
drivers was updated to match that behavior. Since then, driver deployments were
either compatible with Kubernetes 1.27 when using
k8s.io/dynamic-resource-allocation 0.27 or Kubernetes 1.28 when using  0.28,
but never both.

This was okay because this is an alpha feature, but it makes testing of DRA
drivers harder (in particular because cloud providers have not all updated to
1.28 yet) and can be fixed fairly easily. Therefore adding support for 1.27 is
worthwhile and reasonable.
@pohly
Copy link
Contributor Author

pohly commented Sep 25, 2023

Are there other options that we would want to pass to the contructor (something like withDeterministicResourceClaimNames to allow for overriding this behaviour in tests)?

Tests, if there were any, would have to cover the different code paths without such an override. I can't think of anything else either.

Tests are missing because I don't know how to mock the ServerVersion call. I think we have to test this manually.

Such testing is necessary to determine the actual behavior of different server releases. I had previously tested with GKE 1.27 and the version check worked there. But with Kubernetes master I got "28+" (not a number!) and parsing it failed. I just pushed an update which works with both 1.27 and master (tested manually).

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Sep 26, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Sep 26, 2023

/triage accepted
/priority important-soon
/cc

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 26, 2023
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Sep 26, 2023
@ndixita ndixita moved this from Triage to Archive-it in SIG Node CI/Test Board Sep 27, 2023
Copy link
Contributor

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

@elezar: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elezar, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bart0sh
Copy link
Contributor

bart0sh commented Oct 10, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 815c85e2a042328e56017366eae9be4552205b28

@bart0sh bart0sh moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Oct 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit a6b8954 into kubernetes:master Oct 10, 2023
17 checks passed
SIG Node CI/Test Board automation moved this from Archive-it to Done Oct 10, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 10, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

4 participants