Skip to content

Unclear concurrency safety requirements of k8s.io/cli-runtime/pkg/resource.NewBuilder argument #111714

Open
@porridge

Description

@porridge

What happened?

NewBuilder takes a RESTClientGetter interface as argument. Neither has any docstring.

The latter has a ToRESTConfig method which returns a pointer to a rest.Config. It turns out that the implementation of Builder (specifically the methods of ClientConfigFunc) assume ownership of the rest.Config returned by this method: it gets mutated.

This is fine with the "stock" implementation of RESTClientGetter, namely ConfigFlags whose ToRESTConfig implementation is careful enough to return a fresh copy of rest.Config on each call.

However it so happened, that at least one user of this library (the helm-operator-plugins framework) provided an implementation of RESTClientGetter that simply returned a pointer to a shared rest.Config object. As things are with shared mutable state, a data race bug in some scenarios ensued, which was quite tricky to track down though easy to fix for that particular use case.

What did you expect to happen?

It is not unreasonable to assume that other users of cli-runtime might also stumble upon this deficiency.

For the sake of future users, the concurrency safety requirements of NewBuilder should be clearly documented. Arguably the definition of the RESTClientGetter interfaces (I found at least a couple in the codebase) should also have at least a mention of this.

For the current users (who are extremely unlikely to see a change in a docstring of a method they already use) it might make sense to consider something more:

  • proactively make a copy of the returned rest.Config inside all those methods of ClientConfigFunc which call the receiver. However since these methods are chain-called this might be tricky to do without needlessly double-copying the config.
  • (wild idea) calling ToRestConfig a couple of times in NewBuilder and comparing the return values to make sure they are different objects, and panicking if they are not

How can we reproduce it (as minimally and precisely as possible)?

https://github.com/porridge/config-race-repro

Anything else we need to know?

No response

Kubernetes version

Cloud provider

Irrelavant

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugCategorizes issue or PR as related to a bug.lifecycle/rottenDenotes an issue or PR that has aged beyond stale and will be auto-closed.needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.sig/cliCategorizes an issue or PR as relevant to SIG CLI.

    Type

    No type

    Projects

    Status

    Needs Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions