Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster-bootstrap: address constant-time problems as in NCC-E003660-TTV #120400

Conversation

neolit123
Copy link
Member

@neolit123 neolit123 commented Sep 4, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

as reported in #119632 there are a couple of problems in the cluster-bootstrap package.


1

the function randBytes() responsible for generating a token uses a modulo to perform range reduction from 0-252 -> 0-36 to obtain a character from the 0-9a-z predefined literal / lookup table.

solution: generate numbers in the 0-36 range using crypto/rand.Int() then instead of accessing a character from
the table with [] indexing, use simple operations (constant-time) to obtain the random character.


2

the function IsValidBootstrapToken() uses regexp string matching which is not a good practice.

solution: break down the function and use simple comparisons.


Which issue(s) this PR fixes:

Fixes #119632

Special notes for your reviewer:

not an expert on "constant time", have some expertise on how a CPU works...

Does this PR introduce a user-facing change?

cluster-bootstrap: improve the security of the functions responsible for generation and validation of bootstrap tokens

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 4, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 4, 2023
@neolit123
Copy link
Member Author

/hold for review
/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 4, 2023
@neolit123
Copy link
Member Author

neolit123 commented Sep 13, 2023

reviews are welcome.

/sig security

security are the organizers (?) of the audit that found this.

@k8s-ci-robot k8s-ci-robot added sig/security Categorizes an issue or PR as relevant to SIG Security. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Sep 13, 2023
}

token[i] = validBootstrapTokenChars[int(b)%len(validBootstrapTokenChars)]
token[i] = validBootstrapTokenChars[val.Uint64()]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This performs in effect an array lookup based on a secret value, the index of the token character in string validBootstrapTokenChars. This would in principle leak the index, and therefore the secret token character, via timing side-channels. The lookup based on index x, where x is val.Uint64(), could be replaced by a constant-time selection e.g. res := x + 48 + (39 & ((9 - x) >> 8)), then string(rune(res)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the reply.
so we still generate a random index x := val.Uint64() but then use the simple operations to range lock it in the ASCII values for a-z and 0-9?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you keep the random index generation as is (val, err := rand.Int(rand.Reader, max)). Then, instead of performing an array lookup, you map x (val.Uint64) to the appropriate ASCII ranges using the operation above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood, thank you. i will update this as soon as possible in the next few days (not on a computer right now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

The function generates bytes in the x={0-252} range and then
applies an y=(x mod 36) to obtain allowed token characters
from validBootstrapTokenChars[y].

Instead of using crypto/rand.Reader, use crypto/rand.Int()
that operates in the val={0-len(validBootstrapTokenChars))}.

Once a random index is generated, use simple operations
to obtain a random character in the a-z,0-9 character range.
This makes the character generation in constant-time.
@neolit123 neolit123 force-pushed the 1.29-fix-bootstrap-token-constant-time branch from 06e32d2 to ae30bb4 Compare September 23, 2023 15:27
@neolit123
Copy link
Member Author

/retest

Comment on lines +110 to +111
notDigit := (c < 48 || c > 57) // Character is not in the 0-9 range
notLetter := (c < 97 || c > 122) // Character is not in the a-z range
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gdncc do you think that it's worth using golang's crypto/subtle package for these comparisons?
https://pkg.go.dev/crypto/subtle#ConstantTimeLessOrEq

Copy link
Member Author

@neolit123 neolit123 Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another question,
in the issue description we have:

Attackers may determine Bootstrap Token values based on timing-based side channels.

is that a case where the attacker has tapped into the protected process memory of a the running binary (e.g. kubeadm) run by the cluster administrator and they could intercept the tokens? that would imply that the host OS has a vulnerability.

i am giving kubeadm as an example, because kubeadm is the only core k8s consumer of the cluster-bootstrap library.
the way to generates a token is a one-shot operation of calling the randBytes(), generating a token and uploading it as a secret in etcd. the token is also validated in the process using IsValidBootstrapToken().
the token is also validated when the user specifies it (i.e. pre-generated by the user).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the crypto/subtle package is an option, but I can’t think of a compelling reason to switch.

To perform a timing side-channel attack, the attacker does not need to have access to the memory of the running binary e.g. access to a debugger.

Common processors use caches to speed up access to resources such as data and code. Attackers who can monitor the cache, for example from the host, or from a VM machine for code running on the host (e.g. a different process/user), or another VM on the same hypervisor than the attacker controlled VM, may observe changes in speed and cache behaviour, when resources that depend on sensitive information are used. This attack can reveal in principle the locations where the victim is accessing data (data flow), or the code the victim is running (control flow). The attacker leverages a signal which is measured through elapsed time. In the case of an array access at a secret address, this exercises caches and their contents, so that the memory access, but also other subsequent memory accesses elsewhere in the system, will be impacted (e.g. the actual array access kicked out of cache another data element, and a later access to that evicted element will be slower). This means that what the attacker actually measures can be other code running at a (slightly) later time.

Timing differences may also be observable from the network (e.g. how long it takes to validate the token). This is probably not exploitable in this case (although recent paper Out of the Box Testing discusses impressive results a few network hops from the target, or better, using the loopback interface).

Noting that this finding was classified as informational. There are also potential attenuating circumstances. Timing side-channel attacks are statistical; the attacker has to be able to do repeated experiments and somehow assemble together the small scraps of information they gathered. It would require additional research to determine whether it is exploitable, or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the context. this is quite interesting!

@neolit123
Copy link
Member Author

dropping sig auth (we can consider this library as SIG CL exclusive)
/remove-sig auth

@k8s-ci-robot k8s-ci-robot removed the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Oct 4, 2023
The function uses BootstrapTokenRegexp.MatchString(token)
which is not a recommended practice.

Instead, break down the token into its components: ID, secret.
The ID is public thus we can use Regexp matching for it.
The secret needs constant time comparison. Iterate over
every character and make sure it fits the 0-9a-z range and that
it has a length of 16.
@neolit123 neolit123 force-pushed the 1.29-fix-bootstrap-token-constant-time branch from ae30bb4 to 1d519f1 Compare October 5, 2023 10:47
@neolit123
Copy link
Member Author

updated to use the api.BootstrapTokenSecretBytes constant instead of 16

@gdncc could you please add the text LGTM in a comment as a sign-off, in case you agree with the current state of the change. i can ask a k8s org member to double check and add our lgtm label.

thank you

@neolit123
Copy link
Member Author

/retest

@reylejano
Copy link
Member

Thank you for reviewing @gdncc
@gdncc is from the vendor that reporting the finding NCC-E003660-TTV: Timing Side Channel in Bootstrap Tokens Generation and Handling in issue #119632
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1a1afc62f55460c0cdafce2e5d505121f6f48d31

@neolit123
Copy link
Member Author

canceling the hold.
/hold cancel

thanks for the LGTM

i also asked @justinsb to have another look. in case there are remarks i can send a cleanup / followup.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2023
@k8s-ci-robot k8s-ci-robot merged commit 5ff7961 into kubernetes:master Oct 6, 2023
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/security Categorizes an issue or PR as relevant to SIG Security. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

NCC-E003660-TTV: Timing Side Channel in Bootstrap Tokens Generation and Handling
4 participants