Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: fix the bug that kubeadm always do CRI detection when --config is passed even if it is not required by the subcommand #120828

Merged
merged 1 commit into from
Oct 7, 2023

Conversation

SataQiu
Copy link
Member

@SataQiu SataQiu commented Sep 22, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

followup #114455

We have fixed the scenario of not passing --config, this PR is based on #114455, and try to fix the scenario of passing a config file.

Which issue(s) this PR fixes:

Ref kubernetes/kubeadm#2937

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubeadm: fix the bug that kubeadm always do CRI detection when --config is passed even if it is not required by the subcommand

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 22, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 22, 2023
@SataQiu
Copy link
Member Author

SataQiu commented Sep 22, 2023

/hold

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 22, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2023
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no other way to know when to skip it, we have to add the extra argument everywhere?

@@ -251,47 +256,47 @@ func DefaultedInitConfiguration(versionedInitCfg *kubeadmapiv1.InitConfiguration
}

// LoadInitConfigurationFromFile loads a supported versioned InitConfiguration from a file, converts it into internal config, defaults it and verifies it.
func LoadInitConfigurationFromFile(cfgPath string) (*kubeadmapi.InitConfiguration, error) {
func LoadInitConfigurationFromFile(cfgPath string, skipCRIDetect bool) (*kubeadmapi.InitConfiguration, error) {
Copy link
Member Author

@SataQiu SataQiu Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no other way to know when to skip it, we have to add the extra argument everywhere?

The tricky part is here.
It's hard to know if we can skip the CRI detection without an additional parameter or context for LoadOrDefaultInitConfiguration -> LoadInitConfigurationFromFile.

This is different from DefaultedInitConfiguration, which can avoid CRI detection by pre-filling the criSocket field.

Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we spread things everywhere, it tastes bad. However, we have to do that sometimes.

In other cases, they may have several things that need to be options/flags everywhere. They use a xxxOptions struct with several bools or some other type attributes. Then later, if we have further logic to add to spread everywhere, we can add a new option in this options struct then. But this looks like an over-design if we only have one flag here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to double check my understanding. every time we call a phase with --config, there is no way to know if the user wants the cri detection or not. some phases can go without cri detection completely.

so logically the only way to do this is to pass either the skipcridetection flag or perhaps a phase name and then decide to put the no-op cri socket name.

it seems we are passing the skipcridetection flag in too many locations (too deepnin code paths) and it doesn't feel right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we spread things everywhere, it tastes bad. However, we have to do that sometimes.

In other cases, they may have several things that need to be options/flags everywhere. They use a xxxOptions struct with several bools or some other type attributes. Then later, if we have further logic to add to spread everywhere, we can add a new option in this options struct then. But this looks like an over-design if we only have one flag here.

an options struct or a variadic arg list sounds like something we can do at some point.

we can log a k/kubeadm issue for it.

Copy link
Member Author

@SataQiu SataQiu Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every time we call a phase with --config, there is no way to know if the user wants the cri detection or not

We can know if a command(phase) requires CRI information by checking whether it provides --cri-socket flag.
That's exactly what #114455 did.
But it's hard to get this cobra command information anywhere else.

it seems we are passing the skipcridetection flag in too many locations (too deepnin code paths) and it doesn't feel right.

Agree!

Copy link
Member

@pacoxu pacoxu Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example here :

// PodValidationOptions contains the different settings for pod validation
type PodValidationOptions struct {
// Allow invalid pod-deletion-cost annotation value for backward compatibility.
AllowInvalidPodDeletionCost bool
// Allow invalid label-value in LabelSelector
AllowInvalidLabelValueInSelector bool
// Allow pod spec to use non-integer multiple of huge page unit size
AllowIndivisibleHugePagesValues bool
// Allow pod spec to use status.hostIPs in downward API if feature is enabled
AllowHostIPsField bool
// Allow invalid topologySpreadConstraint labelSelector for backward compatibility
AllowInvalidTopologySpreadConstraintLabelSelector bool
// Allow node selector additions for gated pods.
AllowMutableNodeSelectorAndNodeAffinity bool
// The top-level resource being validated is a Pod, not just a PodSpec
// embedded in some other resource.
ResourceIsPod bool

But not sure if we should use this.

Copy link
Member

@neolit123 neolit123 Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, we can do this with the extra argument that we pass on many places, the way this PR is.
but we should log a ticket in k/kubeadm for refactor with struct ^.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
I'm sorry for not getting back to you sooner after the holiday.
Let's follow it at kubernetes/kubeadm#2941.

…ig is passed even if it is not required by subcommand
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 28, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f538ab11eae6e8bdf2b1fbbae21f93a7add92ba5

Copy link
Member

@chendave chendave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I saw the discussion here for the possibility of the a new field in a struct, not sure whether that approach is easy to implemented or not.

But I believe this change will fix the issue, I would more happy to see some testcases to guard around the new logic added though.

klog.V(4).Infof("skip CRI socket detection, fill with placeholder %s", kubeadmconstants.UnknownCRISocket)
cfg.CRISocket = kubeadmconstants.UnknownCRISocket // set a value to pass the ValidateSocketPath
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any testcase to check the logic added here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can only happen if the execution of the command or sub-command does not depend on the CRISocket. The purpose here is just to pass the validation check. It's pretty straightforward.
Previously, we were pre-setting the CRISocket field to prevent CRI detection, for example:

func DefaultInitConfiguration() *kubeadmapiv1.InitConfiguration {
initCfg := &kubeadmapiv1.InitConfiguration{
NodeRegistration: kubeadmapiv1.NodeRegistrationOptions{
CRISocket: kubeadmconstants.UnknownCRISocket, // avoid CRI detection
},
}
return initCfg
}

NodeRegistration: kubeadmapiv1.NodeRegistrationOptions{
CRISocket: constants.UnknownCRISocket,
},

Currently, we use this additional parameter skipCRIDetect to control whether to do CRI detection. This seems easier to understand than implicit control.

BTW, I think we need to do a subsequent refactoring with the struct option, and that might be a good time to add more test cases.

@@ -110,6 +110,11 @@ func SetNodeRegistrationDynamicDefaults(cfg *kubeadmapi.NodeRegistrationOptions,
}

if cfg.CRISocket == "" {
if skipCRIDetect {
klog.V(4).Infof("skip CRI socket detection, fill with placeholder %s", kubeadmconstants.UnknownCRISocket)
Copy link
Member

@chendave chendave Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls have some info about the phase/command/func name in the log, this will help us to quickly identify where we are since the same log is printed inside of func SetResetDynamicDefaults.

Copy link
Member Author

@SataQiu SataQiu Oct 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetXXXDynamicDefaults is only called by LoadOrDefaultXXXConfiguration.
So we can tell where we are from the top context.
For example:

I1007 07:24:17.900137    4332 initconfiguration.go:260] loading configuration from "./kubeadm.yaml"
I1007 07:24:17.900851    4332 initconfiguration.go:114] skip CRI socket detection, fill with placeholder unix:///var/run/unknown.sock
I1007 07:24:17.902233    4332 interface.go:432] Looking for default routes with IPv4 addresses
I1007 07:24:17.902269    4332 interface.go:437] Default route transits interface "eth0"
I1007 07:24:17.902387    4332 interface.go:209] Interface eth0 is up
I1007 07:24:17.902482    4332 interface.go:257] Interface "eth0" has 3 addresses :[172.18.0.2/16 fc00:f853:ccd:e793::2/64 fe80::42:acff:fe12:2/64].
I1007 07:24:17.902542    4332 interface.go:224] Checking addr  172.18.0.2/16.
I1007 07:24:17.902557    4332 interface.go:231] IP found 172.18.0.2
I1007 07:24:17.902601    4332 interface.go:263] Found valid IPv4 address 172.18.0.2 for interface "eth0".
I1007 07:24:17.902610    4332 interface.go:443] Found active IP 172.18.0.2 
...

}

// documentMapToResetConfiguration takes a map between GVKs and YAML documents (as returned by SplitYAMLDocuments),
// finds a ResetConfiguration, decodes it, dynamically defaults it and then validates it prior to return.
func documentMapToResetConfiguration(gvkmap kubeadmapi.DocumentMap, allowDeprecated, allowExperimental bool, strictErrors bool) (*kubeadmapi.ResetConfiguration, error) {
func documentMapToResetConfiguration(gvkmap kubeadmapi.DocumentMap, allowDeprecated, allowExperimental bool, strictErrors bool, skipCRIDetect bool) (*kubeadmapi.ResetConfiguration, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameters list can be simplified as allowDeprecated, allowExperimental, strictErrors, skipCRIDetect bool

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to do this with a follow-up refactor PR (struct option).

@chendave
Copy link
Member

chendave commented Oct 7, 2023

/unhold

some refactor could be done with a followup: kubernetes/kubeadm#2941

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit bb06804 into kubernetes:master Oct 7, 2023
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants