Skip to content

Support makeDaemonSet() for Prometheus Agent's DaemonSet deployment #6652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 27, 2024

Conversation

haanhvu
Copy link
Contributor

@haanhvu haanhvu commented Jun 6, 2024

Description

Support makeDaemonSet() and inner function makeDaemonSetSpec() for Prometheus Agent's DaemonSet deployment

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the box that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Verification

Unit tests

Changelog entry

Support makeDaemonSet() for Prometheus Agent's DaemonSet deployment

@haanhvu haanhvu requested a review from a team as a code owner June 6, 2024 10:11
@haanhvu
Copy link
Contributor Author

haanhvu commented Jun 6, 2024

@ArthurSens @simonpasquier @kakkoyun I just created makeDaemonSet() and makeDaemonSetSpec() with unit tests. The unit tests basically follow the current StatefulSet's unit tests. Could you check if these unit tests make sense or we need other unit tests?

There's a lot of code duplication between two modes right now. As we discussed, I'll refactor later.

I'm also waiting for #6636 to finish to add the WithNodeNameEnv() option to DaemonSet's config reloader.

@haanhvu haanhvu force-pushed the make-daemonset-object branch from 03bdb1a to ee45f5c Compare June 7, 2024 05:13
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, couldn't find the time to review everything just yet. I've made some comments about test readability

I also have the impression that a lot of code is copy pasted from makeStatefulSet and makeStatefulSetSpec. Not a requirement for this PR, but I'm just wondering if we could reduce code duplication between daemonset and statefulsets

@haanhvu
Copy link
Contributor Author

haanhvu commented Jun 13, 2024

I also have the impression that a lot of code is copy pasted from makeStatefulSet and makeStatefulSetSpec. Not a requirement for this PR, but I'm just wondering if we could reduce code duplication between daemonset and statefulsets

@ArthurSens Yes we can. But as we discussed in previous meetings, I think we should do the refactoring later. Because the implementation for daemonset could change, e.g., if later we find a bug in the implementation through e2e tests. You agree?

Most of your change suggestions are in the code duplication of both modes. I'll resolve in both modes then.

// Mount web config and web TLS credentials as volumes.
// We always mount the web config file for versions greater than 2.24.0.
// With this we avoid redeploying prometheus when reconfiguring between
// HTTP and HTTPS and vice-versa.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd simplify and just consider that users must use a recent Prometheus version for DaemonSet deployment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a new mode so we don't need this check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm not entirely sure yet we should remove this check in DaemonSet. If we remove this check, it means we will officially require users to use >=2.24 right? If that's the case, we can remove this check and should clearly write the requirement in the doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to clarify: I would declare that Daemonset mode requires a "recent" version of Prometheus (for instance at least 2.45.0 which the current long-term support version) so we can simplify the support matrix. The check should be done as early as possible so makeDaemonSet() can safely assume a minimal version.

Copy link
Contributor Author

@haanhvu haanhvu Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I wanted to clarify this, because we had a discussion in Slack (with @ArthurSens too) about whether we should limit the Prometheus version in DaemonSet: https://cloud-native.slack.com/archives/C071STDT4FP/p1718703164200639

If we agree to require a specific recent version, I'll remove this check then. I'll also note so that we'll remember to clarify this in the user doc of DaemonSet.

// minutes) to ensure that the readiness probe only comes into effect once
// Prometheus is effectively ready.
// We don't want to use the /-/healthy handler here because it returns OK as
// soon as the web server is started (irrespective of the WAL replay).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can either use the same settings as with statefulset mode or consider that since daemonset doesn't support persistent storage, we can have tighter timeouts.

@haanhvu
Copy link
Contributor Author

haanhvu commented Jun 17, 2024

We discussed offline and agreed to do some refactoring of the common between StatefulSet and DaemonSet first, before continuing this PR.

@haanhvu
Copy link
Contributor Author

haanhvu commented Jun 24, 2024

I just rebased, haven't done anything new yet. Will ping when this's ready for review.

@haanhvu haanhvu force-pushed the make-daemonset-object branch from af8a297 to 0f5e96c Compare June 24, 2024 09:23
@haanhvu haanhvu marked this pull request as ready for review June 24, 2024 09:33
@haanhvu
Copy link
Contributor Author

haanhvu commented Jun 24, 2024

@ArthurSens @simonpasquier @kakkoyun This is ready for review now. Please help review so that we can move on to the next thing. Next thing will be operator's logic and e2e tests, which are more difficult and interesting 😄

ArthurSens
ArthurSens previously approved these changes Jun 25, 2024
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

If we find something wrong along the way, we can just fix it in a follow-up PR :)

Comment on lines 131 to 127
// To avoid breaking users deploying an old version of the config-reloader image.
// TODO: remove the if condition after v0.72.0.
if cpf.Web != nil {
configReloaderWebConfigFile = confArg.Value
configReloaderVolumeMounts = append(configReloaderVolumeMounts, configMount...)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArthurSens @simonpasquier @kakkoyun We already agreed on Prometheus version in #6652 (comment) What about config reloader version here? Do we need this version check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, let's keep it simple.

Comment on lines 131 to 127
// To avoid breaking users deploying an old version of the config-reloader image.
// TODO: remove the if condition after v0.72.0.
if cpf.Web != nil {
configReloaderWebConfigFile = confArg.Value
configReloaderVolumeMounts = append(configReloaderVolumeMounts, configMount...)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, let's keep it simple.

@haanhvu haanhvu force-pushed the make-daemonset-object branch from 411b13a to 489d8ea Compare June 27, 2024 10:36
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment which you can address later.

Comment on lines +129 to +132
// In cases where an existing selector label is modified, or a new one is added, new sts cannot match existing pods.
// We should try to avoid removing such immutable fields whenever possible since doing
// so forces us to enter the 'recreate cycle' and can potentially lead to downtime.
// The requirement to make a change here should be carefully evaluated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit)

Suggested change
// In cases where an existing selector label is modified, or a new one is added, new sts cannot match existing pods.
// We should try to avoid removing such immutable fields whenever possible since doing
// so forces us to enter the 'recreate cycle' and can potentially lead to downtime.
// The requirement to make a change here should be carefully evaluated.
// In cases where an existing selector label is modified, or a new one is added, new daemonset cannot match existing pods.
// We should try to avoid removing such immutable fields whenever possible since doing
// so forces us to enter the 'recreate cycle' and can potentially lead to downtime.
// The requirement to make a change here should be carefully evaluated.

@simonpasquier simonpasquier merged commit 25a7bdf into prometheus-operator:main Jun 27, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants