Proposal for DaemonSet deployment of Prometheus Agent #6600

Merged · 7 commits merged from agent-daemonset-proposal into prometheus-operator:main on Jun 20, 2024

Conversation

@haanhvu (Contributor) commented on May 16, 2024

Proposal for DaemonSet deployment of Prometheus Agent

@haanhvu requested a review from a team as a code owner on May 16, 2024.
@simonpasquier (Contributor) left a comment

This is looking great!

@ArthurSens (Member) left a comment

Awesome start!

I have a few small comments but the proposal already looks pretty good!

I think we could already start refactoring our codebase to extract common configuration that will be used by both statefulset and daemonset modes :)


> The current (StatefulSet) deployment brings along the corresponding pitfalls:
> * Load management & scalability: Since one or several high-availability Prometheus Agents are responsible for scraping metrics of the whole cluster, users would need to calculate/estimate the load and scalability of the whole cluster to decide on replicas and sharding strategies. Estimating cluster-wide load and scalability is a much harder task than estimating node-wide load and scalability.
> * Security: Similarly, cluster-wide security is a much bigger problem than node-wide security.
A Member left a comment

Should we provide examples of how cluster-wide attacks can be avoided with daemonsets?

@haanhvu (Contributor, Author) commented on May 17, 2024

Actually I'm thinking of removing Security from Why and Pitfalls of the current solution.

The first reason is that, AFAIU, StatefulSet and DaemonSet face different kinds of security issues. In DaemonSet, the Prometheus Agent pod knows all secrets and shares them with all other pods. In StatefulSet, the complexity of security is at cluster scope and we have to deal with network issues. So we can hardly say one is better than the other regarding security.

The second reason is that security is not the key reason we chose to implement DaemonSet.

Do you agree to remove it from our consideration?

@simonpasquier @kakkoyun

A Member left a comment

Given the comment above, I'm in favor of removing or at least re-writing the security concerns. I don't think we should be saying that one is better than the other, but acknowledging how they are different.

A Member left a comment

We should remove the security section. I don't think our primary goal is to address this. I don't even see how it would be more secure than the statefulset approach. I'd even argue that it would be more insecure. But again I think we should drop it.

@haanhvu (Contributor, Author) commented

Removed it. @simonpasquier let us know if you have different opinions.

> * Scraped load is very large or hard to estimate.
> * Scalability is hard to predict.
> * Security is a big concern.
> * They want to collect node system metrics (e.g. kubelet, node exporter).
A Member left a comment

Here it sounds like we can't collect kubelet/node exporter metrics with statefulset, which isn't true 🤔

@haanhvu (Contributor, Author) commented

Yeah we can do it with kubernetesSDConfigs in ScrapeConfig? If there's no advantage of solving this use case with DaemonSet, I'll remove it then.

@ArthurSens (Member) commented on May 17, 2024

We can also use ServiceMonitor/PodMonitor, just need to adjust labels between services/pods :)

Yeah, I'd remove this part
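For illustration, a minimal sketch of that ServiceMonitor approach for kubelet metrics; the Service name, namespace, label, and port name are assumptions (kube-prometheus, for example, maintains a similar `kubelet` Service in `kube-system`), not something prescribed in this thread:

```yaml
# Hedged sketch: a ServiceMonitor that selects a kubelet Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet   # hypothetical label on the Service
  endpoints:
    - port: https-metrics               # hypothetical port name
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true        # for illustration only
```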

@haanhvu (Contributor, Author) commented

> Yeah we can do it with kubernetesSDConfigs in ScrapeConfig?

I meant we can do this in the StatefulSet mode. This is the go-to solution in StatefulSet, right?
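For reference, a minimal sketch of that go-to StatefulSet-mode solution: a ScrapeConfig using `kubernetesSDConfigs` with the Node role. The resource name, Secret reference, and TLS settings below are illustrative assumptions, not taken from the proposal:

```yaml
# Hedged sketch: scraping kubelets via Kubernetes node service discovery.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: kubelet-nodes        # hypothetical name
spec:
  kubernetesSDConfigs:
    - role: Node
  scheme: HTTPS
  authorization:
    credentials:
      name: prometheus-token # hypothetical Secret holding a bearer token
      key: token
  tlsConfig:
    insecureSkipVerify: true # for illustration only
```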

@haanhvu (Contributor, Author) commented on May 20, 2024

@kakkoyun @simonpasquier @ArthurSens I resolved the reviews and also left comments on the reviews not resolved yet. Please take a look.

@bwplotka @pintohutch If you have time, do you mind taking a look at this too? ^^

@bwplotka (Contributor) commented

Nice, will check by tomorrow at the latest 🤞🤞🤞🤞

@kakkoyun (Member) commented

On top of my to-do list 👍

@ArthurSens (Member) left a comment

Really good work 🥳

There are a few things that we could clarify but, as mentioned before, it looks good enough to start working on the codebase already :)


> ## 1. Why
>
> When deploying Prometheus Agent in Kubernetes, three of the biggest users’ concerns are: load distribution, scalability, and security.
A Member left a comment

Nit: have you seen this statement somewhere? If yes, it would be nice to have a reference here :)

@haanhvu (Contributor, Author) commented on May 21, 2024

I didn't cite it from a source. I formed this general observation from reading users' blogs, browsing the issues in our repo, and my experience setting up some nodes (not a k8s cluster though) for benchmark pipelines for Jaeger during last year's GSoC.

There are of course other concerns too, like cost ^^ But I'm not sure DaemonSet could help with reducing cost, so I didn't state it here.

A Contributor left a comment

In some way it helps a lot with the cost, because it scales with the load, so you don't need to keep beefy Prometheus servers when your cluster is scaled back. Of course some mix of vertical and horizontal scaling for thousands of small agents would be the best from the cost perspective, but we will never be able to do this with scraped metrics (and it's fine). DaemonSet is somewhat in this direction while maintaining some other pros of stable collection 🤗

@haanhvu (Contributor, Author) commented on May 25, 2024

> In some way it helps a lot with the cost, because it scales with the load, so you don't need to keep beefy Prometheus servers when your cluster is scaled back

Yeah, I mentioned automatic scaleup but forgot to mention automatic scaledown. I'll add this to the Scalability section then. We don't have any proof of cost so I wouldn't mention cost here. But automatic scaledown implicitly refers to cost optimization (and environmental benefits too, hopefully ^^).


@kakkoyun previously approved these changes on May 21, 2024

@kakkoyun (Member) left a comment

This LGTM. I'll have another look at it but this shouldn't be a blocker.

One additional thing I'd like to mention is Grafana Agent. We should check how they approached it (they already support several deployment approaches). What considerations did they make? We can even try to reach out to people and ask about the trade-offs.

> ## 5. Non-Goals
>
> The non-goals are the features that are not easy to implement and require more investigation. We will need to investigate whether there are actual user needs for them and, if yes, how to best implement them. We’ll handle these after the MVP.
> * ServiceMonitor support: There's a performance issue regarding this feature. Since each Prometheus Agent running on a node requires one watch, making all Prometheus Agent pods watch all endpoints will put a huge stress on the Kubernetes API server. This is the main reason why GMP hasn’t supported this, even though there are user needs stated in some issues ([#362](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/362), [#192](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/192)). However, as discussed with Danny from GMP [here](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/192#issuecomment-2028850846), ServiceMonitor support based on EndpointSlice seems like a viable approach. We’ll investigate this further after the MVP.
A Member left a comment

This is a non-goal but it'd be nice to attack this if we ever finish our planned goals before the program ends.

@bwplotka (Contributor) left a comment

Looks good generally! Suggested some wording changes, but the plan sounds good! Great work and amazing to see this work moving forward 💪🏽

I think the main challenge here is making sure not to confuse users with too many/too complex configuration options: making it as easy as possible to use, to discover configuration pieces, and to debug when something is misconfigured. Not sure what can replace a fresh CRD honestly, but maybe more focused docs/guides on this mode would be good enough! 🤗

(Disclaimer: I work for Google Cloud Managed Prometheus Team)




> When deploying Prometheus Agent in Kubernetes, three of the biggest users’ concerns are: load distribution, scalability, and security.
>
> DaemonSet deployment solves all these three concerns:
A Contributor left a comment

Suggested change:

```diff
-DaemonSet deployment solves all these three concerns:
+DaemonSet deployment significantly improves on all of these three concerns:
```

It's not perfect. It does not solve load distribution down to a single series, or even to a single target. It's a pragmatic solution that works well enough for 99.9% of users. (:

> DaemonSet deployment solves all these three concerns:
> * Load distribution: Each Prometheus Agent pod will only scrape the targets located on the same node. Even though the targets on some nodes may produce more metrics than other nodes, the load distribution would be reliable enough.
> * Automatic scalability: When new nodes are added to the cluster, new Prometheus Agent pods will be automatically added on the nodes that meet user-defined restrictions (if any).
> * Security: Since the scraped targets are local to the Prometheus Agent pod (on the same node), the scope of security problems is reduced to each node.
@bwplotka (Contributor) commented on May 21, 2024

This is in practice... very hard to scale and audit. 🙃 But in theory there could be some security improvements, not sure if I agree with the explanation though (@pintohutch added good arguments below).

@TheSpiritXIII is helping a lot to get us to a better place here, and Prometheus Operator will be better with those changes e.g. prometheus/prometheus#13956

A Contributor left a comment

> Since the scraped targets are local to the Prometheus Agent pod (on the same node), the scope of security problems is reduced to each node.

Can you give an example? I would think it would actually go the other way.

What if there are exploits in Prometheus? Wouldn't that compound the security problems in the cluster by the number of nodes (i.e. an attacker can now exploit any node in the cluster since the container is everywhere)?

@haanhvu (Contributor, Author) commented

I was just thinking that the scope of security could be "isolated" to each node, because there's no inter-node communication (between Prometheus and the scrape targets). But after more digging I realized this was too naive a view. Scratched it here (and in the next commit): #6600 (comment)
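For background on the node-local scraping this thread keeps returning to: in DaemonSet mode, each agent typically restricts Kubernetes service discovery to its own node with a field selector, with the node name injected through the downward API. A hedged sketch in raw Prometheus configuration (the job name and the `NODE_NAME` variable are assumptions for illustration; prometheus-config-reloader can expand such environment variables in the config):

```yaml
# Hedged sketch: pod discovery limited to the agent's own node.
scrape_configs:
  - job_name: node-local-pods        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            # NODE_NAME is assumed to be injected via the Pod's downward API
            field: spec.nodeName=$(NODE_NAME)
```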


> The non-goals are the features that are not easy to implement and require more investigation. We will need to investigate whether there are actual user needs for them and, if yes, how to best implement them. We’ll handle these after the MVP.
> * ServiceMonitor support: There's a performance issue regarding this feature. Since each Prometheus Agent running on a node requires one watch, making all Prometheus Agent pods watch all endpoints will put a huge stress on the Kubernetes API server. This is the main reason why GMP hasn’t supported this, even though there are user needs stated in some issues ([#362](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/362), [#192](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/192)). However, as discussed with Danny from GMP [here](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/192#issuecomment-2028850846), ServiceMonitor support based on EndpointSlice seems like a viable approach. We’ll investigate this further after the MVP.
> * Storage: We will need to spend time studying more about the WAL, different storage solutions provided by Kubernetes, and how to gracefully handle storage in different cases of crashes. For example, there’s an [issue in Prometheus](https://github.com/prometheus/prometheus/issues/8809) showing that samples may be lost if remote write didn’t flush cleanly. We’ll investigate these further after the MVP.
A Contributor left a comment

What about mixed deployment cases? DaemonSet vs. others? Would they be part of goals or non-goals?

@haanhvu (Contributor, Author) commented on May 21, 2024

@simonpasquier once mentioned mixed-mode cases. In general, what additional things do we need to do to enable mixed modes? I haven't been able to see that clearly.

A Contributor left a comment

Not sure; what matters is what intention you have here when testing/designing features. It feels like mixed modes is a goal then?

A Contributor left a comment

What do you mean by mixed deployment goals @bwplotka?
What I mentioned before is that someone could very well deploy "statefulset" Prometheus/PrometheusAgent resources alongside "daemonset" PrometheusAgent resources.

A Contributor left a comment

> What I mentioned before is that someone could very well deploy "statefulset" Prometheus/PrometheusAgent resources alongside "daemonset" PrometheusAgent resources.

Exactly that - mixed deployment. I mean we should put that in the goals section to keep allowing those cases 👍🏽

@haanhvu (Contributor, Author) commented on May 26, 2024

I described this in the Pitfalls of the current solution and Audience sections in the last commit. But I'm wondering whether I should put it in Goals too.

@simonpasquier @ArthurSens @kakkoyun Should we clarify in How that in the MVP we won't allow switching from a live StatefulSet to DaemonSet => if early adopters are using StatefulSet and want to deploy mixed mode, they will have to first delete the StatefulSet object, then deploy the DaemonSet, then deploy the StatefulSet again?

AFAIU Goals/Non-goals are what guide the How section. If we need to clarify this in How then maybe we need to add mixed deployment to Goals.

A Contributor left a comment

What I mean by "mixed deployment" involves distinct resources scraping distinct targets. For instance:

```yaml
# Normal Prometheus server scraping control-plane components
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: control-plane
spec:
  serviceMonitorSelector:
    matchLabels:
      kubernetes.io/part-of: control-plane
---
# DaemonSet Prometheus agent scraping data-plane components like kubelet
apiVersion: monitoring.coreos.com/v1alpha1
kind: PrometheusAgent
metadata:
  name: data-plane
spec:
  serviceMonitorSelector:
    matchLabels:
      kubernetes.io/part-of: data-plane
  mode: DaemonSet
```

In this case, there's no intersection between the sets of targets.

> * Replica
> * Shard
> * Storage

A Contributor left a comment

> In the MVP, we will simply fail the reconciliation if any of those fields are set.

Would that potentially break scraping if a user were to switch from mode: StatefulSet to mode: DaemonSet?

@haanhvu (Contributor, Author) commented on May 22, 2024

If users want to switch from StatefulSet to DaemonSet, they would have to unset the unsupported fields (if they had set them in StatefulSet). Besides documentation, do you have any ideas on how to make this switch smoother?

I discussed with @simonpasquier and @ArthurSens about whether we should simply log, or completely fail the reconciliation when unsupported fields are set. We concluded that having a log might not be enough, because users might neglect it and keep thinking that the unsupported fields would work. Do you have any ideas on this?

Maybe we need a test case for this switch?

A Contributor left a comment

Dare I say - a failing webhook on the CRD?

A Member left a comment

Ah, very good point! It didn't occur to me that one could switch from statefulset to daemonset in a live object.

We might need CEL earlier than we thought? From my understanding, it works like an admission webhook.

A Contributor left a comment

For a first approach, I'd consider that failing the reconciliation is good enough. If the validation can be modeled with CEL, I'd prefer to go this way (a validating webhook is more complex to manage).
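As a rough illustration of the CEL option: validation rules can be attached to the CRD schema via `x-kubernetes-validations`, so the API server rejects invalid objects at admission time. A hedged sketch, assuming hypothetical field names on the PrometheusAgent spec; this is not the operator's actual implementation:

```yaml
# Hedged sketch: CEL rule on the PrometheusAgent CRD spec schema that
# rejects StatefulSet-only fields when mode is DaemonSet.
x-kubernetes-validations:
  - rule: "!has(self.mode) || self.mode != 'DaemonSet' || (!has(self.replicas) && !has(self.shards) && !has(self.storage))"
    message: "replicas, shards and storage must not be set when mode is DaemonSet"
```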

A Contributor left a comment

CEL is nice because you don't need a separate webhook service to configure and route to.

Alternatively, you could add some new logic in Go code to the admission webhook. This is a nice option if CEL is insufficient for what you want to check (e.g. you have some Prometheus library-based validation or something that cannot be codified easily in CEL).

A Contributor left a comment

FWIW we have a "dedicated" CEL issue #5079


@simonpasquier (Contributor) left a comment

Virtual approval from my side :)

@simonpasquier (Contributor) left a comment

LGTM. There are a few Markdown issues reported by the linter that need fixing though.

@haanhvu (Contributor, Author) commented on Jun 3, 2024

> LGTM. There are a few Markdown issues reported by the linter that need fixing though.

Yeah, do you want to merge it now, or leave it open for a little while as we discussed?

@haanhvu force-pushed the agent-daemonset-proposal branch from 2e8b843 to 9a8a583 on June 19, 2024.
@haanhvu (Contributor, Author) commented on Jun 19, 2024

@ArthurSens @simonpasquier @kakkoyun I resolved all the comments. We have left this open for a while. Since there are no new reviews, I think we can merge this now.

@kakkoyun (Member) left a comment

LGTM.

I'll go ahead and merge it now. We can always send subsequent PRs if we change decisions.

@kakkoyun enabled auto-merge on June 19, 2024.
@haanhvu (Contributor, Author) commented on Jun 20, 2024

The auto-merge doesn't work, probably because the tests in CI don't run (seems like they only run in code-related PRs?)

@ArthurSens (Member) commented

> The auto-merge doesn't work, probably because the tests in CI don't run (seems like they only run in code-related PRs?)

Correct 😅

@ArthurSens disabled auto-merge on June 20, 2024.
@ArthurSens merged commit 8cf75b8 into prometheus-operator:main on Jun 20, 2024. 9 checks passed.
openshift-merge-bot (bot) pushed a commit to stolostron/prometheus-operator that referenced this pull request on Aug 2, 2024.