Skip to content

feat: allow users to pause prebuilt workspace reconciliation #18700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

SasSwart
Copy link
Contributor

@SasSwart SasSwart commented Jul 1, 2025

This PR provides two commands:

  • coder prebuilds pause
  • coder prebuilds resume

These allow the suspension of all prebuilds activity, intended for use if prebuilds are misbehaving.

@SasSwart SasSwart changed the title feat: add cli commands to pause and resume prebuilt workspace reconci… feat: provide a mechanism to pause prebuilt workspace reconciliation Jul 1, 2025
@SasSwart SasSwart changed the title feat: provide a mechanism to pause prebuilt workspace reconciliation feat: allow users to pause prebuilt workspace reconciliation Jul 1, 2025
@SasSwart SasSwart requested review from johnstcn and ssncferreira July 1, 2025 19:02
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far, just missing the final piece of plumbing

@SasSwart SasSwart requested a review from johnstcn July 2, 2025 08:26
@SasSwart SasSwart marked this pull request as ready for review July 2, 2025 08:29
@johnstcn
Copy link
Member

johnstcn commented Jul 2, 2025

Might be good to also avoid the term prebuilds in the CLI command unless there is prior art

@@ -29,6 +29,7 @@ const (
MetricEligibleGauge = namespace + "eligible"
MetricPresetHardLimitedGauge = namespace + "preset_hard_limited"
MetricLastUpdatedGauge = namespace + "metrics_last_updated"
MetricReconciliationPausedGauge = namespace + "reconciliation_paused"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paused is difficult to reason about; I'd suggest inverting and make it _running so that a good value is 1 and a bad value is 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paused in this case has a technical meaning relating to a specific setting. If we rename this to running then I think we need to change the setting to be {"reconciliation_running": true}, which doesn't communicate the intent of the API clearly.

I'd like to keep it how it currently is, but as compensatory measures I will:

  • update the metric documentation to be clear about its meaning
  • add a second metric to represent a general "check engine light" for prebuilds

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be pretty handy, LGTM!

return
}

if bytes.Equal(settingsJSON, []byte(currentSettingsJSON)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not a problem now but when we add more settings, map ordering may cause this check to fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.

Comment on lines +211 to +216
mc.reconciliationPausedMu.RLock()
var pausedValue float64
if mc.reconciliationPaused {
pausedValue = 1
}
mc.reconciliationPausedMu.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: simpler to use an atomic.Bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did it this way to stick to the pattern already used above. I agree with the nit, but I'm going to prioritize getting this deployed.

var settings codersdk.PrebuildsSettings
if len(settingsJSON) > 0 {
if err := json.Unmarshal([]byte(settingsJSON), &settings); err != nil {
return xerrors.Errorf("unmarshal prebuilds settings: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fail to unmarshal the settings, this will effectively pause the reconciliation - so it's probably worth hardcoding paused=true in the metric collector if this happens.

Or we need to have another metric which indicates the recloop is broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we need to have another metric which indicates the recloop is broken.

Going to opt for this.
See my comment about a "check engine light" above.

require.Len(t, workspaces, 2, "should have created 2 prebuilds")

// Now pause prebuilds reconciliation
err = db.UpsertPrebuildsSettings(ctx, `{"reconciliation_paused": true}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest going via the API here to avoid duplicating this business logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants