Server Admin Log

From Wikitech

2024-06-21

  • 11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) shellbox-video.discovery.wmnet on all recursors
  • 11:37 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache shellbox-video.discovery.wmnet on all recursors
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65303 and previous config saved to /var/cache/conftool/dbconfig/20240621-110638-marostegui.json
  • 11:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 10:57 Emperor: restart swift-proxy on ms-fe2011 ms-fe2012 T360913
  • 10:56 Emperor: restart swift-proxy on ms-fe1010 T360913
  • 10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
  • 10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 10:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65302 and previous config saved to /var/cache/conftool/dbconfig/20240621-100554-marostegui.json
  • 10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65301 and previous config saved to /var/cache/conftool/dbconfig/20240621-100531-marostegui.json
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65300 and previous config saved to /var/cache/conftool/dbconfig/20240621-095024-marostegui.json
  • 09:45 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
  • 09:45 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
  • 09:41 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65299 and previous config saved to /var/cache/conftool/dbconfig/20240621-093517-marostegui.json
  • 09:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65298 and previous config saved to /var/cache/conftool/dbconfig/20240621-092009-marostegui.json
  • 09:16 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 09:14 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 09:02 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 08:57 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 08:56 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 08:47 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1053.eqiad.wmnet
  • 08:41 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 08:39 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1053.eqiad.wmnet
  • 08:14 vgutierrez: restarting logrotate.service on cp[3068,3070-3071].esams.wmnet
  • 08:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 08:03 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 08:00 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:00 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65297 and previous config saved to /var/cache/conftool/dbconfig/20240621-075404-arnaudb.json
  • 07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65296 and previous config saved to /var/cache/conftool/dbconfig/20240621-073858-arnaudb.json
  • 07:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65295 and previous config saved to /var/cache/conftool/dbconfig/20240621-072353-arnaudb.json
  • 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65294 and previous config saved to /var/cache/conftool/dbconfig/20240621-070847-arnaudb.json
  • 07:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool for debugging T368098', diff saved to https://phabricator.wikimedia.org/P65293 and previous config saved to /var/cache/conftool/dbconfig/20240621-070358-arnaudb.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65292 and previous config saved to /var/cache/conftool/dbconfig/20240621-045107-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 04:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65291 and previous config saved to /var/cache/conftool/dbconfig/20240621-045044-marostegui.json
  • 04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65290 and previous config saved to /var/cache/conftool/dbconfig/20240621-044455-marostegui.json
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65289 and previous config saved to /var/cache/conftool/dbconfig/20240621-043537-marostegui.json
  • 04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65288 and previous config saved to /var/cache/conftool/dbconfig/20240621-042948-marostegui.json
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65287 and previous config saved to /var/cache/conftool/dbconfig/20240621-042030-marostegui.json
  • 04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65286 and previous config saved to /var/cache/conftool/dbconfig/20240621-041441-marostegui.json
  • 04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65285 and previous config saved to /var/cache/conftool/dbconfig/20240621-040523-marostegui.json
  • 03:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65284 and previous config saved to /var/cache/conftool/dbconfig/20240621-035934-marostegui.json
  • 03:04 ejegg: fundraising civicrm upgraded from 2e1db811 to 8a0b5bea
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65283 and previous config saved to /var/cache/conftool/dbconfig/20240621-014545-marostegui.json
  • 01:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65282 and previous config saved to /var/cache/conftool/dbconfig/20240621-014523-marostegui.json
  • 01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65281 and previous config saved to /var/cache/conftool/dbconfig/20240621-013016-marostegui.json
  • 01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65280 and previous config saved to /var/cache/conftool/dbconfig/20240621-011509-marostegui.json
  • 01:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65279 and previous config saved to /var/cache/conftool/dbconfig/20240621-010002-marostegui.json
  • 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65278 and previous config saved to /var/cache/conftool/dbconfig/20240621-005237-ladsgroup.json
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65277 and previous config saved to /var/cache/conftool/dbconfig/20240621-003730-ladsgroup.json
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65276 and previous config saved to /var/cache/conftool/dbconfig/20240621-002223-ladsgroup.json
  • 00:08 mutante: [cp3072:~] $ sudo systemctl start varnishkafka-webrequest.service
  • 00:08 mutante: [cp3067:~] $ sudo systemctl start logrotate
  • 00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65275 and previous config saved to /var/cache/conftool/dbconfig/20240621-000716-ladsgroup.json
  • 00:00 sukhe: restarting haproxy on cp3068 and cp3072

2024-06-20

  • 23:47 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 10m 12s)
  • 23:36 zabe@deploy1002: Started scap: Update interwiki cache
  • 23:35 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=btmwiki --cluster=all 2>&1 | tee /tmp/btmwiki.UpdateSearchIndexConfig.log # T368038
  • 23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65274 and previous config saved to /var/cache/conftool/dbconfig/20240620-233346-marostegui.json
  • 23:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65273 and previous config saved to /var/cache/conftool/dbconfig/20240620-233324-marostegui.json
  • 23:33 zabe@deploy1002: Finished scap: Creating btmwiki (T368038) (duration: 12m 20s)
  • 23:20 zabe@deploy1002: Started scap: Creating btmwiki (T368038)
  • 23:20 zabe: create Wikipedia Mandailing # T368038
  • 23:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65272 and previous config saved to /var/cache/conftool/dbconfig/20240620-231817-marostegui.json
  • 23:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65271 and previous config saved to /var/cache/conftool/dbconfig/20240620-230310-marostegui.json
  • 22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65270 and previous config saved to /var/cache/conftool/dbconfig/20240620-224803-marostegui.json
  • 22:39 mutante: aphlict1002/aphlict2001 - systemctl stop aphlict_logrotate.timer (and .service); systemctl disable aphlict_logrotate.timer (and .service); systemctl daemon-reload; systemctl reset-failed T367960
  • 22:33 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T361041 T363825 T366649 (duration: 09m 55s)
  • 22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65269 and previous config saved to /var/cache/conftool/dbconfig/20240620-222909-marostegui.json
  • 22:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65268 and previous config saved to /var/cache/conftool/dbconfig/20240620-222847-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65267 and previous config saved to /var/cache/conftool/dbconfig/20240620-221340-marostegui.json
  • 21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65266 and previous config saved to /var/cache/conftool/dbconfig/20240620-215833-marostegui.json
  • 21:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65265 and previous config saved to /var/cache/conftool/dbconfig/20240620-214326-marostegui.json
  • 21:12 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 21:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 21:11 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:10 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 21:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 21:09 brett: Include ncmonitor 1.0.0 in wikimedia-bookworm apt repo
  • 21:09 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 21:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 21:07 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 21:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 21:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 21:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 21:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 21:03 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 20:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
  • 20:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
  • 20:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 20:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 20:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 20:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 20:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 20:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 20:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 20:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 20:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 20:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 20:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 20:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 20:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 20:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 19:58 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:58 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:56 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:55 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:54 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 19:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 19:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1105* for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105* for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1105 for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105 for T348977 - bking@cumin2002
  • 19:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2088.codfw.wmnet
  • 19:01 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 18:58 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65263 and previous config saved to /var/cache/conftool/dbconfig/20240620-181635-marostegui.json
  • 18:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65262 and previous config saved to /var/cache/conftool/dbconfig/20240620-181613-marostegui.json
  • 18:06 inflatador: bking@an-airflow1007 install `ripgrep` deb pkg
  • 18:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65261 and previous config saved to /var/cache/conftool/dbconfig/20240620-180104-marostegui.json
  • 17:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65260 and previous config saved to /var/cache/conftool/dbconfig/20240620-174557-marostegui.json
  • 17:44 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2088.codfw.wmnet
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65259 and previous config saved to /var/cache/conftool/dbconfig/20240620-174125-ladsgroup.json
  • 17:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65258 and previous config saved to /var/cache/conftool/dbconfig/20240620-173050-marostegui.json
  • 17:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 17:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 17:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 16:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
  • 16:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json
  • 16:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Fix Special:Notifications (T368029) (duration: 12m 21s)
  • 16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Continuing with sync
  • 16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Backport for Fix Special:Notifications (T368029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
  • 16:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Fix Special:Notifications (T368029)
  • 16:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json
  • 16:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
  • 16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
  • 16:01 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 16:00 claime: Repooling and uncordoning mw2282.codfw.wmnet following move - T361856
  • 15:59 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
  • 15:59 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:58 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:57 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2019.codfw.wmnet|wikikube-worker2020.codfw.wmnet|wikikube-worker2021.codfw.wmnet|wikikube-worker2022.codfw.wmnet|wikikube-worker2023.codfw.wmnet|wikikube-worker2024.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 15:57 claime: Pooling and uncordoning wikikube-worker2019.codfw.wmnet,wikikube-worker2020.codfw.wmnet,wikikube-worker2021.codfw.wmnet,wikikube-worker2022.codfw.wmnet,wikikube-worker2023.codfw.wmnet,wikikube-worker2024.codfw.wmnet - T351074
  • 15:55 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
  • 15:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
  • 15:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json
  • 15:46 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
  • 15:46 claime: homer 'cr*codfw*' commit 'T351074'
  • 15:45 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl2002.codfw.wmnet
  • 15:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2019.codfw.wmnet with OS bullseye
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2020.codfw.wmnet with OS bullseye
  • 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2022.codfw.wmnet with OS bullseye
  • 15:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json
  • 15:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2024.codfw.wmnet with OS bullseye
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2023.codfw.wmnet with OS bullseye
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2405.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2405.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2404.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2404.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2403.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2403.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2400.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2400.codfw.wmnet
  • 15:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2021.codfw.wmnet with OS bullseye
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
  • 15:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
  • 15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json
  • 15:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
  • 15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
  • 15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
  • 15:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
  • 15:06 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:04 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
  • 15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
  • 15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
  • 15:02 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 04m 15s)
  • 15:01 topranks: rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987
  • 15:01 jhathaway@deploy1002: Started scap: (no justification provided)
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-f6-eqiad.mgmt
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lsw1-f6-eqiad.mgmt
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:56 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1018.eqiad.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1018.eqiad.wmnet
  • 14:54 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 14:48 sukhe: homer "*" commit "rolling out NTP ACL change"
  • 14:48 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2024.codfw.wmnet with OS bullseye
  • 14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65248 and previous config saved to /var/cache/conftool/dbconfig/20240620-144750-arnaudb.json
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2023.codfw.wmnet with OS bullseye
  • 14:47 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367511
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2022.codfw.wmnet with OS bullseye
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2021.codfw.wmnet with OS bullseye
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2020.codfw.wmnet with OS bullseye
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2364 to wikikube-worker2024
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2024
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2019.codfw.wmnet with OS bullseye
  • 14:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2024
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65247 and previous config saved to /var/cache/conftool/dbconfig/20240620-144423-marostegui.json
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65246 and previous config saved to /var/cache/conftool/dbconfig/20240620-144341-marostegui.json
  • 14:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2364 to wikikube-worker2024
  • 14:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 14:39 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2363 to wikikube-worker2023
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2023
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 14:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2023
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2324.codfw.wmnet
  • 14:37 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2324.codfw.wmnet
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2323.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2323.codfw.wmnet
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1489.eqiad.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1489.eqiad.wmnet
  • 14:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:35 sukhe: running authdns-update for CR 1047074
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
  • 14:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65245 and previous config saved to /var/cache/conftool/dbconfig/20240620-143244-arnaudb.json
  • 14:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:32 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2363 to wikikube-worker2023
  • 14:31 moritzm: imported python-pymysql 1.0.2-2~wmf11u2 to apt.wikimedia.org (merge of the security fix from DSA 5700 on top of our internal backport)
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
  • 14:30 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
  • 14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2362 to wikikube-worker2022
  • 14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2022
  • 14:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:28 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2022
  • 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65243 and previous config saved to /var/cache/conftool/dbconfig/20240620-142834-marostegui.json
  • 14:27 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
  • 14:27 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:26 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
  • 14:25 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:25 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:25 elukey@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
  • 14:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2362 to wikikube-worker2022
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2360 to wikikube-worker2021
  • 14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2021
  • 14:21 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2021
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
  • 14:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
  • 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65242 and previous config saved to /var/cache/conftool/dbconfig/20240620-141739-arnaudb.json
  • 14:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: IPIP migration
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: IPIP migration
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2360 to wikikube-worker2021
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2358 to wikikube-worker2020
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2020
  • 14:15 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2020
  • 14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
  • 14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65241 and previous config saved to /var/cache/conftool/dbconfig/20240620-141328-marostegui.json
  • 14:13 elukey@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
  • 14:13 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2358 to wikikube-worker2020
  • 14:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 14:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P65240 and previous config saved to /var/cache/conftool/dbconfig/20240620-141010-root.json
  • 14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2339 to wikikube-worker2019
  • 14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2019
  • 14:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2019
  • 14:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
  • 14:07 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 14:07 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2339 to wikikube-worker2019
  • 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65239 and previous config saved to /var/cache/conftool/dbconfig/20240620-140233-arnaudb.json
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65238 and previous config saved to /var/cache/conftool/dbconfig/20240620-135820-marostegui.json
  • 13:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 13:56 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 13:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65237 and previous config saved to /var/cache/conftool/dbconfig/20240620-135610-marostegui.json
  • 13:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65236 and previous config saved to /var/cache/conftool/dbconfig/20240620-135559-marostegui.json
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 13:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 13:54 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 13:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P65235 and previous config saved to /var/cache/conftool/dbconfig/20240620-135438-root.json
  • 13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 13:53 claime: Depooling mw2339.codfw.wmnet,mw2358.codfw.wmnet,mw2360.codfw.wmnet,mw2362.codfw.wmnet,mw2363.codfw.wmnet,mw2364.codfw.wmnet for reimage to k8s - T351074
  • 13:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 13:52 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 13:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:51 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65234 and previous config saved to /var/cache/conftool/dbconfig/20240620-134728-arnaudb.json
  • 13:46 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65233 and previous config saved to /var/cache/conftool/dbconfig/20240620-134052-marostegui.json
  • 13:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P65232 and previous config saved to /var/cache/conftool/dbconfig/20240620-133907-root.json
  • 13:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 13:28 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 13:28 hashar@deploy1002: Finished deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2 (duration: 00m 06s)
  • 13:28 hashar@deploy1002: Started deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2
  • 13:27 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 13:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65231 and previous config saved to /var/cache/conftool/dbconfig/20240620-132545-marostegui.json
  • 13:24 reedy@deploy1002: Synchronized wmf-config/: T368003 (duration: 10m 39s)
  • 13:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:23 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P65230 and previous config saved to /var/cache/conftool/dbconfig/20240620-132335-root.json
  • 13:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:22 elukey: upload dragonfly packages 1.0.6-2 to bookworm-wikimedia - T365253
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65228 and previous config saved to /var/cache/conftool/dbconfig/20240620-131038-marostegui.json
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65227 and previous config saved to /var/cache/conftool/dbconfig/20240620-131031-marostegui.json
  • 13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65226 and previous config saved to /var/cache/conftool/dbconfig/20240620-130928-marostegui.json
  • 13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P65225 and previous config saved to /var/cache/conftool/dbconfig/20240620-130804-root.json
  • 13:07 sukhe: running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360
  • 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
  • 13:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
  • 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
  • 13:00 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
  • 12:54 sukhe: sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360
  • 12:51 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: 1', diff saved to https://phabricator.wikimedia.org/P65223 and previous config saved to /var/cache/conftool/dbconfig/20240620-125139-root.json
  • 12:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 12:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 12:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 12:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 12:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 12:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 12:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 12:04 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 11:52 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 11:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 11:47 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 11:41 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:38 XioNoX: merge netbox-extra CR1038869 - Fix lots of CI errors
  • 11:33 jgiannelos@deploy1002: Finished deploy [restbase/deploy@f867c66]: (no justification provided) (duration: 30m 12s)
  • 11:27 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 11:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 11:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 11:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 11:21 akosiaris: upgrade mathoid to 2024-06-18-233457-production T349118
  • 11:20 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
  • 11:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
  • 11:03 jgiannelos@deploy1002: Started deploy [restbase/deploy@f867c66]: (no justification provided)
  • 10:57 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) (duration: 15m 03s)
  • 10:48 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 10:44 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:42 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)
  • 10:41 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
  • 10:37 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) (duration: 13m 49s)
  • 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 10:31 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 10:31 claime: repooling and uncordoning mw2321.codfw.wmnet - T367862
  • 10:31 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2321.codfw.wmnet
  • 10:30 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2321.codfw.wmnet
  • 10:28 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 10:25 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:24 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:23 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)
  • 10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
  • 10:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:20 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
  • 10:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:17 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:15 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 10:14 claime: Draining and depooling mw2321.codfw.wmnet to test 1047031 - T367862
  • 10:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 10:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 10:04 claime: Running puppet on A:wikikube-worker
  • 10:02 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 10:01 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 09:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 09:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 09:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 09:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 09:47 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 09:45 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:16 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php sysop_plwiki AramilFeraxa REDACTED --bureaucrat --sysop # T361041
  • 08:57 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
  • 08:50 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:49 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
  • 08:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 08:33 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:23 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.10 refs T361404
  • 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
  • 08:08 moritzm: reboot of irc1001 to nudge clients to re-connect to the new bullseye host T331702
  • 08:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 08:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 07:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:53 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 07:04 moritzm: failover irc.wikimedia.org to the new Bullseye servers T331702
  • 06:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
  • 06:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
  • 05:27 marostegui: Deploy schema change on old s7 eqiad master dbmaint (db1236) T364299
  • 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
  • 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236 T367857', diff saved to https://phabricator.wikimedia.org/P65220 and previous config saved to /var/cache/conftool/dbconfig/20240620-052359-root.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T367857', diff saved to https://phabricator.wikimedia.org/P65219 and previous config saved to /var/cache/conftool/dbconfig/20240620-052253-marostegui.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T367857', diff saved to https://phabricator.wikimedia.org/P65218 and previous config saved to /var/cache/conftool/dbconfig/20240620-052230-marostegui.json
  • 05:22 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T367857
  • 05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
  • 05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
  • 05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T367857', diff saved to https://phabricator.wikimedia.org/P65217 and previous config saved to /var/cache/conftool/dbconfig/20240620-050428-marostegui.json
  • 05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
  • 02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367856)', diff saved to https://phabricator.wikimedia.org/P65216 and previous config saved to /var/cache/conftool/dbconfig/20240620-022416-marostegui.json
  • 02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65215 and previous config saved to /var/cache/conftool/dbconfig/20240620-022349-marostegui.json
  • 02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65214 and previous config saved to /var/cache/conftool/dbconfig/20240620-020842-marostegui.json
  • 01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65213 and previous config saved to /var/cache/conftool/dbconfig/20240620-015335-marostegui.json
  • 01:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65212 and previous config saved to /var/cache/conftool/dbconfig/20240620-013827-marostegui.json

2024-06-19

  • 23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php arbcom_itwiki Superpes15 REDACTED --bureaucrat --sysop
  • 23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php u4cwiki Superpes15 REDACTED --bureaucrat --sysop
  • 21:08 oblivian@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
  • 21:08 oblivian@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
  • 20:33 zabe@deploy1002: Finished scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) (duration: 14m 41s)
  • 20:24 zabe@deploy1002: superpes, zabe: Continuing with sync
  • 20:23 zabe@deploy1002: superpes, zabe: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 zabe@deploy1002: Started scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431)
  • 19:08 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 19:05 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 18:54 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
  • 18:51 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
  • 18:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:48 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:35 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65211 and previous config saved to /var/cache/conftool/dbconfig/20240619-182922-marostegui.json
  • 18:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65210 and previous config saved to /var/cache/conftool/dbconfig/20240619-182900-marostegui.json
  • 18:21 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65209 and previous config saved to /var/cache/conftool/dbconfig/20240619-181353-marostegui.json
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65208 and previous config saved to /var/cache/conftool/dbconfig/20240619-175846-marostegui.json
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65207 and previous config saved to /var/cache/conftool/dbconfig/20240619-174338-marostegui.json
  • 17:21 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 17:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 17:13 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 17:05 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
  • 17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
  • 17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
  • 17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
  • 17:00 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:00 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 16:59 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 16:42 sukhe: sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360
  • 16:27 claime: pooling and uncordoning mw2321.codfw.wmnet - T367702
  • 16:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 16:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: service=(ntp-a|ntp-b|ntp-c)
  • 16:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
  • 16:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 16:03 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:55 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:51 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:50 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 15:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 15:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:32 taavi@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1042
  • 15:32 taavi@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1042
  • 15:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 sukhe: sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 15:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 15:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org,service=ntp-c
  • 15:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:01 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
  • 15:01 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
  • 14:59 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for wikikube-worker2003.codfw.wmnet
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
  • 14:42 marostegui: Deploy schema change on s2 eqiad master dbmaint T364069
  • 14:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
  • 14:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
  • 14:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 14:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
  • 14:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 14:36 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
  • 14:35 moritzm: installing nano security updates
  • 14:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 14:24 moritzm: installing libvpx security updates
  • 14:23 moritzm: installing pymysql security updates
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 14:19 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 14:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 14:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:12 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 14:11 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 14:10 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
  • 14:09 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:09 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
  • 14:09 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 14:08 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:07 taavi@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:07 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
  • 13:54 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
  • 13:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:53 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:51 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:50 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:49 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:48 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:42 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:41 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 13:41 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 13:35 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 13:35 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 13:32 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 13:32 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=ntp-a
  • 13:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 13:31 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 13:28 sukhe: enable puppet on dns6001 to test CR 1046685
  • 13:23 sukhe: sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360
  • 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mw2282.codfw.wmnet with reason: host move
  • 13:21 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on mw2282.codfw.wmnet with reason: host move
  • 13:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:17 kamila_: drained mw2282.codfw.wmnet for T361856
  • 13:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 13:06 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:04 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: service=ntp-[abc]
  • 13:04 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 12:52 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:51 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2011.codfw.wmnet|wikikube-worker2012.codfw.wmnet|wikikube-worker2013.codfw.wmnet|wikikube-worker2014.codfw.wmnet|wikikube-worker2017.codfw.wmnet|wikikube-worker2018.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 12:40 claime: homer 'cr*codfw*' commit 'T351074'
  • 12:38 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:38 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:37 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:37 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:35 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 12:34 klausman: Puppet management of install2004 restored, lpxelinux.0 also restored.
  • 12:24 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:22 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:17 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 12:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 12:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 12:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 12:12 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 12:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 12:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 12:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 12:08 klausman: Will test-replace the PXE chainloader (/srv/tftpboot/lpxelinux.0) on install2003 with a newer version to see if it fixes the ldlinux.c32 error. Puppet will be disabled on that machine for the duration.
  • 12:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 12:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 12:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:03 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65204 and previous config saved to /var/cache/conftool/dbconfig/20240619-120142-root.json
  • 12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
  • 12:01 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
  • 12:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 12:00 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 11:57 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 11:57 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 11:50 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65203 and previous config saved to /var/cache/conftool/dbconfig/20240619-114636-root.json
  • 11:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65201 and previous config saved to /var/cache/conftool/dbconfig/20240619-113131-root.json
  • 11:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:18 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netbox-dev2003.codfw.wmnet
  • 11:18 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netbox-dev2003.codfw.wmnet with OS bookworm
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65200 and previous config saved to /var/cache/conftool/dbconfig/20240619-111625-root.json
  • 11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 11:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 11:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:08 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 11:07 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:07 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:06 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 11:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65199 and previous config saved to /var/cache/conftool/dbconfig/20240619-110120-root.json
  • 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65198 and previous config saved to /var/cache/conftool/dbconfig/20240619-104614-root.json
  • 10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 10:40 jmm@deploy1002: Finished scap: (no justification provided) (duration: 04m 03s)
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 10:36 jmm@deploy1002: Started scap: (no justification provided)
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65197 and previous config saved to /var/cache/conftool/dbconfig/20240619-103109-root.json
  • 10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65196 and previous config saved to /var/cache/conftool/dbconfig/20240619-102504-marostegui.json
  • 10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2409 to wikikube-worker2018
  • 10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
  • 10:22 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
  • 10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
  • 10:18 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:18 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2409 to wikikube-worker2018
  • 10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2408 to wikikube-worker2017
  • 10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
  • 10:16 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65195 and previous config saved to /var/cache/conftool/dbconfig/20240619-101625-marostegui.json
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2408 to wikikube-worker2017
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2405 to wikikube-worker2014
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
  • 10:12 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
  • 10:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2405 to wikikube-worker2014
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2404 to wikikube-worker2013
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
  • 10:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 10:05 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
  • 10:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65194 and previous config saved to /var/cache/conftool/dbconfig/20240619-100118-marostegui.json
  • 10:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 10:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2404 to wikikube-worker2013
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2403 to wikikube-worker2012
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
  • 09:55 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
  • 09:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
  • 09:51 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
  • 09:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2403 to wikikube-worker2012
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2400 to wikikube-worker2011
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
  • 09:46 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
  • 09:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2400 to wikikube-worker2011
  • 09:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin
  • 09:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:22 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
  • 09:15 claime: Depooling mw2400.codfw.wmnet,mw2403.codfw.wmnet,mw2404.codfw.wmnet,mw2405.codfw.wmnet,mw2408.codfw.wmnet,mw2409.codfw.wmnet for reimage - T351074
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
  • 09:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bookworm
  • 09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5025.*} and A:cp
  • 08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:58 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5025.*} and A:cp
  • 08:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 08:54 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 08:52 fabfur: upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
  • 08:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 08:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 08:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15830
  • 08:38 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
  • 08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:18 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.10 refs T361404
  • 08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:11 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:03 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:01 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox-dev2003.codfw.wmnet with OS bookworm
  • 08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox-dev2003.codfw.wmnet on all recursors
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox-dev2003.codfw.wmnet on all recursors
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:57 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:54 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:54 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox-dev2003.codfw.wmnet
  • 07:48 kartik@deploy1002: Finished scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) (duration: 18m 55s)
  • 07:38 kartik@deploy1002: kartik: Continuing with sync
  • 07:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 07:33 kartik@deploy1002: kartik: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:29 kartik@deploy1002: Started scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464)
  • 07:22 kartik@deploy1002: Finished scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) (duration: 20m 12s)
  • 07:20 marostegui: Deploy schema change on old s7 eqiad master db1160 dbmaint T364069
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65192 and previous config saved to /var/cache/conftool/dbconfig/20240619-071516-root.json
  • 07:12 kartik@deploy1002: kartik: Continuing with sync
  • 07:07 kartik@deploy1002: kartik: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852)
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65191 and previous config saved to /var/cache/conftool/dbconfig/20240619-070010-root.json
  • 06:52 jynus: stop db1240:s1, wipe and reimport db1240:s3 T367162
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65190 and previous config saved to /var/cache/conftool/dbconfig/20240619-064505-root.json
  • 06:40 XioNoX: merge Puppet "Prepare for netbox-dev" CR1047081
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65189 and previous config saved to /var/cache/conftool/dbconfig/20240619-063337-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65188 and previous config saved to /var/cache/conftool/dbconfig/20240619-062959-root.json
  • 06:21 _joe_: upgrading conftool everywhere T367919
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65187 and previous config saved to /var/cache/conftool/dbconfig/20240619-061831-root.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P65186 and previous config saved to /var/cache/conftool/dbconfig/20240619-061721-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65185 and previous config saved to /var/cache/conftool/dbconfig/20240619-061454-root.json
  • 06:08 _joe_: uploaded newer python-conftool packages T367919
  • 06:05 _joe_: manually deleting thirdparty/conda repositories from reprepro T364550
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65184 and previous config saved to /var/cache/conftool/dbconfig/20240619-060326-root.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P65183 and previous config saved to /var/cache/conftool/dbconfig/20240619-060216-root.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65182 and previous config saved to /var/cache/conftool/dbconfig/20240619-055948-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65181 and previous config saved to /var/cache/conftool/dbconfig/20240619-054820-root.json
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P65180 and previous config saved to /var/cache/conftool/dbconfig/20240619-054710-root.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65179 and previous config saved to /var/cache/conftool/dbconfig/20240619-054443-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65178 and previous config saved to /var/cache/conftool/dbconfig/20240619-054259-root.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65177 and previous config saved to /var/cache/conftool/dbconfig/20240619-054214-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65176 and previous config saved to /var/cache/conftool/dbconfig/20240619-053315-root.json
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P65175 and previous config saved to /var/cache/conftool/dbconfig/20240619-053205-root.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65174 and previous config saved to /var/cache/conftool/dbconfig/20240619-052754-root.json
  • 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65173 and previous config saved to /var/cache/conftool/dbconfig/20240619-051809-root.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P65172 and previous config saved to /var/cache/conftool/dbconfig/20240619-051659-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65171 and previous config saved to /var/cache/conftool/dbconfig/20240619-051248-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P65170 and previous config saved to /var/cache/conftool/dbconfig/20240619-051233-root.json
  • 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'repool db1169', diff saved to https://phabricator.wikimedia.org/P65169 and previous config saved to /var/cache/conftool/dbconfig/20240619-051014-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'test depool db1169', diff saved to https://phabricator.wikimedia.org/P65168 and previous config saved to /var/cache/conftool/dbconfig/20240619-050951-marostegui.json

2024-06-18

  • 23:22 jforrester@deploy1002: Finished scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) (duration: 17m 16s)
  • 23:12 jforrester@deploy1002: jforrester, kemayo: Continuing with sync
  • 23:10 jforrester@deploy1002: jforrester, kemayo: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:05 jforrester@deploy1002: Started scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920)
  • 22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:20 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:26 jdrewniak@deploy1002: Finished scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) (duration: 16m 33s)
  • 21:16 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 21:14 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:09 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
  • 21:07 jdrewniak@deploy1002: Sync cancelled.
  • 21:07 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
  • 20:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 20:50 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:49 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:47 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:47 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:33 urbanecm@deploy1002: Finished scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki (duration: 18m 59s)
  • 20:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:22 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Continuing with sync
  • 20:18 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:14 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
  • 20:10 urbanecm@deploy1002: Sync cancelled.
  • 20:10 urbanecm@deploy1002: urbanecm, superzerocool, kemayo: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:06 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
  • 19:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:17 mutante: lists1001 - systemctl reset-failed - clean up systemd state due to units not found anymore after migration - disable puppet and then deploy gerrit:1047160 on lists to fix invalid unit name - T331706
  • 18:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 18:44 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in esams for T365123
  • 18:39 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in eqsin for T365123
  • 18:33 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in drmrs for T365123
  • 18:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 18:27 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in magru for T365123
  • 18:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 18:17 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in ulsfo for T365123
  • 18:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 18:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 17:37 swfrench-wmf: updated conftool to 3.0.0 on bullseye hosts in eqiad for T365123
  • 17:35 swfrench-wmf: updated conftool to 3.0.0 on bookworm hosts in eqiad for T365123
  • 17:34 swfrench-wmf: updated conftool to 3.0.0 on buster hosts in eqiad for T365123
  • 17:21 cdanis: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323 T367894
  • 17:16 swfrench-wmf: updated conftool to 3.0.0 on remaining bullseye hosts in codfw for T365123
  • 17:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 17:14 swfrench-wmf: updated conftool to 3.0.0 on remaining bookworm hosts in codfw for T365123
  • 17:12 swfrench-wmf: updated conftool to 3.0.0 on remaining buster hosts in codfw for T365123
  • 16:42 swfrench-wmf: conftool on puppetmaster2001 updated to 3.0.0 for T365123
  • 16:39 swfrench-wmf: validated dbctl 3.0.0 on cumin2002 (noop edit to note: on parsercache spare pc2014) for T365123
  • 16:39 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:34 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 16:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
  • 16:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
  • 16:29 swfrench-wmf: conftool on cumin2002 updated to 3.0.0 for T365123
  • 16:23 claime: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323
  • 16:23 swfrench-wmf: depooled / pooled mw2441.codfw.wmnet to smoke-test python3-conftool for T365123
  • 16:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json
  • 16:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json
  • 16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 15:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json
  • 15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65164 and previous config saved to /var/cache/conftool/dbconfig/20240618-155000-marostegui.json
  • 15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65163 and previous config saved to /var/cache/conftool/dbconfig/20240618-154938-marostegui.json
  • 15:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2003
  • 15:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2003
  • 15:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:47 swfrench-wmf: included conftool 3.0.0 into buster/bullseye/bookworm-wikimedia on apt.w.o for T365123
  • 15:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:46 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:45 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:44 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp
  • 15:43 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:42 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp
  • 15:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5030.*} and A:cp
  • 15:39 fabfur: upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
  • 15:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5030.*} and A:cp
  • 15:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp3066.*} and A:cp
  • 15:36 fabfur: upgrade haproxy to v2.8.10 on cp3066 (T367756)
  • 15:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp3066.*} and A:cp
  • 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65161 and previous config saved to /var/cache/conftool/dbconfig/20240618-153430-marostegui.json
  • 15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
  • 15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 15:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65158 and previous config saved to /var/cache/conftool/dbconfig/20240618-151923-marostegui.json
  • 15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775 (duration: 00m 15s)
  • 15:07 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775
  • 15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775 (duration: 00m 47s)
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775
  • 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775 (duration: 00m 36s)
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65157 and previous config saved to /var/cache/conftool/dbconfig/20240618-150416-marostegui.json
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:00 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4f7d29a]: (no justification provided) (duration: 00m 28s)
  • 15:00 topranks: rebooting lsw1-f7-eqiad to upgrade JunOS on switch T365984
  • 15:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 15:00 mforns@deploy1002: Started deploy [airflow-dags/analytics@4f7d29a]: (no justification provided)
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
  • 14:47 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
  • 14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
  • 14:44 jynus: reenable puppet on backup2002
  • 14:40 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 14:40 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 depool - T365984', diff saved to https://phabricator.wikimedia.org/P65156 and previous config saved to /var/cache/conftool/dbconfig/20240618-143951-arnaudb.json
  • 14:39 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
  • 14:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
  • 14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
  • 14:36 sukhe: enabling puppet and running puppet agent on cp4037
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 14:24 claime: trafficserver: move 100% of traffic to mw-on-k8s - T362323
  • 14:23 btullis@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 14:09 swfrench-wmf: included conftool 3.0.0 into buster-wikimedia on apt.w.o for T365123
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 14:03 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
  • 13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:51 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:51 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 13:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 13:39 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes azwiktionary --fix # T367264; 7 pages fixed, 10 links fixed
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) (duration: 16m 07s)
  • 13:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 13:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
  • 13:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1208.eqiad.wmnet
  • 13:19 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1208.eqiad.wmnet
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264)
  • 13:16 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 13:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
  • 13:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
  • 13:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
  • 13:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: sync
  • 13:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 13:06 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 13:06 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 13:04 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 13:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
  • 12:56 vgutierrez: rolling upgrade on A:cp-eqsin to fifo-log-demux 0.7.5 - T364383
  • 12:53 vgutierrez: disable puppet on A:cp-eqsin before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047070 - T364383
  • 12:52 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 12:51 marostegui: Deploy schema change on old s4 eqiad master db1160 dbmaint T364069
  • 12:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P65155 and previous config saved to /var/cache/conftool/dbconfig/20240618-124945-root.json
  • 12:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:47 fabfur: upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
  • 12:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
  • 12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
  • 12:22 moritzm: rebalance ganeti eqiad/D following reboots
  • 12:15 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 12:15 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
  • 12:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
  • 12:04 topranks: adding Netbox-generated IPv6 DNS records for wikikube-worker, mw and parse hosts
  • 12:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:59 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:58 effie: Slowly pointing mediawiki in eqiad to mw-mcrouter daemonset - T346690
  • 11:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:50 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists.wikimedia.org on all recursors
  • 11:50 eoghan@cumin1002: START - Cookbook sre.dns.wipe-cache lists.wikimedia.org on all recursors
  • 11:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bookworm
  • 11:42 marostegui: Delete ipblocks table on clouddb2002-dev (labtestwiki) T367632
  • 11:40 marostegui: Rename ipblocks table on db1169 (enwiki) T367632
  • 11:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 11:24 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 11:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 11:14 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 11:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 11:13 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 11:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65152 and previous config saved to /var/cache/conftool/dbconfig/20240618-111001-marostegui.json
  • 11:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:09 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bookworm
  • 11:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65151 and previous config saved to /var/cache/conftool/dbconfig/20240618-110939-marostegui.json
  • 11:08 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 11:05 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
  • 11:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 10:58 fabfur: cp3066 repooled and puppet enabled (T367756)
  • 10:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65150 and previous config saved to /var/cache/conftool/dbconfig/20240618-105432-marostegui.json
  • 10:48 marostegui: dbmaint codfw s2 deploy schema change T364069
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65149 and previous config saved to /var/cache/conftool/dbconfig/20240618-103925-marostegui.json
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:30 moritzm: upload openjdk-21 21.0.3+9-2~deb12u2 for bookworm/wikimedia (secondary rebuild on build2001 following the initial bootstrap build) https://phabricator.wikimedia.org/T367487
  • 10:30 cgoubert@deploy1002: Finished scap: Deploy statsd exporter - T365265 (duration: 03m 39s)
  • 10:27 cgoubert@deploy1002: Started scap: Deploy statsd exporter - T365265
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65148 and previous config saved to /var/cache/conftool/dbconfig/20240618-102418-marostegui.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65147 and previous config saved to /var/cache/conftool/dbconfig/20240618-102130-root.json
  • 10:14 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 10:14 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65146 and previous config saved to /var/cache/conftool/dbconfig/20240618-100624-root.json
  • 10:05 fabfur: cp3066 currently depooled and puppet disabled for T367756
  • 10:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
  • 09:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1019.eqiad.wmnet|wikikube-worker1020.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65145 and previous config saved to /var/cache/conftool/dbconfig/20240618-095119-root.json
  • 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65144 and previous config saved to /var/cache/conftool/dbconfig/20240618-093614-root.json
  • 09:27 moritzm: arm keyholder on acmechief2002
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65143 and previous config saved to /var/cache/conftool/dbconfig/20240618-092108-root.json
  • 09:13 moritzm: rebooting ganeti2029
  • 09:10 marostegui: dbmaint eqiad s4 deploy schema change T367261
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65142 and previous config saved to /var/cache/conftool/dbconfig/20240618-090603-root.json
  • 09:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 08:53 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.10 refs T361404
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 depool to troubleshoot hardware issues', diff saved to https://phabricator.wikimedia.org/P65141 and previous config saved to /var/cache/conftool/dbconfig/20240618-085254-arnaudb.json
  • 08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
  • 08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
  • 08:51 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
  • 08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65140 and previous config saved to /var/cache/conftool/dbconfig/20240618-085057-root.json
  • 08:45 hashar@deploy1002: Finished deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate (duration: 00m 07s)
  • 08:45 hashar@deploy1002: Started deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate
  • 08:43 fabfur: cp4037 currently depooled and puppet disabled for T367756
  • 08:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 08:40 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 08:34 marostegui: dbmaint eqiad s6 deploy schema change on eqiad master T364069
  • 08:29 XioNoX: deploy pfw policy update 1718644831 - T367796
  • 07:56 moritzm: uploaded python-irc 8.5.3+dfsg-4+wmf1 to apt.wikimedia.org T331702
  • 07:40 marostegui: dbmaint codfw s7 deploy schema change on codfw master T364069
  • 07:33 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 07:31 kart_: Updated cxserver to 2024-06-13-045621-production (T364122, T138401)
  • 07:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:26 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:20 kartik@deploy1002: Finished scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) (duration: 16m 36s)
  • 07:15 marostegui: dbmaint eqiad s5 deploy schema change on primary master T364069
  • 07:12 marostegui: dbmaint codfw s4 deploy schema change T367261
  • 07:12 marostegui: dbmaint codfw s4 deploy schema change
  • 07:11 kartik@deploy1002: kartik: Continuing with sync
  • 07:09 kartik@deploy1002: kartik: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 kartik@deploy1002: Started scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838)
  • 06:52 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
  • 06:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65139 and previous config saved to /var/cache/conftool/dbconfig/20240618-060100-marostegui.json
  • 06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65138 and previous config saved to /var/cache/conftool/dbconfig/20240618-060038-marostegui.json
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2102.codfw.wmnet
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 05:53 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 05:50 jynus@cumin2002: START - Cookbook sre.dns.netbox
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65137 and previous config saved to /var/cache/conftool/dbconfig/20240618-054531-marostegui.json
  • 05:44 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2102.codfw.wmnet
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65136 and previous config saved to /var/cache/conftool/dbconfig/20240618-053024-marostegui.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65135 and previous config saved to /var/cache/conftool/dbconfig/20240618-051517-marostegui.json
  • 05:00 marostegui: dbmaint codfw s5 deploy schema change on db2213 T364299
  • 04:57 marostegui: dbmaint eqiad s2 deploy schema change on db2207 T364299
  • 04:54 marostegui: dbmaint eqiad s4 deploy schema change on db1160 T364299
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T367378', diff saved to https://phabricator.wikimedia.org/P65134 and previous config saved to /var/cache/conftool/dbconfig/20240618-044908-root.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary and set section read-write T367378', diff saved to https://phabricator.wikimedia.org/P65133 and previous config saved to /var/cache/conftool/dbconfig/20240618-044806-marostegui.json
  • 04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T367378', diff saved to https://phabricator.wikimedia.org/P65132 and previous config saved to /var/cache/conftool/dbconfig/20240618-044747-marostegui.json
  • 04:47 marostegui: Starting s4 eqiad failover from db1160 to db1238 - T367378
  • 04:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T367378', diff saved to https://phabricator.wikimedia.org/P65131 and previous config saved to /var/cache/conftool/dbconfig/20240618-042054-marostegui.json
  • 04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
  • 04:02 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.7 (duration: 02m 50s)
  • 04:01 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.10 refs T361404 (duration: 58m 57s)
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.10 refs T361404
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65130 and previous config saved to /var/cache/conftool/dbconfig/20240618-013639-marostegui.json
  • 01:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 01:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65129 and previous config saved to /var/cache/conftool/dbconfig/20240618-013616-marostegui.json
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65128 and previous config saved to /var/cache/conftool/dbconfig/20240618-012109-marostegui.json
  • 01:10 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65127 and previous config saved to /var/cache/conftool/dbconfig/20240618-010601-marostegui.json
  • 00:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65126 and previous config saved to /var/cache/conftool/dbconfig/20240618-005054-marostegui.json
  • 00:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:31 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65125 and previous config saved to /var/cache/conftool/dbconfig/20240618-002823-ladsgroup.json
  • 00:18 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 14m 03s)
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65124 and previous config saved to /var/cache/conftool/dbconfig/20240618-001316-ladsgroup.json
  • 00:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:05 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=u4cwiki --cluster=all 2>&1 | tee /tmp/u4c.UpdateSearchIndexConfig.log # T366649
  • 00:04 zabe@deploy1002: Started scap: Update interwiki cache
  • 00:02 zabe@deploy1002: Finished scap: T366649 (duration: 15m 16s)
  • 00:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye

2024-06-17

  • 23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65123 and previous config saved to /var/cache/conftool/dbconfig/20240617-235809-ladsgroup.json
  • 23:52 zabe@deploy1002: zabe: Continuing with sync
  • 23:52 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
  • 23:51 zabe@deploy1002: zabe: T366649 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=arbcom_itwiki --cluster=all 2>&1 | tee /tmp/arbcom_it.UpdateSearchIndexConfig.log # T363825
  • 23:47 zabe@deploy1002: Started scap: T366649
  • 23:46 zabe: Create a 'Universal Code of Conduct Coordinating Committee (U4C)' private wiki # T366649
  • 23:44 zabe@deploy1002: Finished scap: T363825 (duration: 15m 00s)
  • 23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65122 and previous config saved to /var/cache/conftool/dbconfig/20240617-234302-ladsgroup.json
  • 23:34 zabe@deploy1002: zabe: Continuing with sync
  • 23:34 zabe@deploy1002: zabe: T363825 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:29 zabe@deploy1002: Started scap: T363825
  • 23:29 zabe: create private wiki for itwiki arbcom # T363825
  • 23:23 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
  • 23:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 22:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 22:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65121 and previous config saved to /var/cache/conftool/dbconfig/20240617-223010-ladsgroup.json
  • 22:28 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:26 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:25 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 22:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65120 and previous config saved to /var/cache/conftool/dbconfig/20240617-221503-ladsgroup.json
  • 22:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:11 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 22:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65119 and previous config saved to /var/cache/conftool/dbconfig/20240617-215956-ladsgroup.json
  • 21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 21:55 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65118 and previous config saved to /var/cache/conftool/dbconfig/20240617-214449-ladsgroup.json
  • 21:41 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
  • 21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
  • 21:05 jforrester@deploy1002: Finished scap: Backport for Fix styles for new heading HTML (T367468) (duration: 18m 57s)
  • 20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65117 and previous config saved to /var/cache/conftool/dbconfig/20240617-205955-marostegui.json
  • 20:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 20:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 20:55 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 20:50 jforrester@deploy1002: jforrester: Backport for Fix styles for new heading HTML (T367468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:50 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 20:46 jforrester@deploy1002: Started scap: Backport for Fix styles for new heading HTML (T367468)
  • 20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 20:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
  • 20:07 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:07 jforrester@deploy1002: jforrester: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
  • 20:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:02 jforrester@deploy1002: Started scap: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809)
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65116 and previous config saved to /var/cache/conftool/dbconfig/20240617-195520-ladsgroup.json
  • 19:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 19:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 19:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 19:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 19:15 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
  • 18:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 18:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 18:42 ladsgroup@deploy1002: Finished scap: Backport for Change static footer icons to the new one (T256190), Remove footer override (duration: 17m 12s)
  • 18:36 ejegg: fundraising civicrm upgraded from 66acce1f to a25a359b
  • 18:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 18:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 18:30 ladsgroup@deploy1002: ladsgroup, jforrester: Continuing with sync
  • 18:29 ladsgroup@deploy1002: ladsgroup, jforrester: Backport for Change static footer icons to the new one (T256190), Remove footer override synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:24 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190), Remove footer override
  • 18:19 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190)
  • 18:17 ejegg: standalone SmashPig upgraded from 1d1b770c to c8993ec6
  • 18:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: sync
  • 18:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: sync
  • 18:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 18:10 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 18:09 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 18:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 18:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 18:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 18:07 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 18:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 18:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 18:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 18:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 18:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 18:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 18:02 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 18:01 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 18:00 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 17:58 ejegg: fundraising civicrm upgraded from aa127608 to 66acce1f
  • 17:53 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 17:53 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 17:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 17:37 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 17:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 17:35 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 17:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 17:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 17:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 17:32 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 17:31 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 17:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 17:29 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 17:18 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet
  • 17:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 17:16 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 17:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 17:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 17:05 claime: Pooling and uncordoning wikikube-worker1019.eqiad.wmnet,wikikube-worker1020.eqiad.wmnet,wikikube-worker1021.eqiad.wmnet - T351074
  • 17:02 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
  • 16:59 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
  • 16:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
  • 16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1021.eqiad.wmnet with OS bullseye
  • 16:58 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 16:58 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 16:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
  • 16:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
  • 16:42 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 32s)
  • 16:42 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 16:42 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
  • 16:42 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
  • 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
  • 16:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
  • 16:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
  • 16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
  • 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
  • 16:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 16:28 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 16:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 16:27 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 16:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 16:25 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 16:24 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 16:21 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 16:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 16:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 16:09 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 13s)
  • 16:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 16:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 16:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 16:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 16:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:00 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 15:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 15:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:55 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:52 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 41s)
  • 15:50 topranks: rebooting cr2-eqdfw to upgrade JunOS T364092
  • 15:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1021.eqiad.wmnet with OS bullseye
  • 15:46 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1020.eqiad.wmnet with OS bullseye
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
  • 15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1489 to wikikube-worker1021
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1021
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1021
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
  • 15:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
  • 15:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1489 to wikikube-worker1021
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1447 to wikikube-worker1020
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1020
  • 15:39 topranks: deactivate Transit and peering sessions on cr2-eqdfw in advance of power-supply change T366864
  • 15:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1020
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
  • 15:37 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
  • 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:37 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1447 to wikikube-worker1020
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1444 to wikikube-worker1019
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1019
  • 15:32 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1019
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
  • 15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:31 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
  • 15:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:28 fabfur: upgrading haproxy to 2.8.10 on cp4037 (T367756)
  • 15:28 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.*} and A:cp
  • 15:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.*} and A:cp
  • 15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1444 to wikikube-worker1019
  • 15:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
  • 15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:21 claime: Depooling mw1444.eqiad.wmnet,mw1447.eqiad.wmnet,mw1489.eqiad.wmnet for reimage - T351074
  • 15:20 topranks: draining transport circuits in/out of eqdfw in advance of router power-supply work/upgrade T366864
  • 15:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 15:17 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:16 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:16 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:10 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 15:03 claime: Repooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet pending fw upgrade - T351074
  • 15:03 cgoubert@cumin1002: conftool action : set/weight=30:pooled=yes; selector: name=(mw1359.eqiad.wmnet|mw1364.eqiad.wmnet|mw1365.eqiad.wmnet|mw1412.eqiad.wmnet)
  • 14:59 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:58 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 14:58 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 14:56 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 14:56 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 14:55 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
  • 14:55 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Wikimedia Foundation/Legal/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Zabe" --reason "per request T367216"
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
  • 14:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:50 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement/Short" "Wikimedia Foundation/Legal/Committee appointments/Announcement/Short" "Zabe" --reason "per request T367216"
  • 14:48 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1412.eqiad.wmnet
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1412.eqiad.wmnet
  • 14:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 14:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement" "Wikimedia Foundation/Legal/Committee appointments/Announcement" "Zabe" --reason "per request T367216"
  • 14:45 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1365.eqiad.wmnet
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1365.eqiad.wmnet
  • 14:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2001.codfw.wmnet
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:43 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 14:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments" "Wikimedia Foundation/Legal/Committee appointments" "Zabe" --reason "per request T367216"
  • 14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
  • 14:39 joal@deploy1002: Finished deploy [airflow-dags/analytics@b682892]: (no justification provided) (duration: 00m 33s)
  • 14:38 joal@deploy1002: Started deploy [airflow-dags/analytics@b682892]: (no justification provided)
  • 14:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Tools and processes" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Tools and processes" "Zabe" --reason "per request T367217"
  • 14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:34 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources/What is a conduct warning" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources/What is a conduct warning" "Zabe" --reason "per request T367217"
  • 14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
  • 14:31 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources" "Zabe" --reason "per request T367217"
  • 14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:28 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Legal agreement" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Legal agreement" "Zabe" --reason "per request T367217"
  • 14:27 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Brand Stewardship Report" "Wikimedia Foundation/Legal/Brand Stewardship Report" "Zabe" --reason "per request T367216"
  • 14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1359.eqiad.wmnet
  • 14:23 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1359.eqiad.wmnet
  • 14:23 taavi@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:22 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.eqiad.wmnet
  • 14:21 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2001.codfw.wmnet
  • 14:21 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Announcement/2023 OC and CRC appointments process" "Wikimedia Foundation/Legal/Announcement/2023 OC and CRC appointments process" "Zabe" --reason "per request T367216"
  • 14:18 claime: Depooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet for reimage - T351074
  • 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 14:17 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
  • 14:17 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) (duration: 15m 34s)
  • 14:16 Amir1: killing updateMenteeData.php --wiki=enwiki --statsd --dbshard s1
  • 14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
  • 14:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
  • 14:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/talkheader" "Wikimedia Foundation/Legal/2023 ToU updates/talkheader" "Zabe" --reason "per request T367216"
  • 14:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:07 taavi@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudvirt-wdqs1001.eqiad.wmnet
  • 14:06 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Proposed update" "Wikimedia Foundation/Legal/2023 ToU updates/Proposed update" "Zabe" --reason "per request T367216"
  • 14:06 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:06 vgutierrez: rolling upgrade on A:cp-codfw to fifo-log-demux 0.7.5 - T364383
  • 14:05 urbanecm@deploy1002: urbanecm: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Charter" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Charter" "Zabe" --reason "per request T367217"
  • 14:02 vgutierrez: disable puppet on A:cp-codfw before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046681 - T364383
  • 14:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Call for applicants" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Call for applicants" "Zabe" --reason "per request T367217"
  • 14:01 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:01 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895)
  • 14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
  • 14:00 urbanecm@deploy1002: Finished scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) (duration: 16m 47s)
  • 13:54 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 13:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 13:50 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Continuing with sync
  • 13:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 13:48 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:48 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:45 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
  • 13:44 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
  • 13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:43 urbanecm@deploy1002: Sync cancelled.
  • 13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:43 urbanecm@deploy1002: lucaswerkmeister-wmde, urbanecm: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65112 and previous config saved to /var/cache/conftool/dbconfig/20240617-133951-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:37 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
  • 13:34 claime: Drained and cordoned wikikube-ctrl2001.codfw.wmnet wikikube-ctrl2002.codfw.wmnet
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 13:33 claime: Uncordoned wikikube-ctrl2003.codfw.wmnet
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 13:25 urbanecm@deploy1002: Finished scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) (duration: 23m 07s)
  • 13:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
  • 13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 13:14 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
  • 13:13 vgutierrez: rolling upgrade on A:cp-ulsfo to fifo-log-demux 0.7.5 - T364383
  • 13:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65111 and previous config saved to /var/cache/conftool/dbconfig/20240617-131222-ladsgroup.json
  • 13:10 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Continuing with sync
  • 13:07 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 13:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
  • 13:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
  • 13:03 vgutierrez: disable puppet on A:cp-ulsfo before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046665 - T364383
  • 13:02 urbanecm@deploy1002: Started scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801)
  • 12:59 joal@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: (no justification provided) (duration: 00m 03s)
  • 12:59 joal@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: (no justification provided)
  • 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65110 and previous config saved to /var/cache/conftool/dbconfig/20240617-125715-ladsgroup.json
  • 12:53 vgutierrez: upload fifo-log-demux 0.7.5 to apt.wm.o (bullseye-wikimedia)
  • 12:47 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65109 and previous config saved to /var/cache/conftool/dbconfig/20240617-124207-ladsgroup.json
  • 12:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:34 vgutierrez: upgrading HAProxy to version 2.8.10 on cp4051
  • 12:34 vgutierrez: fetch HAProxy 2.8.10 into thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
  • 12:28 jynus: restarting ms-backup100[12], backup1004-7,11
  • 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65108 and previous config saved to /var/cache/conftool/dbconfig/20240617-122700-ladsgroup.json
  • 12:14 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2003.codfw.wmnet|wikikube-worker2004.codfw.wmnet|wikikube-worker2007.codfw.wmnet|wikikube-worker2008.codfw.wmnet|wikikube-worker2009.codfw.wmnet|wikikube-worker2010.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 12:14 claime: pooling and uncordoning wikikube-worker2003.codfw.wmnet wikikube-worker2004.codfw.wmnet wikikube-worker2007.codfw.wmnet wikikube-worker2008.codfw.wmnet wikikube-worker2009.codfw.wmnet wikikube-worker2010.codfw.wmnet - T351074
  • 12:09 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830
  • 12:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
  • 12:04 jynus: restart db1204, db1205
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 12:03 claime: homer 'cr*codfw*' commit 'T351074'
  • 12:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: archiva
  • 12:01 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 12:01 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2003.codfw.wmnet
  • 11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 11:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: archiva
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 11:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 11:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee" "Zabe" --reason "per request T367217"
  • 11:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 11:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 11:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Reminder" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Reminder" "Zabe" --reason "per request T367216"
  • 11:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 11:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Announcement" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Announcement" "Zabe" --reason "per request T367216"
  • 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:23 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
  • 11:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours" "Zabe" --reason "per request T367216"
  • 11:17 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/LandingCNTranslate" "Wikimedia Foundation/Legal/2023 ToU updates/LandingCNTranslate" "Zabe" --reason "per request T367216"
  • 11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
  • 11:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
  • 11:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
  • 11:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 11:13 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
  • 11:13 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
  • 11:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/About" "Wikimedia Foundation/Legal/2023 ToU updates/About" "Zabe" --reason "per request T367216"
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2329 to wikikube-worker2010
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
  • 11:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
  • 11:03 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates" "Wikimedia Foundation/Legal/2023 ToU updates" "Zabe" --reason "per request T367216"
  • 11:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:59 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:58 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2329 to wikikube-worker2010
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2328 to wikikube-worker2009
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
  • 10:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2328 to wikikube-worker2009
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2327 to wikikube-worker2008
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
  • 10:48 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2327 to wikikube-worker2008
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2326 to wikikube-worker2007
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
  • 10:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2326 to wikikube-worker2007
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2324 to wikikube-worker2004
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2004
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2004
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
  • 10:37 jynus: restarting ms-backup200[12], backup2004-7,11
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2324 to wikikube-worker2004
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2323 to wikikube-worker2003
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
  • 10:34 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
  • 10:34 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2003
  • 10:34 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2003
  • 10:33 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
  • 10:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:31 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2323 to wikikube-worker2003
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65107 and previous config saved to /var/cache/conftool/dbconfig/20240617-102938-marostegui.json
  • 10:26 jynus: restarting db2183, db2184
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
  • 10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65106 and previous config saved to /var/cache/conftool/dbconfig/20240617-101431-marostegui.json
  • 10:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:10 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 10:09 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
  • 10:08 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage
  • 10:01 claime: draining and cordoning mw2321 - T367702
  • 10:01 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
  • 10:01 taavi@deploy1002: Finished scap: Backport for Stop loading OSM i18n (T161553) (duration: 34m 07s)
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65104 and previous config saved to /var/cache/conftool/dbconfig/20240617-095924-marostegui.json
  • 09:54 jayme@deploy1002: Finished deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1 (duration: 00m 24s)
  • 09:53 jayme@deploy1002: Started deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1
  • 09:52 jayme@deploy1002: Finished deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1 (duration: 00m 38s)
  • 09:51 jayme@deploy1002: Started deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1
  • 09:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:49 taavi@deploy1002: taavi: Continuing with sync
  • 09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65103 and previous config saved to /var/cache/conftool/dbconfig/20240617-094926-marostegui.json
  • 09:48 taavi@deploy1002: taavi: Backport for Stop loading OSM i18n (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65102 and previous config saved to /var/cache/conftool/dbconfig/20240617-094417-marostegui.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65101 and previous config saved to /var/cache/conftool/dbconfig/20240617-094034-marostegui.json
  • 09:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65100 and previous config saved to /var/cache/conftool/dbconfig/20240617-093427-marostegui.json
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65099 and previous config saved to /var/cache/conftool/dbconfig/20240617-093419-marostegui.json
  • 09:26 taavi@deploy1002: Started scap: Backport for Stop loading OSM i18n (T161553)
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65098 and previous config saved to /var/cache/conftool/dbconfig/20240617-091920-marostegui.json
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65097 and previous config saved to /var/cache/conftool/dbconfig/20240617-091912-marostegui.json
  • 09:05 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
  • 09:04 _joe_: removed damaged AOF file for redis rdb1014-6379, resyncing with primary
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65096 and previous config saved to /var/cache/conftool/dbconfig/20240617-090413-marostegui.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65095 and previous config saved to /var/cache/conftool/dbconfig/20240617-090405-marostegui.json
  • 09:01 urbanecm@deploy1002: Finished scap: Backport for throttle: Fix exemption for ongoing course (duration: 25m 05s)
  • 08:53 claime: hardcycling rdb1014
  • 08:49 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65094 and previous config saved to /var/cache/conftool/dbconfig/20240617-084906-marostegui.json
  • 08:40 claime: powercycling rdb1014
  • 08:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65093 and previous config saved to /var/cache/conftool/dbconfig/20240617-083755-marostegui.json
  • 08:36 urbanecm@deploy1002: Started scap: Backport for throttle: Fix exemption for ongoing course
  • 08:25 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65092 and previous config saved to /var/cache/conftool/dbconfig/20240617-082248-marostegui.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65091 and previous config saved to /var/cache/conftool/dbconfig/20240617-080741-marostegui.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65090 and previous config saved to /var/cache/conftool/dbconfig/20240617-075234-marostegui.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65089 and previous config saved to /var/cache/conftool/dbconfig/20240617-074542-marostegui.json
  • 07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65088 and previous config saved to /var/cache/conftool/dbconfig/20240617-074530-marostegui.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65087 and previous config saved to /var/cache/conftool/dbconfig/20240617-073023-marostegui.json
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65086 and previous config saved to /var/cache/conftool/dbconfig/20240617-071516-marostegui.json
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65085 and previous config saved to /var/cache/conftool/dbconfig/20240617-070009-marostegui.json
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65084 and previous config saved to /var/cache/conftool/dbconfig/20240617-065647-ladsgroup.json
  • 06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65083 and previous config saved to /var/cache/conftool/dbconfig/20240617-065625-ladsgroup.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65082 and previous config saved to /var/cache/conftool/dbconfig/20240617-065357-marostegui.json
  • 06:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65081 and previous config saved to /var/cache/conftool/dbconfig/20240617-065335-marostegui.json
  • 06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65080 and previous config saved to /var/cache/conftool/dbconfig/20240617-064118-ladsgroup.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65079 and previous config saved to /var/cache/conftool/dbconfig/20240617-063923-root.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65078 and previous config saved to /var/cache/conftool/dbconfig/20240617-063826-marostegui.json
  • 06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65077 and previous config saved to /var/cache/conftool/dbconfig/20240617-062612-ladsgroup.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65076 and previous config saved to /var/cache/conftool/dbconfig/20240617-062511-root.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65075 and previous config saved to /var/cache/conftool/dbconfig/20240617-062418-root.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65074 and previous config saved to /var/cache/conftool/dbconfig/20240617-062319-marostegui.json
  • 06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65073 and previous config saved to /var/cache/conftool/dbconfig/20240617-061105-ladsgroup.json
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65072 and previous config saved to /var/cache/conftool/dbconfig/20240617-061006-root.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65071 and previous config saved to /var/cache/conftool/dbconfig/20240617-060913-root.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65070 and previous config saved to /var/cache/conftool/dbconfig/20240617-060812-marostegui.json
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65069 and previous config saved to /var/cache/conftool/dbconfig/20240617-060352-marostegui.json
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65068 and previous config saved to /var/cache/conftool/dbconfig/20240617-060326-marostegui.json
  • 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65067 and previous config saved to /var/cache/conftool/dbconfig/20240617-055501-root.json
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65066 and previous config saved to /var/cache/conftool/dbconfig/20240617-055407-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65065 and previous config saved to /var/cache/conftool/dbconfig/20240617-054819-marostegui.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65064 and previous config saved to /var/cache/conftool/dbconfig/20240617-053955-root.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65063 and previous config saved to /var/cache/conftool/dbconfig/20240617-053902-root.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65062 and previous config saved to /var/cache/conftool/dbconfig/20240617-053312-marostegui.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65061 and previous config saved to /var/cache/conftool/dbconfig/20240617-052450-root.json
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65060 and previous config saved to /var/cache/conftool/dbconfig/20240617-052355-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65059 and previous config saved to /var/cache/conftool/dbconfig/20240617-051805-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65058 and previous config saved to /var/cache/conftool/dbconfig/20240617-050944-root.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65057 and previous config saved to /var/cache/conftool/dbconfig/20240617-050852-marostegui.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65056 and previous config saved to /var/cache/conftool/dbconfig/20240617-050849-root.json
  • 05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65055 and previous config saved to /var/cache/conftool/dbconfig/20240617-050756-marostegui.json
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367261)', diff saved to https://phabricator.wikimedia.org/P65054 and previous config saved to /var/cache/conftool/dbconfig/20240617-050324-marostegui.json
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance

2024-06-16

  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65053 and previous config saved to /var/cache/conftool/dbconfig/20240616-221944-ladsgroup.json
  • 22:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65052 and previous config saved to /var/cache/conftool/dbconfig/20240616-221921-ladsgroup.json
  • 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65051 and previous config saved to /var/cache/conftool/dbconfig/20240616-220414-ladsgroup.json
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65050 and previous config saved to /var/cache/conftool/dbconfig/20240616-214907-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65049 and previous config saved to /var/cache/conftool/dbconfig/20240616-213400-ladsgroup.json
  • 14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65047 and previous config saved to /var/cache/conftool/dbconfig/20240616-140214-ladsgroup.json
  • 14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65046 and previous config saved to /var/cache/conftool/dbconfig/20240616-140152-ladsgroup.json
  • 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65045 and previous config saved to /var/cache/conftool/dbconfig/20240616-134645-ladsgroup.json
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65044 and previous config saved to /var/cache/conftool/dbconfig/20240616-133137-ladsgroup.json
  • 13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65043 and previous config saved to /var/cache/conftool/dbconfig/20240616-131630-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65042 and previous config saved to /var/cache/conftool/dbconfig/20240616-055411-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65041 and previous config saved to /var/cache/conftool/dbconfig/20240616-055359-ladsgroup.json
  • 05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65040 and previous config saved to /var/cache/conftool/dbconfig/20240616-053852-ladsgroup.json
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65039 and previous config saved to /var/cache/conftool/dbconfig/20240616-052345-ladsgroup.json
  • 05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65038 and previous config saved to /var/cache/conftool/dbconfig/20240616-050838-ladsgroup.json
  • 03:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65037 and previous config saved to /var/cache/conftool/dbconfig/20240616-032102-marostegui.json
  • 03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65036 and previous config saved to /var/cache/conftool/dbconfig/20240616-030555-marostegui.json
  • 02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65035 and previous config saved to /var/cache/conftool/dbconfig/20240616-025048-marostegui.json
  • 02:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65034 and previous config saved to /var/cache/conftool/dbconfig/20240616-023541-marostegui.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65033 and previous config saved to /var/cache/conftool/dbconfig/20240616-000421-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65032 and previous config saved to /var/cache/conftool/dbconfig/20240616-000343-ladsgroup.json

2024-06-15

  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65031 and previous config saved to /var/cache/conftool/dbconfig/20240615-234836-ladsgroup.json
  • 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65030 and previous config saved to /var/cache/conftool/dbconfig/20240615-233329-ladsgroup.json
  • 23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65029 and previous config saved to /var/cache/conftool/dbconfig/20240615-231822-ladsgroup.json
  • 21:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65028 and previous config saved to /var/cache/conftool/dbconfig/20240615-211811-marostegui.json
  • 21:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65027 and previous config saved to /var/cache/conftool/dbconfig/20240615-211750-marostegui.json
  • 21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65026 and previous config saved to /var/cache/conftool/dbconfig/20240615-210243-marostegui.json
  • 20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65025 and previous config saved to /var/cache/conftool/dbconfig/20240615-204735-marostegui.json
  • 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65024 and previous config saved to /var/cache/conftool/dbconfig/20240615-203229-marostegui.json
  • 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65021 and previous config saved to /var/cache/conftool/dbconfig/20240615-163203-marostegui.json
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65020 and previous config saved to /var/cache/conftool/dbconfig/20240615-161656-marostegui.json
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65019 and previous config saved to /var/cache/conftool/dbconfig/20240615-160149-marostegui.json
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65018 and previous config saved to /var/cache/conftool/dbconfig/20240615-115812-marostegui.json
  • 11:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65017 and previous config saved to /var/cache/conftool/dbconfig/20240615-115750-marostegui.json
  • 11:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65016 and previous config saved to /var/cache/conftool/dbconfig/20240615-114243-marostegui.json
  • 11:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65015 and previous config saved to /var/cache/conftool/dbconfig/20240615-112736-marostegui.json
  • 11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65014 and previous config saved to /var/cache/conftool/dbconfig/20240615-111229-marostegui.json
  • 09:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65013 and previous config saved to /var/cache/conftool/dbconfig/20240615-092730-ladsgroup.json
  • 09:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 09:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65012 and previous config saved to /var/cache/conftool/dbconfig/20240615-071215-marostegui.json
  • 07:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65011 and previous config saved to /var/cache/conftool/dbconfig/20240615-071152-marostegui.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65010 and previous config saved to /var/cache/conftool/dbconfig/20240615-065645-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65009 and previous config saved to /var/cache/conftool/dbconfig/20240615-064138-marostegui.json
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65008 and previous config saved to /var/cache/conftool/dbconfig/20240615-062631-marostegui.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367261)', diff saved to https://phabricator.wikimedia.org/P65007 and previous config saved to /var/cache/conftool/dbconfig/20240615-061919-marostegui.json
  • 06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65006 and previous config saved to /var/cache/conftool/dbconfig/20240615-061908-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65005 and previous config saved to /var/cache/conftool/dbconfig/20240615-060401-marostegui.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65004 and previous config saved to /var/cache/conftool/dbconfig/20240615-054854-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65003 and previous config saved to /var/cache/conftool/dbconfig/20240615-053346-marostegui.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65002 and previous config saved to /var/cache/conftool/dbconfig/20240615-050236-marostegui.json
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P65001 and previous config saved to /var/cache/conftool/dbconfig/20240615-024019-ladsgroup.json
  • 02:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65000 and previous config saved to /var/cache/conftool/dbconfig/20240615-023904-marostegui.json
  • 02:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 02:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 02:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64999 and previous config saved to /var/cache/conftool/dbconfig/20240615-023842-marostegui.json
  • 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64998 and previous config saved to /var/cache/conftool/dbconfig/20240615-022512-ladsgroup.json
  • 02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64997 and previous config saved to /var/cache/conftool/dbconfig/20240615-022335-marostegui.json
  • 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64996 and previous config saved to /var/cache/conftool/dbconfig/20240615-021005-ladsgroup.json
  • 02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64995 and previous config saved to /var/cache/conftool/dbconfig/20240615-020827-marostegui.json
  • 01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64994 and previous config saved to /var/cache/conftool/dbconfig/20240615-015458-ladsgroup.json
  • 01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64993 and previous config saved to /var/cache/conftool/dbconfig/20240615-015320-marostegui.json

2024-06-14

  • 23:09 mnz@deploy1002: Finished deploy [airflow-dags/research@ee5a291]: (no justification provided) (duration: 00m 30s)
  • 23:09 mnz@deploy1002: Started deploy [airflow-dags/research@ee5a291]: (no justification provided)
  • 22:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
  • 22:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
  • 22:33 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 31s)
  • 22:33 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
  • 22:27 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 22:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 22:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
  • 22:02 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
  • 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64992 and previous config saved to /var/cache/conftool/dbconfig/20240614-214910-marostegui.json
  • 21:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 21:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
  • 21:33 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet
  • 21:33 Emperor: restart swift-proxy on ms-fe1010 T360913
  • 21:31 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
  • 21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64991 and previous config saved to /var/cache/conftool/dbconfig/20240614-211239-ladsgroup.json
  • 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64990 and previous config saved to /var/cache/conftool/dbconfig/20240614-205731-ladsgroup.json
  • 20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64989 and previous config saved to /var/cache/conftool/dbconfig/20240614-204224-ladsgroup.json
  • 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64988 and previous config saved to /var/cache/conftool/dbconfig/20240614-202717-ladsgroup.json
  • 20:22 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4040.ulsfo.wmnet
  • 20:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
  • 19:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 19:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 19:27 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
  • 19:27 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS bullseye
  • 19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64987 and previous config saved to /var/cache/conftool/dbconfig/20240614-192643-ladsgroup.json
  • 19:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:00 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
  • 18:54 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4040.ulsfo.wmnet
  • 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:11 jdrewniak@deploy1002: Finished scap: Backport for For now scope hatnote and infobox styles (T367462) (duration: 16m 06s)
  • 17:01 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 16:31 jan_drewniak: starting friday backport for T367462 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/1043827
  • 16:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 16:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
  • 16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
  • 16:00 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
  • 15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
  • 15:55 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
  • 15:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
  • 15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64984 and previous config saved to /var/cache/conftool/dbconfig/20240614-153727-marostegui.json
  • 15:37 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:27 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4039.ulsfo.wmnet
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64982 and previous config saved to /var/cache/conftool/dbconfig/20240614-152220-marostegui.json
  • 15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64981 and previous config saved to /var/cache/conftool/dbconfig/20240614-150713-marostegui.json
  • 14:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:54 jynus: upgrade db1245 to mariadb 10.6 T360751
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64980 and previous config saved to /var/cache/conftool/dbconfig/20240614-145206-marostegui.json
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64979 and previous config saved to /var/cache/conftool/dbconfig/20240614-144925-marostegui.json
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64978 and previous config saved to /var/cache/conftool/dbconfig/20240614-143418-marostegui.json
  • 14:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64976 and previous config saved to /var/cache/conftool/dbconfig/20240614-141911-marostegui.json
  • 14:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2002.codfw.wmnet with OS bookworm
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 14:10 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64975 and previous config saved to /var/cache/conftool/dbconfig/20240614-140404-marostegui.json
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64974 and previous config saved to /var/cache/conftool/dbconfig/20240614-140125-marostegui.json
  • 14:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64973 and previous config saved to /var/cache/conftool/dbconfig/20240614-135900-marostegui.json
  • 13:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:52 jynus: restart db2139, db2141
  • 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
  • 13:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
  • 13:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
  • 13:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64972 and previous config saved to /var/cache/conftool/dbconfig/20240614-134354-marostegui.json
  • 13:41 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64971 and previous config saved to /var/cache/conftool/dbconfig/20240614-132847-marostegui.json
  • 13:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2002.codfw.wmnet with OS bookworm
  • 13:24 jynus: restart db1216, db1225, db1240, db1245
  • 13:23 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 13:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
  • 13:22 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
  • 13:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2001.codfw.wmnet with OS bookworm
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64970 and previous config saved to /var/cache/conftool/dbconfig/20240614-131339-marostegui.json
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64969 and previous config saved to /var/cache/conftool/dbconfig/20240614-131113-marostegui.json
  • 13:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64968 and previous config saved to /var/cache/conftool/dbconfig/20240614-131051-marostegui.json
  • 13:05 jynus: restart db1150, db1171
  • 12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64967 and previous config saved to /var/cache/conftool/dbconfig/20240614-125543-marostegui.json
  • 12:54 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
  • 12:51 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 12:45 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
  • 12:45 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64966 and previous config saved to /var/cache/conftool/dbconfig/20240614-124036-marostegui.json
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64964 and previous config saved to /var/cache/conftool/dbconfig/20240614-122530-marostegui.json
  • 12:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64963 and previous config saved to /var/cache/conftool/dbconfig/20240614-122255-marostegui.json
  • 12:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64962 and previous config saved to /var/cache/conftool/dbconfig/20240614-122233-marostegui.json
  • 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64961 and previous config saved to /var/cache/conftool/dbconfig/20240614-122210-ladsgroup.json
  • 12:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64960 and previous config saved to /var/cache/conftool/dbconfig/20240614-120918-ladsgroup.json
  • 12:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
  • 12:08 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
  • 12:08 fnegri@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host clouddb1018.eqiad.wmnet
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64959 and previous config saved to /var/cache/conftool/dbconfig/20240614-120727-marostegui.json
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64958 and previous config saved to /var/cache/conftool/dbconfig/20240614-120704-ladsgroup.json
  • 12:01 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64957 and previous config saved to /var/cache/conftool/dbconfig/20240614-115411-ladsgroup.json
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64956 and previous config saved to /var/cache/conftool/dbconfig/20240614-115220-marostegui.json
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64955 and previous config saved to /var/cache/conftool/dbconfig/20240614-115159-ladsgroup.json
  • 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64954 and previous config saved to /var/cache/conftool/dbconfig/20240614-114002-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64953 and previous config saved to /var/cache/conftool/dbconfig/20240614-113904-ladsgroup.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64952 and previous config saved to /var/cache/conftool/dbconfig/20240614-113712-marostegui.json
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint1001.eqiad.wmnet
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint1001.eqiad.wmnet with OS bookworm
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64951 and previous config saved to /var/cache/conftool/dbconfig/20240614-113654-ladsgroup.json
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64950 and previous config saved to /var/cache/conftool/dbconfig/20240614-113325-marostegui.json
  • 11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64949 and previous config saved to /var/cache/conftool/dbconfig/20240614-113303-marostegui.json
  • 11:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64948 and previous config saved to /var/cache/conftool/dbconfig/20240614-112357-ladsgroup.json
  • 11:21 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1018.eqiad.wmnet
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
  • 11:18 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 11:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64947 and previous config saved to /var/cache/conftool/dbconfig/20240614-111756-marostegui.json
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
  • 11:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:02 jynus: restart backup* hosts
  • 11:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64946 and previous config saved to /var/cache/conftool/dbconfig/20240614-110249-marostegui.json
  • 11:00 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
  • 10:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 10:55 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 10:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
  • 10:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
  • 10:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
  • 10:54 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64945 and previous config saved to /var/cache/conftool/dbconfig/20240614-104742-marostegui.json
  • 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64943 and previous config saved to /var/cache/conftool/dbconfig/20240614-104352-marostegui.json
  • 10:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64942 and previous config saved to /var/cache/conftool/dbconfig/20240614-104330-marostegui.json
  • 10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 10:33 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 10:30 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64941 and previous config saved to /var/cache/conftool/dbconfig/20240614-102823-marostegui.json
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 10:25 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint1001.eqiad.wmnet with OS bookworm
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64940 and previous config saved to /var/cache/conftool/dbconfig/20240614-101316-marostegui.json
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64939 and previous config saved to /var/cache/conftool/dbconfig/20240614-095809-marostegui.json
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64938 and previous config saved to /var/cache/conftool/dbconfig/20240614-095434-marostegui.json
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64937 and previous config saved to /var/cache/conftool/dbconfig/20240614-095356-marostegui.json
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint1001.eqiad.wmnet on all recursors
  • 09:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 09:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint1001.eqiad.wmnet on all recursors
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 09:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 09:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 09:43 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
  • 09:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64936 and previous config saved to /var/cache/conftool/dbconfig/20240614-093849-marostegui.json
  • 09:37 jynus: upgrade and restart dbprov[12]00[3456]
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64935 and previous config saved to /var/cache/conftool/dbconfig/20240614-093657-marostegui.json
  • 09:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64934 and previous config saved to /var/cache/conftool/dbconfig/20240614-093634-marostegui.json
  • 09:31 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 09:31 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint1001.eqiad.wmnet
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 09:29 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 09:29 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 09:25 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 09:25 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64933 and previous config saved to /var/cache/conftool/dbconfig/20240614-092342-marostegui.json
  • 09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 09:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 09:22 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64932 and previous config saved to /var/cache/conftool/dbconfig/20240614-092127-marostegui.json
  • 09:14 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 09:13 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 09:10 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64931 and previous config saved to /var/cache/conftool/dbconfig/20240614-090835-marostegui.json
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint2001.codfw.wmnet
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint2001.codfw.wmnet with OS bookworm
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64930 and previous config saved to /var/cache/conftool/dbconfig/20240614-090620-marostegui.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64929 and previous config saved to /var/cache/conftool/dbconfig/20240614-090457-marostegui.json
  • 09:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64928 and previous config saved to /var/cache/conftool/dbconfig/20240614-085817-marostegui.json
  • 08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64927 and previous config saved to /var/cache/conftool/dbconfig/20240614-085113-marostegui.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
  • 08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
  • 08:44 marostegui: dbmaint eqiad s8 deploy schema change T367261
  • 08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64926 and previous config saved to /var/cache/conftool/dbconfig/20240614-084310-marostegui.json
  • 08:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint2001.codfw.wmnet with OS bookworm
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64925 and previous config saved to /var/cache/conftool/dbconfig/20240614-082803-marostegui.json
  • 08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint2001.codfw.wmnet on all recursors
  • 08:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint2001.codfw.wmnet on all recursors
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint2001.codfw.wmnet
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 08:14 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:14 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 08:14 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64924 and previous config saved to /var/cache/conftool/dbconfig/20240614-081255-marostegui.json
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64923 and previous config saved to /var/cache/conftool/dbconfig/20240614-080938-marostegui.json
  • 08:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64922 and previous config saved to /var/cache/conftool/dbconfig/20240614-080915-marostegui.json
  • 08:03 marostegui: dbmaint codfw s8 deploy schema change T367261
  • 07:56 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64921 and previous config saved to /var/cache/conftool/dbconfig/20240614-075408-marostegui.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64920 and previous config saved to /var/cache/conftool/dbconfig/20240614-073902-marostegui.json
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1003.eqiad.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64919 and previous config saved to /var/cache/conftool/dbconfig/20240614-072354-marostegui.json
  • 07:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64918 and previous config saved to /var/cache/conftool/dbconfig/20240614-072034-marostegui.json
  • 07:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64917 and previous config saved to /var/cache/conftool/dbconfig/20240614-072012-marostegui.json
  • 07:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:17 marostegui: dbmaint eqiad s1 deploy schema change T367261
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1003.eqiad.wmnet
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2003.codfw.wmnet
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 07:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64916 and previous config saved to /var/cache/conftool/dbconfig/20240614-070505-marostegui.json
  • 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2003.codfw.wmnet
  • 06:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64915 and previous config saved to /var/cache/conftool/dbconfig/20240614-064958-marostegui.json
  • 06:41 marostegui: dbmaint codfw s1 deploy schema change T367261
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64914 and previous config saved to /var/cache/conftool/dbconfig/20240614-063451-marostegui.json
  • 06:34 moritzm: rebalance ganeti/C in eqiad following reboots
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64913 and previous config saved to /var/cache/conftool/dbconfig/20240614-063138-marostegui.json
  • 06:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64912 and previous config saved to /var/cache/conftool/dbconfig/20240614-063116-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64911 and previous config saved to /var/cache/conftool/dbconfig/20240614-061609-marostegui.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64910 and previous config saved to /var/cache/conftool/dbconfig/20240614-060102-marostegui.json
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64909 and previous config saved to /var/cache/conftool/dbconfig/20240614-054555-marostegui.json
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64908 and previous config saved to /var/cache/conftool/dbconfig/20240614-054041-marostegui.json
  • 05:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64907 and previous config saved to /var/cache/conftool/dbconfig/20240614-054019-marostegui.json
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64906 and previous config saved to /var/cache/conftool/dbconfig/20240614-053023-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64905 and previous config saved to /var/cache/conftool/dbconfig/20240614-053001-ladsgroup.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64904 and previous config saved to /var/cache/conftool/dbconfig/20240614-052512-marostegui.json
  • 05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64903 and previous config saved to /var/cache/conftool/dbconfig/20240614-051454-ladsgroup.json
  • 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64902 and previous config saved to /var/cache/conftool/dbconfig/20240614-051005-marostegui.json
  • 04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64901 and previous config saved to /var/cache/conftool/dbconfig/20240614-045947-ladsgroup.json
  • 04:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64900 and previous config saved to /var/cache/conftool/dbconfig/20240614-045458-marostegui.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64899 and previous config saved to /var/cache/conftool/dbconfig/20240614-045129-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64898 and previous config saved to /var/cache/conftool/dbconfig/20240614-044840-marostegui.json
  • 04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64897 and previous config saved to /var/cache/conftool/dbconfig/20240614-044440-ladsgroup.json
  • 03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin
  • 03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqsin
  • 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64896 and previous config saved to /var/cache/conftool/dbconfig/20240614-010717-ladsgroup.json
  • 01:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance

2024-06-13

  • 23:56 zabe@deploy1002: Finished scap: T361041, Update interwiki cache (duration: 11m 07s)
  • 23:48 foks: removing 7 files for legal compliance
  • 23:45 zabe@deploy1002: Started scap: T361041, Update interwiki cache
  • 23:23 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=sysop_plwiki --cluster=all 2>&1 | tee /tmp/sysop_plwiki.UpdateSearchIndexConfig.log # T361041
  • 23:20 zabe@deploy1002: Finished scap: T361041 (duration: 11m 36s)
  • 23:17 foks: removing 9 files for legal compliance
  • 23:08 zabe@deploy1002: Started scap: T361041
  • 23:06 zabe@deploy1002: Sync cancelled.
  • 23:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 23:01 zabe@deploy1002: zabe: T361041 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:59 zabe@deploy1002: Started scap: T361041
  • 22:49 zabe: create plwiki sysop wiki # T361041
  • 22:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 21:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 21:32 jsn@deploy1002: Finished scap: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 14m 18s)
  • 21:23 jsn@deploy1002: jsn, kgraessle: Continuing with sync
  • 21:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64894 and previous config saved to /var/cache/conftool/dbconfig/20240613-212230-marostegui.json
  • 21:20 jsn@deploy1002: jsn, kgraessle: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:17 jsn@deploy1002: Started scap: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969)
  • 21:15 jsn@deploy1002: Finished scap: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723) (duration: 14m 11s)
  • 21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64893 and previous config saved to /var/cache/conftool/dbconfig/20240613-210723-marostegui.json
  • 21:07 jsn@deploy1002: dbrant, jsn: Continuing with sync
  • 21:04 jsn@deploy1002: dbrant, jsn: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqdfw (T367439)
  • 21:03 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqord (T367439)
  • 21:01 jsn@deploy1002: Started scap: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723)
  • 20:55 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 20:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64892 and previous config saved to /var/cache/conftool/dbconfig/20240613-205215-marostegui.json
  • 20:50 cdobbins@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
  • 20:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64891 and previous config saved to /var/cache/conftool/dbconfig/20240613-203708-marostegui.json
  • 20:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:14 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:13 foks: removing 1 file for legal compliance
  • 20:00 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1003.eqiad.wmnet
  • 19:59 foks: removing 2 files for legal compliance
  • 19:58 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1003.eqiad.wmnet
  • 19:58 kamila@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1003.eqiad.wmnet
  • 19:53 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 19:51 foks: removing 2 files for legal compliance
  • 19:51 cdobbins@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 19:41 foks: removing 2 files for legal compliance
  • 19:28 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 19:27 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
  • 19:27 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
  • 19:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 19:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
  • 19:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
  • 18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64890 and previous config saved to /var/cache/conftool/dbconfig/20240613-184924-ladsgroup.json
  • 18:36 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64889 and previous config saved to /var/cache/conftool/dbconfig/20240613-183417-ladsgroup.json
  • 18:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.9 refs T361403
  • 18:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:28 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64888 and previous config saved to /var/cache/conftool/dbconfig/20240613-181911-ladsgroup.json
  • 18:17 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 18:16 brennen: 1.43.0-wmf.9 train (T361403): no current blockers, rolling to group2
  • 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64887 and previous config saved to /var/cache/conftool/dbconfig/20240613-180404-ladsgroup.json
  • 17:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 17:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 17:33 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
  • 17:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64886 and previous config saved to /var/cache/conftool/dbconfig/20240613-170602-marostegui.json
  • 16:57 brennen@deploy1002: Finished scap: Backport for Convert local function to arrow function to fix context (T367366) (duration: 16m 51s)
  • 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 16:43 brennen@deploy1002: jforrester, brennen: Backport for Convert local function to arrow function to fix context (T367366) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 16:40 brennen@deploy1002: Started scap: Backport for Convert local function to arrow function to fix context (T367366)
  • 16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64884 and previous config saved to /var/cache/conftool/dbconfig/20240613-163547-marostegui.json
  • 16:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
  • 16:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 16:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
  • 16:24 mutante: gitlab-replica.wikimedia.org - short downtime - renaming to gitlab-replica-a
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64883 and previous config saved to /var/cache/conftool/dbconfig/20240613-162321-arnaudb.json
  • 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64882 and previous config saved to /var/cache/conftool/dbconfig/20240613-162040-marostegui.json
  • 16:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 16:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 16:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64881 and previous config saved to /var/cache/conftool/dbconfig/20240613-161641-marostegui.json
  • 16:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64880 and previous config saved to /var/cache/conftool/dbconfig/20240613-161617-marostegui.json
  • 16:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 16:11 cdanis: gnt-node failover -f ganeti2028.codfw.wmnet
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 16:09 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 16:08 cdanis: forcibly rebooted ganeti2028, DRBD hung
  • 16:08 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64878 and previous config saved to /var/cache/conftool/dbconfig/20240613-160816-arnaudb.json
  • 16:07 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others (duration: 00m 22s)
  • 16:06 ebernhardson@deploy1002: Started deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64877 and previous config saved to /var/cache/conftool/dbconfig/20240613-160453-marostegui.json
  • 16:04 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64876 and previous config saved to /var/cache/conftool/dbconfig/20240613-160431-marostegui.json
  • 16:04 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64875 and previous config saved to /var/cache/conftool/dbconfig/20240613-160110-marostegui.json
  • 15:54 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64874 and previous config saved to /var/cache/conftool/dbconfig/20240613-155310-arnaudb.json
  • 15:52 elukey: drop mediawiki-services-restbase docker images from the Docker Registry - T367427
  • 15:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:50 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64873 and previous config saved to /var/cache/conftool/dbconfig/20240613-154924-marostegui.json
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64872 and previous config saved to /var/cache/conftool/dbconfig/20240613-154603-marostegui.json
  • 15:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
  • 15:42 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
  • 15:41 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqsin
  • 15:38 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin
  • 15:38 ChrisDobbins901_: cdobbins@cumin1002 sudo -i cookbook sre.cdn.roll-reboot --alias 'cp-upload_eqsin' --batchsize 1 --reason T366555 --task-id T366555 --grace-sleep 5400
  • 15:38 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64871 and previous config saved to /var/cache/conftool/dbconfig/20240613-153805-arnaudb.json
  • 15:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 15:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:36 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 15:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 15:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64870 and previous config saved to /var/cache/conftool/dbconfig/20240613-153417-marostegui.json
  • 15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64869 and previous config saved to /var/cache/conftool/dbconfig/20240613-153056-marostegui.json
  • 15:28 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date # Ctrl+C – had slowed down, unnecessary work by this point; was at --start '["55914913"]'
  • 15:28 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64868 and previous config saved to /var/cache/conftool/dbconfig/20240613-152748-marostegui.json
  • 15:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 15:27 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:26 elukey: drop mediawiki-services-parsoid docker images from the Docker Registry - T367427
  • 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64867 and previous config saved to /var/cache/conftool/dbconfig/20240613-152420-marostegui.json
  • 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64866 and previous config saved to /var/cache/conftool/dbconfig/20240613-152300-arnaudb.json
  • 15:22 elukey: drop eventgate-ci docker images from the Docker Registry
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64865 and previous config saved to /var/cache/conftool/dbconfig/20240613-151910-marostegui.json
  • 15:15 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64864 and previous config saved to /var/cache/conftool/dbconfig/20240613-150913-marostegui.json
  • 15:08 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:05 volans: upgrading spicerack on cumin1002 to v8.6.0
  • 15:04 topranks: rebooting lsw1-f6-codfw to upgrade JunOS on switch T365983
  • 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64863 and previous config saved to /var/cache/conftool/dbconfig/20240613-150332-ladsgroup.json
  • 15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:03 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:01 cdanis@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:01 cdanis@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:00 cdanis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:59 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:59 cdanis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:59 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:59 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:57 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64862 and previous config saved to /var/cache/conftool/dbconfig/20240613-145406-marostegui.json
  • 14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
  • 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
  • 14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 depool ahead of T365983', diff saved to https://phabricator.wikimedia.org/P64861 and previous config saved to /var/cache/conftool/dbconfig/20240613-145035-arnaudb.json
  • 14:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 moritzm: rebalance ganeti/B in eqiad following reboots
  • 14:49 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 14:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64860 and previous config saved to /var/cache/conftool/dbconfig/20240613-144825-ladsgroup.json
  • 14:47 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:46 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:45 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 14:44 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 14:41 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762 (duration: 00m 05s)
  • 14:41 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64859 and previous config saved to /var/cache/conftool/dbconfig/20240613-143859-marostegui.json
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64858 and previous config saved to /var/cache/conftool/dbconfig/20240613-143554-marostegui.json
  • 14:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64857 and previous config saved to /var/cache/conftool/dbconfig/20240613-143531-marostegui.json
  • 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64856 and previous config saved to /var/cache/conftool/dbconfig/20240613-143318-ladsgroup.json
  • 14:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762 (duration: 00m 07s)
  • 14:32 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762
  • 14:27 bblack: authdns-update for https://gerrit.wikimedia.org/r/1042490 (remaps some Facebook ranges to codfw+eqiad)
  • 14:24 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 14:21 cgoubert@deploy1002: Finished scap: Change mwapi listener to mw-api-int - T333120 (duration: 06m 47s)
  • 14:21 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64855 and previous config saved to /var/cache/conftool/dbconfig/20240613-142024-marostegui.json
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64854 and previous config saved to /var/cache/conftool/dbconfig/20240613-141810-ladsgroup.json
  • 14:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 14:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 14:15 cgoubert@deploy1002: Started scap: Change mwapi listener to mw-api-int - T333120
  • 14:05 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Load EntitySchema on Test Wikidata clients (T363153) (duration: 14m 14s)
  • 14:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64853 and previous config saved to /var/cache/conftool/dbconfig/20240613-140517-marostegui.json
  • 14:03 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 14:00 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
  • 14:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
  • 13:59 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
  • 13:59 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
  • 13:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64852 and previous config saved to /var/cache/conftool/dbconfig/20240613-135523-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: sync
  • 13:55 claime: roll-restarting shellbox-constraints
  • 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Load EntitySchema on Test Wikidata clients (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Load EntitySchema on Test Wikidata clients (T363153)
  • 13:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64851 and previous config saved to /var/cache/conftool/dbconfig/20240613-135010-marostegui.json
  • 13:48 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 13:47 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64850 and previous config saved to /var/cache/conftool/dbconfig/20240613-134701-marostegui.json
  • 13:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 13:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 13:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 13:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64849 and previous config saved to /var/cache/conftool/dbconfig/20240613-134639-marostegui.json
  • 13:45 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247) (duration: 13m 24s)
  • 13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P64848 and previous config saved to /var/cache/conftool/dbconfig/20240613-134456-ladsgroup.json
  • 13:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64847 and previous config saved to /var/cache/conftool/dbconfig/20240613-134017-ladsgroup.json
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Continuing with sync
  • 13:34 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:33 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247)
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64846 and previous config saved to /var/cache/conftool/dbconfig/20240613-133132-marostegui.json
  • 13:30 volans: upgrading spicerack on cumin2002 to v8.6.0
  • 13:26 moritzm: installing pillow security updates
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64845 and previous config saved to /var/cache/conftool/dbconfig/20240613-132512-ladsgroup.json
  • 13:18 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 13:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64844 and previous config saved to /var/cache/conftool/dbconfig/20240613-131746-ladsgroup.json
  • 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64843 and previous config saved to /var/cache/conftool/dbconfig/20240613-131625-marostegui.json
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64842 and previous config saved to /var/cache/conftool/dbconfig/20240613-131006-ladsgroup.json
  • 13:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:06 moritzm: installing pillow security updates
  • 13:03 jmm@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64841 and previous config saved to /var/cache/conftool/dbconfig/20240613-130117-marostegui.json
  • 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64840 and previous config saved to /var/cache/conftool/dbconfig/20240613-125700-marostegui.json
  • 12:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64839 and previous config saved to /var/cache/conftool/dbconfig/20240613-125648-marostegui.json
  • 12:52 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 12:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 12:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64838 and previous config saved to /var/cache/conftool/dbconfig/20240613-124141-marostegui.json
  • 12:39 elukey: reset BIOS/BMC to factory default on sretest1001 - T365372
  • 12:30 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64837 and previous config saved to /var/cache/conftool/dbconfig/20240613-122634-marostegui.json
  • 12:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
  • 12:26 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
  • 12:21 ladsgroup@deploy1002: Finished scap: Backport for Temporarily bump circuit breaking threshold to 350 (duration: 12m 13s)
  • 12:20 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:17 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 12:12 ladsgroup@deploy1002: ladsgroup: Backport for Temporarily bump circuit breaking threshold to 350 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64836 and previous config saved to /var/cache/conftool/dbconfig/20240613-121127-marostegui.json
  • 12:09 ladsgroup@deploy1002: Started scap: Backport for Temporarily bump circuit breaking threshold to 350
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64835 and previous config saved to /var/cache/conftool/dbconfig/20240613-120711-marostegui.json
  • 12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64834 and previous config saved to /var/cache/conftool/dbconfig/20240613-120644-marostegui.json
  • 11:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 11:57 fabfur: enabling puppet && repool cp4037 (T360454)
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64832 and previous config saved to /var/cache/conftool/dbconfig/20240613-115137-marostegui.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64831 and previous config saved to /var/cache/conftool/dbconfig/20240613-113630-marostegui.json
  • 11:35 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:29 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:28 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 11:22 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64830 and previous config saved to /var/cache/conftool/dbconfig/20240613-112122-marostegui.json
  • 11:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 11:19 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64829 and previous config saved to /var/cache/conftool/dbconfig/20240613-111706-marostegui.json
  • 11:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64828 and previous config saved to /var/cache/conftool/dbconfig/20240613-111655-ladsgroup.json
  • 11:16 moritzm: installing pillow security updates
  • 11:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64827 and previous config saved to /var/cache/conftool/dbconfig/20240613-111642-marostegui.json
  • 11:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64826 and previous config saved to /var/cache/conftool/dbconfig/20240613-111633-ladsgroup.json
  • 11:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 11:09 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 11:08 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:08 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64825 and previous config saved to /var/cache/conftool/dbconfig/20240613-110135-marostegui.json
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64824 and previous config saved to /var/cache/conftool/dbconfig/20240613-110126-ladsgroup.json
  • 10:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 10:55 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 10:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 10:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 10:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:46 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64823 and previous config saved to /var/cache/conftool/dbconfig/20240613-104628-marostegui.json
  • 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64822 and previous config saved to /var/cache/conftool/dbconfig/20240613-104619-ladsgroup.json
  • 10:43 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 10:42 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 10:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2010.codfw.wmnet
  • 10:41 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 10:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:34 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2010.codfw.wmnet
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2009.codfw.wmnet
  • 10:33 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64821 and previous config saved to /var/cache/conftool/dbconfig/20240613-103120-marostegui.json
  • 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64820 and previous config saved to /var/cache/conftool/dbconfig/20240613-103111-ladsgroup.json
  • 10:31 cmooney@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
  • 10:30 cmooney@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
  • 10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2009.codfw.wmnet
  • 10:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2008.codfw.wmnet
  • 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64819 and previous config saved to /var/cache/conftool/dbconfig/20240613-102659-marostegui.json
  • 10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2287-2290].codfw.wmnet
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:23 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 10:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 10:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2008.codfw.wmnet
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2007.codfw.wmnet
  • 10:21 hashar: Gerrit upgrade completed
  • 10:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64818 and previous config saved to /var/cache/conftool/dbconfig/20240613-102016-marostegui.json
  • 10:20 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2007.codfw.wmnet
  • 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2006.codfw.wmnet
  • 10:10 fabfur: cp4037 depooled && puppet disable to profile benthos configuration (T360454)
  • 10:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2006.codfw.wmnet
  • 10:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 10:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135 (duration: 00m 06s)
  • 10:08 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2287-2290].codfw.wmnet
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2281,2283-2286].codfw.wmnet
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64816 and previous config saved to /var/cache/conftool/dbconfig/20240613-100509-marostegui.json
  • 10:04 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 (duration: 00m 08s)
  • 10:04 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1
  • 10:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2003.codfw.wmnet
  • 10:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:01 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 09:59 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1010.eqiad.wmnet
  • 09:53 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1010.eqiad.wmnet
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1009.eqiad.wmnet
  • 09:50 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.eqiad.wmnet
  • 09:50 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.eqiad.wmnet
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64815 and previous config saved to /var/cache/conftool/dbconfig/20240613-095002-marostegui.json
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2281,2283-2286].codfw.wmnet
  • 09:46 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2003.codfw.wmnet
  • 09:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1009.eqiad.wmnet
  • 09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1008.eqiad.wmnet
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1008.eqiad.wmnet
  • 09:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1007.eqiad.wmnet
  • 09:39 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 09:38 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64814 and previous config saved to /var/cache/conftool/dbconfig/20240613-093455-marostegui.json
  • 09:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1007.eqiad.wmnet
  • 09:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1006.eqiad.wmnet
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64813 and previous config saved to /var/cache/conftool/dbconfig/20240613-093158-marostegui.json
  • 09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64812 and previous config saved to /var/cache/conftool/dbconfig/20240613-093136-marostegui.json
  • 09:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1006.eqiad.wmnet
  • 09:22 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64811 and previous config saved to /var/cache/conftool/dbconfig/20240613-091629-marostegui.json
  • 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64810 and previous config saved to /var/cache/conftool/dbconfig/20240613-091200-arnaudb.json
  • 09:07 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 09:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64809 and previous config saved to /var/cache/conftool/dbconfig/20240613-090122-marostegui.json
  • 08:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64808 and previous config saved to /var/cache/conftool/dbconfig/20240613-085654-arnaudb.json
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64807 and previous config saved to /var/cache/conftool/dbconfig/20240613-084615-marostegui.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64806 and previous config saved to /var/cache/conftool/dbconfig/20240613-084310-marostegui.json
  • 08:43 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64805 and previous config saved to /var/cache/conftool/dbconfig/20240613-084248-marostegui.json
  • 08:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64804 and previous config saved to /var/cache/conftool/dbconfig/20240613-084149-arnaudb.json
  • 08:37 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:36 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:30 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:29 kart_: Updated MinT to 2024-06-12-111204-production (T363563)
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64803 and previous config saved to /var/cache/conftool/dbconfig/20240613-082741-marostegui.json
  • 08:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64802 and previous config saved to /var/cache/conftool/dbconfig/20240613-082643-arnaudb.json
  • 08:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 08:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 08:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64801 and previous config saved to /var/cache/conftool/dbconfig/20240613-081234-marostegui.json
  • 08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64800 and previous config saved to /var/cache/conftool/dbconfig/20240613-081138-arnaudb.json
  • 08:11 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 08:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2125.codfw.wmnet with reason: index issue
  • 08:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2125.codfw.wmnet with reason: index issue
  • 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'index error depool db2125', diff saved to https://phabricator.wikimedia.org/P64799 and previous config saved to /var/cache/conftool/dbconfig/20240613-080624-arnaudb.json
  • 08:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:59 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64798 and previous config saved to /var/cache/conftool/dbconfig/20240613-075727-marostegui.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64797 and previous config saved to /var/cache/conftool/dbconfig/20240613-075500-root.json
  • 07:54 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64796 and previous config saved to /var/cache/conftool/dbconfig/20240613-075420-marostegui.json
  • 07:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64795 and previous config saved to /var/cache/conftool/dbconfig/20240613-075358-marostegui.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64794 and previous config saved to /var/cache/conftool/dbconfig/20240613-073955-root.json
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64793 and previous config saved to /var/cache/conftool/dbconfig/20240613-073851-marostegui.json
  • 07:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64792 and previous config saved to /var/cache/conftool/dbconfig/20240613-072450-root.json
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64791 and previous config saved to /var/cache/conftool/dbconfig/20240613-072344-marostegui.json
  • 07:21 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64790 and previous config saved to /var/cache/conftool/dbconfig/20240613-070944-root.json
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64789 and previous config saved to /var/cache/conftool/dbconfig/20240613-070837-marostegui.json
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64788 and previous config saved to /var/cache/conftool/dbconfig/20240613-070531-marostegui.json
  • 07:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64787 and previous config saved to /var/cache/conftool/dbconfig/20240613-070509-marostegui.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64786 and previous config saved to /var/cache/conftool/dbconfig/20240613-065439-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64785 and previous config saved to /var/cache/conftool/dbconfig/20240613-065002-marostegui.json
  • 06:42 moritzm: rebalance ganeti clusters in eqiad following reboots
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64784 and previous config saved to /var/cache/conftool/dbconfig/20240613-063934-root.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64783 and previous config saved to /var/cache/conftool/dbconfig/20240613-063455-marostegui.json
  • 06:27 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64782 and previous config saved to /var/cache/conftool/dbconfig/20240613-061948-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64781 and previous config saved to /var/cache/conftool/dbconfig/20240613-061636-marostegui.json
  • 06:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64780 and previous config saved to /var/cache/conftool/dbconfig/20240613-061613-marostegui.json
  • 06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64779 and previous config saved to /var/cache/conftool/dbconfig/20240613-060927-ladsgroup.json
  • 06:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64778 and previous config saved to /var/cache/conftool/dbconfig/20240613-060905-ladsgroup.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64777 and previous config saved to /var/cache/conftool/dbconfig/20240613-060107-marostegui.json
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64776 and previous config saved to /var/cache/conftool/dbconfig/20240613-055747-marostegui.json
  • 05:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64775 and previous config saved to /var/cache/conftool/dbconfig/20240613-055725-marostegui.json
  • 05:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64774 and previous config saved to /var/cache/conftool/dbconfig/20240613-055358-ladsgroup.json
  • 05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
  • 05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64773 and previous config saved to /var/cache/conftool/dbconfig/20240613-054600-marostegui.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64772 and previous config saved to /var/cache/conftool/dbconfig/20240613-054218-marostegui.json
  • 05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64771 and previous config saved to /var/cache/conftool/dbconfig/20240613-053851-ladsgroup.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64770 and previous config saved to /var/cache/conftool/dbconfig/20240613-053052-marostegui.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64769 and previous config saved to /var/cache/conftool/dbconfig/20240613-052746-marostegui.json
  • 05:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 05:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64768 and previous config saved to /var/cache/conftool/dbconfig/20240613-052723-marostegui.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64767 and previous config saved to /var/cache/conftool/dbconfig/20240613-052711-marostegui.json
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64766 and previous config saved to /var/cache/conftool/dbconfig/20240613-052344-ladsgroup.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64765 and previous config saved to /var/cache/conftool/dbconfig/20240613-051216-marostegui.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64764 and previous config saved to /var/cache/conftool/dbconfig/20240613-051204-marostegui.json
  • 04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64763 and previous config saved to /var/cache/conftool/dbconfig/20240613-045709-marostegui.json
  • 04:55 marostegui: dbmaint eqiad s5 deploy schema change on db1230 T364299
  • 04:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
  • 04:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1230 T367146', diff saved to https://phabricator.wikimedia.org/P64762 and previous config saved to /var/cache/conftool/dbconfig/20240613-045254-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary and set section read-write T367146', diff saved to https://phabricator.wikimedia.org/P64761 and previous config saved to /var/cache/conftool/dbconfig/20240613-045141-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T367146', diff saved to https://phabricator.wikimedia.org/P64760 and previous config saved to /var/cache/conftool/dbconfig/20240613-045121-root.json
  • 04:51 marostegui: Starting s5 eqiad failover from db1230 to db1183 - T367146
  • 04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64759 and previous config saved to /var/cache/conftool/dbconfig/20240613-044201-marostegui.json
  • 04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64758 and previous config saved to /var/cache/conftool/dbconfig/20240613-043848-marostegui.json
  • 04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 04:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
  • 04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T367146', diff saved to https://phabricator.wikimedia.org/P64757 and previous config saved to /var/cache/conftool/dbconfig/20240613-043239-root.json
  • 04:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
  • 00:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64756 and previous config saved to /var/cache/conftool/dbconfig/20240613-004247-marostegui.json
  • 00:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 00:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64755 and previous config saved to /var/cache/conftool/dbconfig/20240613-003507-ladsgroup.json
  • 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64754 and previous config saved to /var/cache/conftool/dbconfig/20240613-003444-ladsgroup.json
  • 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64753 and previous config saved to /var/cache/conftool/dbconfig/20240613-001937-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64752 and previous config saved to /var/cache/conftool/dbconfig/20240613-000430-ladsgroup.json

2024-06-12

  • 23:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64751 and previous config saved to /var/cache/conftool/dbconfig/20240612-234923-ladsgroup.json
  • 22:17 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 22:13 krinkle@deploy1002: Finished scap: Backport for Move etcd.php from wmf-config/ to src/ (T308932) (duration: 13m 42s)
  • 22:10 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 22:08 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
  • 22:06 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:04 krinkle@deploy1002: krinkle: Continuing with sync
  • 22:04 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:03 krinkle@deploy1002: krinkle: Backport for Move etcd.php from wmf-config/ to src/ (T308932) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:59 krinkle@deploy1002: Started scap: Backport for Move etcd.php from wmf-config/ to src/ (T308932)
  • 21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
  • 21:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
  • 21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 21:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 21:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 21:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 21:33 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 21:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 21:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: sync
  • 21:31 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: sync
  • 21:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 21:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 21:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
  • 21:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 21:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 21:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 21:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 21:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 21:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 21:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
  • 21:22 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
  • 21:21 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:20 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 21:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 21:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 21:17 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 21:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 21:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 21:05 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 21:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 20:53 cjming: end of UTC late backport window
  • 20:52 cjming@deploy1002: Finished scap: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) (duration: 12m 52s)
  • 20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 20:44 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
  • 20:42 cjming@deploy1002: cjming, jdlrobson: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:39 cjming@deploy1002: Started scap: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101)
  • 20:29 cjming@deploy1002: Finished scap: Backport for Disable quick surveys using deprecated configuration (T367128) (duration: 11m 59s)
  • 20:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64750 and previous config saved to /var/cache/conftool/dbconfig/20240612-202233-marostegui.json
  • 20:21 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
  • 20:19 cjming@deploy1002: jdlrobson, cjming: Backport for Disable quick surveys using deprecated configuration (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:17 cjming@deploy1002: Started scap: Backport for Disable quick surveys using deprecated configuration (T367128)
  • 20:10 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw
  • 20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64749 and previous config saved to /var/cache/conftool/dbconfig/20240612-200726-marostegui.json
  • 20:00 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 19:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 19:58 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.9 refs T361403
  • 19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64748 and previous config saved to /var/cache/conftool/dbconfig/20240612-195219-marostegui.json
  • 19:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155 (duration: 00m 07s)
  • 19:49 hashar@deploy1002: Started deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155
  • 19:48 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:48 brennen: 1.43.0-wmf.9 train (T361403): blockers (hopefully) resolved, rolling to group1
  • 19:46 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 19:45 brennen@deploy1002: Finished scap: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) (duration: 13m 06s)
  • 19:45 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 19:43 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 19:43 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 19:41 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:40 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 19:39 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 19:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 19:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 19:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 19:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64747 and previous config saved to /var/cache/conftool/dbconfig/20240612-193712-marostegui.json
  • 19:36 brennen@deploy1002: brennen: Continuing with sync
  • 19:36 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:36 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 19:35 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 19:35 brennen@deploy1002: brennen: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 19:34 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:34 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 19:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 19:32 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 19:32 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:32 brennen@deploy1002: Started scap: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153)
  • 19:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 19:31 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 19:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 19:30 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 19:29 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 19:29 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 19:28 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:27 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 19:26 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:25 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64746 and previous config saved to /var/cache/conftool/dbconfig/20240612-192327-marostegui.json
  • 19:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:23 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64745 and previous config saved to /var/cache/conftool/dbconfig/20240612-192303-marostegui.json
  • 19:22 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 19:22 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:22 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:19 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:19 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 19:18 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 19:17 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 19:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 19:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 19:11 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 19:10 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 19:09 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:08 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64744 and previous config saved to /var/cache/conftool/dbconfig/20240612-190755-marostegui.json
  • 19:06 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:06 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:03 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:02 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:02 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 19:02 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:59 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:59 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:59 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:58 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 18:58 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:57 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:55 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64742 and previous config saved to /var/cache/conftool/dbconfig/20240612-185248-marostegui.json
  • 18:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:49 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:42 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:41 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:40 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:40 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:39 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:39 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64741 and previous config saved to /var/cache/conftool/dbconfig/20240612-183741-marostegui.json
  • 18:24 ejegg: fundraising civicrm upgraded from 955166d1 to 76857844
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64740 and previous config saved to /var/cache/conftool/dbconfig/20240612-182343-marostegui.json
  • 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64739 and previous config saved to /var/cache/conftool/dbconfig/20240612-182321-marostegui.json
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64738 and previous config saved to /var/cache/conftool/dbconfig/20240612-180814-marostegui.json
  • 18:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 18:01 brennen: 1.43.0-wmf.9 train (T361403): currently blocked on T367334, holding at group0 until resolved.
  • 17:59 mutante: gitlab-replica-old - downtime, renaming to gitlab-replica-b
  • 17:58 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
  • 17:58 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
  • 17:58 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:57 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
  • 17:57 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
  • 17:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64737 and previous config saved to /var/cache/conftool/dbconfig/20240612-175306-marostegui.json
  • 17:52 brett: authdns-update run on dns1004 (T364891)
  • 17:51 brett: Repool ulsfo as A:cp-text nvme upgrades are complete (T364891)
  • 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:39 brett: Remove downtime of cache_text/cp text servers in ulsfo - T364891
  • 17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64736 and previous config saved to /var/cache/conftool/dbconfig/20240612-173759-marostegui.json
  • 17:30 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=ulsfo
  • 17:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:25 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:24 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:24 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64735 and previous config saved to /var/cache/conftool/dbconfig/20240612-172406-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64734 and previous config saved to /var/cache/conftool/dbconfig/20240612-172344-marostegui.json
  • 17:13 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:13 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:10 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 17:09 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 17:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:sessionstore
  • 17:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64733 and previous config saved to /var/cache/conftool/dbconfig/20240612-170837-marostegui.json
  • 16:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64732 and previous config saved to /var/cache/conftool/dbconfig/20240612-165329-marostegui.json
  • 16:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 16:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64730 and previous config saved to /var/cache/conftool/dbconfig/20240612-162426-marostegui.json
  • 16:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 16:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64729 and previous config saved to /var/cache/conftool/dbconfig/20240612-162403-marostegui.json
  • 16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64728 and previous config saved to /var/cache/conftool/dbconfig/20240612-162134-ladsgroup.json
  • 16:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64727 and previous config saved to /var/cache/conftool/dbconfig/20240612-162110-ladsgroup.json
  • 16:20 brett: cumin 'A:cp-text and A:ulsfo' 'systemctl poweroff' - T364891
  • 16:19 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 16:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 8 hosts with reason: T364891
  • 16:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 8 hosts with reason: T364891
  • 16:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:17 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 16:13 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 16:11 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 03m 19s)
  • 16:10 jhathaway@deploy1002: Started scap: (no justification provided)
  • 16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64726 and previous config saved to /var/cache/conftool/dbconfig/20240612-160856-marostegui.json
  • 16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64725 and previous config saved to /var/cache/conftool/dbconfig/20240612-160603-ladsgroup.json
  • 16:05 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:sessionstore
  • 16:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:55 otto@deploy1002: Finished scap: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) (duration: 12m 19s)
  • 15:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64724 and previous config saved to /var/cache/conftool/dbconfig/20240612-155349-marostegui.json
  • 15:53 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64723 and previous config saved to /var/cache/conftool/dbconfig/20240612-155056-ladsgroup.json
  • 15:47 otto@deploy1002: otto: Continuing with sync
  • 15:46 otto@deploy1002: otto: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:43 otto@deploy1002: Started scap: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817)
  • 15:42 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64722 and previous config saved to /var/cache/conftool/dbconfig/20240612-153842-marostegui.json
  • 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64721 and previous config saved to /var/cache/conftool/dbconfig/20240612-153549-ladsgroup.json
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
  • 15:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 15:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
  • 15:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 15:28 denisse@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=logstash,name=eqiad
  • 15:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:25 volans: uploaded spicerack_8.6.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:25 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:24 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64720 and previous config saved to /var/cache/conftool/dbconfig/20240612-152403-marostegui.json
  • 15:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64719 and previous config saved to /var/cache/conftool/dbconfig/20240612-152351-marostegui.json
  • 15:23 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 15:12 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64718 and previous config saved to /var/cache/conftool/dbconfig/20240612-150844-marostegui.json
  • 15:02 cdanis: T364907 💙[email protected] ~ 🕚☕ sudo -i reprepro --keepunreferencedfiles includedeb bullseye-wikimedia ~/otelcol-contrib_0.102.0_linux_amd64.deb
  • 15:02 brett: authdns-update run on dns1004
  • 15:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:00 brett: Depooling ulsfo in preparation for A:cp-text downtime/poweroff for nvme upgrades (T364891)
  • 15:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis" (duration: 12m 36s)
  • 14:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64717 and previous config saved to /var/cache/conftool/dbconfig/20240612-145337-marostegui.json
  • 14:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-eqiad
  • 14:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis"
  • 14:46 oblivian@deploy1002: Finished scap: Backport for Use the statsd-exporter service where available (T365265) (duration: 12m 05s)
  • 14:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64716 and previous config saved to /var/cache/conftool/dbconfig/20240612-143830-marostegui.json
  • 14:38 oblivian@deploy1002: oblivian: Continuing with sync
  • 14:37 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:37 oblivian@deploy1002: oblivian: Backport for Use the statsd-exporter service where available (T365265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:35 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
  • 14:34 moritzm: failover ganeti master in eqiad to ganeti1028
  • 14:34 oblivian@deploy1002: Started scap: Backport for Use the statsd-exporter service where available (T365265)
  • 14:34 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
  • 14:31 moritzm: installing gst-plugins-base1.0 security updates
  • 14:31 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:31 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:29 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 14:29 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
  • 14:28 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:27 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:27 claime: trafficserver: move 95% of traffic to mw-on-k8s
  • 14:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 32s)
  • 14:27 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 14:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64715 and previous config saved to /var/cache/conftool/dbconfig/20240612-142412-marostegui.json
  • 14:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64714 and previous config saved to /var/cache/conftool/dbconfig/20240612-142335-marostegui.json
  • 14:22 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 14:21 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
  • 14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 14:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 14:20 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 14:19 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 14:19 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 14:19 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 14:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1020.eqiad.wmnet
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153)
  • 14:10 moritzm: installing libarchive security updates
  • 14:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64713 and previous config saved to /var/cache/conftool/dbconfig/20240612-140827-marostegui.json
  • 14:07 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1020.eqiad.wmnet
  • 14:02 vgutierrez: repool text@esams with IPIP encapsulation enabled - T366466
  • 14:02 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:00 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 13:55 dcausse@deploy1002: Finished deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh) (duration: 00m 13s)
  • 13:54 dcausse@deploy1002: Started deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh)
  • 13:53 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 - T366466
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64712 and previous config saved to /var/cache/conftool/dbconfig/20240612-135319-marostegui.json
  • 13:49 fabfur: depooled cp4037 to test benthos/haproxy configuration (T365718)
  • 13:48 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
  • 13:48 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
  • 13:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 13:46 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-eqiad
  • 13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 13:45 claime: Starting kafka-main reboots in eqiad
  • 13:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64710 and previous config saved to /var/cache/conftool/dbconfig/20240612-134414-marostegui.json
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 13:39 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64709 and previous config saved to /var/cache/conftool/dbconfig/20240612-133812-marostegui.json
  • 13:38 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:37 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:36 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 13:36 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:35 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:34 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1010.eqiad.wmnet
  • 13:34 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1010.eqiad.wmnet
  • 13:34 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:34 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
  • 13:30 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
  • 13:30 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64708 and previous config saved to /var/cache/conftool/dbconfig/20240612-132907-marostegui.json
  • 13:28 sukhe: add ntp-[abc].anycast.wmnet: 10.3.0.[5-7]/32: T366360
  • 13:28 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 13:26 vgutierrez: depool text@esams before enabling IPIP encapsulation - T366466
  • 13:26 dcausse@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023 (duration: 00m 14s)
  • 13:25 dcausse@deploy1002: Started deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023
  • 13:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64707 and previous config saved to /var/cache/conftool/dbconfig/20240612-132351-marostegui.json
  • 13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
  • 13:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 15s)
  • 13:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 13:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
  • 13:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64706 and previous config saved to /var/cache/conftool/dbconfig/20240612-131400-marostegui.json
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
  • 13:13 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
  • 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Only register EntitySchema namespace when feature is enabled (T363153)
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64705 and previous config saved to /var/cache/conftool/dbconfig/20240612-130232-root.json
  • 13:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64704 and previous config saved to /var/cache/conftool/dbconfig/20240612-125853-marostegui.json
  • 12:58 ladsgroup@deploy1002: Finished scap: Backport for override circuit breaking threshold for ES hosts (duration: 16m 34s)
  • 12:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
  • 12:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64703 and previous config saved to /var/cache/conftool/dbconfig/20240612-124727-root.json
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64702 and previous config saved to /var/cache/conftool/dbconfig/20240612-124456-marostegui.json
  • 12:44 ladsgroup@deploy1002: ladsgroup: Backport for override circuit breaking threshold for ES hosts synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:42 ladsgroup@deploy1002: Started scap: Backport for override circuit breaking threshold for ES hosts
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
  • 12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
  • 12:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64701 and previous config saved to /var/cache/conftool/dbconfig/20240612-123222-root.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64700 and previous config saved to /var/cache/conftool/dbconfig/20240612-122948-marostegui.json
  • 12:29 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:29 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 12:28 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 12:24 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 12:18 Emperor: restart swift-proxy on ms-fe1013 T360913
  • 12:17 Emperor: restart swift-proxy on ms-fe2011 ms-fe2014 T360913
  • 12:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64699 and previous config saved to /var/cache/conftool/dbconfig/20240612-121716-root.json
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64698 and previous config saved to /var/cache/conftool/dbconfig/20240612-121441-marostegui.json
  • 12:14 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 12:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:13 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 12:13 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 12:10 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 12:10 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 12:05 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 12:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64697 and previous config saved to /var/cache/conftool/dbconfig/20240612-120211-root.json
  • 12:00 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64696 and previous config saved to /var/cache/conftool/dbconfig/20240612-115934-marostegui.json
  • 11:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 11:59 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:58 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:57 claime: Manual restart of dump_cloud_ip_ranges.service on A:puppetserver and A:puppetmaster
  • 11:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 11:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 11:53 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 11:52 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64695 and previous config saved to /var/cache/conftool/dbconfig/20240612-115143-marostegui.json
  • 11:51 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64693 and previous config saved to /var/cache/conftool/dbconfig/20240612-115103-marostegui.json
  • 11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 11:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 11:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64692 and previous config saved to /var/cache/conftool/dbconfig/20240612-114705-root.json
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 11:45 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 11:45 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:45 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1191', diff saved to https://phabricator.wikimedia.org/P64691 and previous config saved to /var/cache/conftool/dbconfig/20240612-114410-root.json
  • 11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:38 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 11:37 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:35 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64690 and previous config saved to /var/cache/conftool/dbconfig/20240612-113556-marostegui.json
  • 11:35 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:31 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:30 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64689 and previous config saved to /var/cache/conftool/dbconfig/20240612-112048-marostegui.json
  • 11:14 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:14 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:10 moritzm: rebalance ganeti cluster in eqsin following reboots
  • 11:08 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure (duration: 12m 10s)
  • 11:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64688 and previous config saved to /var/cache/conftool/dbconfig/20240612-110541-marostegui.json
  • 11:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety" "Zabe" --reason "per request T367217"
  • 11:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1003.eqiad.wmnet
  • 11:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 11:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department" "Wikimedia Foundation/Legal" "Zabe" --reason "per request T367216"
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 10:58 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:57 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/Conversation hours and Events" "Wikimedia Foundation/Legal/Global Advocacy/Conversation hours and Events" "Zabe" --reason "per request T367219"
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64687 and previous config saved to /var/cache/conftool/dbconfig/20240612-105615-marostegui.json
  • 10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 10:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure
  • 10:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64686 and previous config saved to /var/cache/conftool/dbconfig/20240612-105554-marostegui.json
  • 10:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 10:54 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 10:53 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/About" "Wikimedia Foundation/Legal/Global Advocacy/About" "Zabe" --reason "per request T367219"
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:52 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 10:48 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1003.eqiad.wmnet
  • 10:46 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1003.eqiad.wmnet
  • 10:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy" "Wikimedia Foundation/Legal/Global Advocacy" "Zabe" --reason "per request T367219"
  • 10:41 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1019.eqiad.wmnet
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64685 and previous config saved to /var/cache/conftool/dbconfig/20240612-104047-marostegui.json
  • 10:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 10:27 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1019.eqiad.wmnet
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64684 and previous config saved to /var/cache/conftool/dbconfig/20240612-102540-marostegui.json
  • 10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:24 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:23 godog: remove MediaWiki.jawiki.GrowthExperiments.NewcomerTask.update_.* from graphite hosts - T362633
  • 10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 10:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 10:19 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Zabe" --reason "per request T365837"
  • 10:17 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
  • 10:14 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 10:14 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 10:10 claime: Depooling mw2281.codfw.wmnet,mw22[83-90].codfw.wmnet for decommission - T367275
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64683 and previous config saved to /var/cache/conftool/dbconfig/20240612-101032-marostegui.json
  • 10:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 10:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 10:07 zabe: zabe@mwmaint1002:~$ foreachwikiindblist 'all - s4' refreshImageMetadata.php --mime image/webp # T364680
  • 09:48 fabfur: disabling puppet on cp4037 to test benthos configuration (T360454)
  • 09:47 fabfur: disabling puppet on cp4037 to test benthos configuration
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64680 and previous config saved to /var/cache/conftool/dbconfig/20240612-094738-marostegui.json
  • 09:47 _joe_: running dump_cloud_ip_ranges on puppetmaster1001 to test fixed script
  • 09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 09:33 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64679 and previous config saved to /var/cache/conftool/dbconfig/20240612-093231-marostegui.json
  • 09:32 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64678 and previous config saved to /var/cache/conftool/dbconfig/20240612-091724-marostegui.json
  • 09:11 moritzm: failover ganeti cluster for eqsin to ganeti5004
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64677 and previous config saved to /var/cache/conftool/dbconfig/20240612-090959-marostegui.json
  • 09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64676 and previous config saved to /var/cache/conftool/dbconfig/20240612-090937-marostegui.json
  • 09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64675 and previous config saved to /var/cache/conftool/dbconfig/20240612-090834-ladsgroup.json
  • 09:06 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 09:04 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date
  • 09:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64674 and previous config saved to /var/cache/conftool/dbconfig/20240612-090435-ladsgroup.json
  • 09:04 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date # Ctrl+C, had become very slow, trying restart
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64673 and previous config saved to /var/cache/conftool/dbconfig/20240612-085430-marostegui.json
  • 08:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64672 and previous config saved to /var/cache/conftool/dbconfig/20240612-085329-ladsgroup.json
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 08:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64671 and previous config saved to /var/cache/conftool/dbconfig/20240612-084929-ladsgroup.json
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 08:42 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:42 zabe: zabe@mwmaint1002:~$ mwscript refreshImageMetadata.php commonswiki --mime image/webp # T364680
  • 08:39 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mike Pham out of all services on: 2200 hosts
  • 08:39 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64670 and previous config saved to /var/cache/conftool/dbconfig/20240612-083923-marostegui.json
  • 08:38 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging Mike Pham out of all services on: 2200 hosts
  • 08:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64669 and previous config saved to /var/cache/conftool/dbconfig/20240612-083824-ladsgroup.json
  • 08:36 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12703' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12703 --new-data-type external-id --summary 'T367174' # succeeded
  • 08:35 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12583' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12583 --new-data-type external-id --summary 'T367174' # succeeded
  • 08:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64668 and previous config saved to /var/cache/conftool/dbconfig/20240612-083424-ladsgroup.json
  • 08:28 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P64667 and previous config saved to /var/cache/conftool/dbconfig/20240612-082702-marostegui.json
  • 08:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw
  • 08:26 fabfur: start rebooting all cp-upload_codfw hosts for T366555 (spaced 1.5 hrs)
  • 08:25 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 08:25 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
  • 08:25 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64666 and previous config saved to /var/cache/conftool/dbconfig/20240612-082415-marostegui.json
  • 08:24 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64665 and previous config saved to /var/cache/conftool/dbconfig/20240612-082318-ladsgroup.json
  • 08:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64664 and previous config saved to /var/cache/conftool/dbconfig/20240612-081918-ladsgroup.json
  • 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 08:17 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64663 and previous config saved to /var/cache/conftool/dbconfig/20240612-081643-root.json
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64662 and previous config saved to /var/cache/conftool/dbconfig/20240612-081551-marostegui.json
  • 08:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:15 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:15 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:12 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:12 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64661 and previous config saved to /var/cache/conftool/dbconfig/20240612-081158-ladsgroup.json
  • 08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
  • 07:36 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 07:20 marostegui: dbmaint optimize pagelinks on old s6 codfw master db2214 T364069
  • 07:16 kartik@deploy1002: Finished scap: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) (duration: 13m 11s)
  • 07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T367262', diff saved to https://phabricator.wikimedia.org/P64660 and previous config saved to /var/cache/conftool/dbconfig/20240612-071340-root.json
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T367262', diff saved to https://phabricator.wikimedia.org/P64659 and previous config saved to /var/cache/conftool/dbconfig/20240612-071158-root.json
  • 07:06 kartik@deploy1002: kartik: Continuing with sync
  • 07:05 kartik@deploy1002: kartik: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 marostegui: Starting s6 codfw failover from db2214 to db2129 - T367262
  • 07:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64658 and previous config saved to /var/cache/conftool/dbconfig/20240612-070302-marostegui.json
  • 07:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 07:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:02 kartik@deploy1002: Started scap: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356)
  • 07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64657 and previous config saved to /var/cache/conftool/dbconfig/20240612-070240-marostegui.json
  • 07:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
  • 06:55 moritzm: remove ganeti1019 from eqiad cluster T367071
  • 06:54 moritzm: rebalance ganeti clusters in codfw following reboots
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64656 and previous config saved to /var/cache/conftool/dbconfig/20240612-064733-marostegui.json
  • 06:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:43 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T367262', diff saved to https://phabricator.wikimedia.org/P64655 and previous config saved to /var/cache/conftool/dbconfig/20240612-064200-root.json
  • 06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
  • 06:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:38 hashar@deploy1002: Finished deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550 (duration: 00m 07s)
  • 06:38 hashar@deploy1002: Started deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64654 and previous config saved to /var/cache/conftool/dbconfig/20240612-063225-marostegui.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64653 and previous config saved to /var/cache/conftool/dbconfig/20240612-061718-marostegui.json
  • 05:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:17 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 05:17 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 05:17 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 05:16 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 05:16 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 05:16 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64652 and previous config saved to /var/cache/conftool/dbconfig/20240612-005420-marostegui.json
  • 00:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 00:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 00:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64651 and previous config saved to /var/cache/conftool/dbconfig/20240612-005347-marostegui.json
  • 00:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw
  • 00:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64650 and previous config saved to /var/cache/conftool/dbconfig/20240612-003840-marostegui.json
  • 00:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64649 and previous config saved to /var/cache/conftool/dbconfig/20240612-002332-marostegui.json
  • 00:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64648 and previous config saved to /var/cache/conftool/dbconfig/20240612-000825-marostegui.json

2024-06-11

  • 23:45 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 23:45 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 22:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 22:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-codfw
  • 21:56 ladsgroup@deploy1002: Finished scap: Backport for Fix Linker::makeExternalLink build failures (T367127) (duration: 12m 33s)
  • 21:51 ejegg: fundraising civicrm upgraded from 7252b1b9 to f7855d25
  • 21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Continuing with sync
  • 21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Backport for Fix Linker::makeExternalLink build failures (T367127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:44 ladsgroup@deploy1002: Started scap: Backport for Fix Linker::makeExternalLink build failures (T367127)
  • 21:42 ladsgroup@deploy1002: Finished scap: Backport for Reduce the threshold for section wide circuit breaking to 300 (duration: 12m 08s)
  • 21:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 21:32 ladsgroup@deploy1002: ladsgroup: Backport for Reduce the threshold for section wide circuit breaking to 300 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 ladsgroup@deploy1002: Started scap: Backport for Reduce the threshold for section wide circuit breaking to 300
  • 21:27 ladsgroup@deploy1002: Finished scap: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411) (duration: 11m 53s)
  • 21:18 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
  • 21:18 ladsgroup@deploy1002: pppery, ladsgroup: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 ladsgroup@deploy1002: Started scap: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411)
  • 21:15 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old pagelinks columns of s2 (T352010) (duration: 12m 02s)
  • 21:06 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 21:05 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old pagelinks columns of s2 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old pagelinks columns of s2 (T352010)
  • 21:01 ladsgroup@deploy1002: Finished scap: Backport for Avoid wrapping floated tables using computed styles (T366314) (duration: 14m 28s)
  • 20:52 ejegg: re-enabled fundraising scheduled jobs
  • 20:52 ladsgroup@deploy1002: jdlrobson, ladsgroup: Continuing with sync
  • 20:49 ladsgroup@deploy1002: jdlrobson, ladsgroup: Backport for Avoid wrapping floated tables using computed styles (T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:46 ladsgroup@deploy1002: Started scap: Backport for Avoid wrapping floated tables using computed styles (T366314)
  • 20:46 ladsgroup@deploy1002: Finished scap: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314) (duration: 14m 18s)
  • 20:36 ladsgroup@deploy1002: ladsgroup, jdlrobson: Continuing with sync
  • 20:34 ladsgroup@deploy1002: ladsgroup, jdlrobson: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:31 ladsgroup@deploy1002: Started scap: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314)
  • 20:30 ladsgroup@deploy1002: Finished scap: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189) (duration: 12m 52s)
  • 20:21 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
  • 20:20 ladsgroup@deploy1002: pppery, ladsgroup: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:17 ladsgroup@deploy1002: Started scap: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189)
  • 20:16 ladsgroup@deploy1002: Finished scap: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) (duration: 12m 54s)
  • 20:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-codfw
  • 20:07 ladsgroup@deploy1002: ladsgroup, pppery: Continuing with sync
  • 20:06 ladsgroup@deploy1002: ladsgroup, pppery: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 ladsgroup@deploy1002: Started scap: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994)
  • 19:38 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
  • 19:38 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
  • 19:33 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64646 and previous config saved to /var/cache/conftool/dbconfig/20240611-192403-ladsgroup.json
  • 19:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64645 and previous config saved to /var/cache/conftool/dbconfig/20240611-190855-ladsgroup.json
  • 18:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64644 and previous config saved to /var/cache/conftool/dbconfig/20240611-185348-ladsgroup.json
  • 18:46 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:44 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64643 and previous config saved to /var/cache/conftool/dbconfig/20240611-183841-ladsgroup.json
  • 18:37 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:22 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.9 refs T361403
  • 18:19 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:19 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64642 and previous config saved to /var/cache/conftool/dbconfig/20240611-181526-marostegui.json
  • 18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64641 and previous config saved to /var/cache/conftool/dbconfig/20240611-181448-marostegui.json
  • 18:10 brennen: 1.43.0-wmf.9 train (T361403): no blockers, rolling to group0
  • 18:08 ejegg: stopped fundraising scheduled jobs
  • 17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64640 and previous config saved to /var/cache/conftool/dbconfig/20240611-175941-marostegui.json
  • 17:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:56 taavi@deploy1002: Finished scap: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) (duration: 12m 00s)
  • 17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:47 taavi@deploy1002: taavi: Continuing with sync
  • 17:47 taavi@deploy1002: taavi: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64639 and previous config saved to /var/cache/conftool/dbconfig/20240611-174434-marostegui.json
  • 17:44 taavi@deploy1002: Started scap: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544)
  • 17:37 rzl@deploy1002: Finished scap: (no justification provided) (duration: 11m 40s)
  • 17:33 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'enable-puppet T366649'
  • 17:33 rzl@deploy1002: rzl: Continuing with sync
  • 17:30 rzl@deploy1002: rzl: (no justification provided) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64638 and previous config saved to /var/cache/conftool/dbconfig/20240611-172928-marostegui.json
  • 17:26 rzl@deploy1002: Started scap: (no justification provided)
  • 17:14 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'disable-puppet T366649'
  • 17:11 ejegg: fundraising civicrm upgraded from ebfbad86 to 7252b1b9
  • 17:09 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 17:08 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:08 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 17:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 16:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:56 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 16:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
  • 16:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
  • 16:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:36 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:35 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:35 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
  • 16:31 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1013.eqiad.wmnet|wikikube-worker1014.eqiad.wmnet|wikikube-worker1017.eqiad.wmnet|wikikube-worker1018.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 16:31 claime: pool and uncordon wikikube-worker1013.eqiad.wmnet,wikikube-worker1014.eqiad.wmnet,wikikube-worker1017.eqiad.wmnet,wikikube-worker1018.eqiad.wmnet - T351074
  • 16:31 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
  • 16:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:28 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.eqiad.wmnet
  • 16:26 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64637 and previous config saved to /var/cache/conftool/dbconfig/20240611-162154-arnaudb.json
  • 16:21 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:16 elukey: manual run of docker-report-k8s on build2001 (some failed results)
  • 16:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1017.eqiad.wmnet with OS bullseye
  • 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1018.eqiad.wmnet with OS bullseye
  • 16:07 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
  • 16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64636 and previous config saved to /var/cache/conftool/dbconfig/20240611-160649-arnaudb.json
  • 16:06 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
  • 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1014.eqiad.wmnet with OS bullseye
  • 16:05 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
  • 16:05 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
  • 16:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 16:04 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
  • 16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 16:03 claime: roll restarting eventgate-main eqiad
  • 16:00 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
  • 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64635 and previous config saved to /var/cache/conftool/dbconfig/20240611-155143-arnaudb.json
  • 15:51 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
  • 15:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
  • 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
  • 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
  • 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
  • 14:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1013.eqiad.wmnet with OS bullseye
  • 14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1013.eqiad.wmnet on all recursors
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1013.eqiad.wmnet on all recursors
  • 14:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014
  • 14:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
  • 14:51 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1403 to wikikube-worker1014.eqiad.wmnet
  • 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014.eqiad.wmnet
  • 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1402 to wikikube-worker1013
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1013
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 depool T365982', diff saved to https://phabricator.wikimedia.org/P64631 and previous config saved to /var/cache/conftool/dbconfig/20240611-144624-arnaudb.json
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1013
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
  • 14:45 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1002.eqiad.wmnet
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
  • 14:42 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:38 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 14:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1402 to wikikube-worker1013
  • 14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
  • 14:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
  • 14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
  • 14:29 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1002.eqiad.wmnet
  • 14:29 claime: depooling mw1402 mw1403 mw1406 mw1411 for reimage to k8s - T351074
  • 14:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148) (duration: 19m 08s)
  • 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
  • 14:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
  • 14:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Continuing with sync
  • 14:18 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1002.eqiad.wmnet
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
  • 14:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
  • 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148)
  • 14:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable CampaignEvents on swahili wikipedia (T366502) (duration: 14m 40s)
  • 14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 14:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1017.eqiad.wmnet
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
  • 13:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
  • 13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Continuing with sync
  • 13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Backport for Enable CampaignEvents on swahili wikipedia (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
  • 13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Enable CampaignEvents on swahili wikipedia (T366502)
  • 13:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502) (duration: 44m 51s)
  • 13:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1007.eqiad.wmnet
  • 13:50 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
  • 13:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:48 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 13:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:45 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
  • 13:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
  • 13:45 vgutierrez: rolling switch from tcp-mss-clamper to ferm based MSS clamping on A:ncredir - T365689
  • 13:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 13:42 jiji@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-eqiad
  • 13:40 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1007.eqiad.wmnet
  • 13:40 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1006.eqiad.wmnet
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 13:36 vgutierrez: repool ncredir6001 - T365689
  • 13:36 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 13:33 moritzm: failover ganeti cluster for esams01 to ganeti3005
  • 13:32 moritzm: failover ganeti cluster for esams02 to ganeti3006
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64630 and previous config saved to /var/cache/conftool/dbconfig/20240611-132043-ladsgroup.json
  • 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Continuing with sync
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 13:15 vgutierrez: depool ncredir6001 - T365689
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 13:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 13:09 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:07 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw
  • 13:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 13:06 vgutierrez: disable puppet on A:ncredir before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035724 - T365689
  • 13:06 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
  • 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502)
  • 13:06 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:06 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 13:06 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64629 and previous config saved to /var/cache/conftool/dbconfig/20240611-130535-ladsgroup.json
  • 13:04 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
  • 13:03 vgutierrez: repool text@eqiad with IPIP encapsulation enabled - T366466
  • 13:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:01 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 12:59 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1006.eqiad.wmnet
  • 12:53 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1016.eqiad.wmnet
  • 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64628 and previous config saved to /var/cache/conftool/dbconfig/20240611-125028-ladsgroup.json
  • 12:50 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 - T366466
  • 12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64627 and previous config saved to /var/cache/conftool/dbconfig/20240611-123521-ladsgroup.json
  • 12:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64626 and previous config saved to /var/cache/conftool/dbconfig/20240611-123046-ladsgroup.json
  • 12:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 12:26 fabfur: cancelled previous command (text@eqiad is going to be depooled at the same time)
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 12:23 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
  • 12:19 vgutierrez: depool text@eqiad before enabling IPIP encapsulation - T366466
  • 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 12:14 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 12:13 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 12:13 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 12:11 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 12:10 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 12:09 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64625 and previous config saved to /var/cache/conftool/dbconfig/20240611-120710-ladsgroup.json
  • 12:07 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 12:06 claime: Finished kafka-main reboots in codfw
  • 12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-codfw
  • 12:05 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 12:05 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 12:04 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1005.eqiad.wmnet
  • 12:04 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:04 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 12:04 moritzm: rebalance ganeti cluster in ulsfo following reboots
  • 12:04 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 12:03 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 12:02 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
  • 11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
  • 11:57 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 11:55 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 11:55 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 11:55 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 11:54 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64624 and previous config saved to /var/cache/conftool/dbconfig/20240611-115203-ladsgroup.json
  • 11:51 jayme: removed similar-users deployments from all k8s clusters - T345274
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64621 and previous config saved to /var/cache/conftool/dbconfig/20240611-113656-ladsgroup.json
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64620 and previous config saved to /var/cache/conftool/dbconfig/20240611-113452-marostegui.json
  • 11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64619 and previous config saved to /var/cache/conftool/dbconfig/20240611-113430-marostegui.json
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64618 and previous config saved to /var/cache/conftool/dbconfig/20240611-113121-root.json
  • 11:29 moritzm: failover ganeti master in ulsfo to ganeti4008
  • 11:27 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 11:26 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:24 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 11:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64617 and previous config saved to /var/cache/conftool/dbconfig/20240611-112149-ladsgroup.json
  • 11:21 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64616 and previous config saved to /var/cache/conftool/dbconfig/20240611-111922-marostegui.json
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64615 and previous config saved to /var/cache/conftool/dbconfig/20240611-111616-root.json
  • 11:15 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:13 jayme: removing similar-users service - T345274
  • 11:12 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:07 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-codfw
  • 11:05 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:05 claime: Starting kafka-main reboots in codfw
  • 11:04 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1004.eqiad.wmnet
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64614 and previous config saved to /var/cache/conftool/dbconfig/20240611-110414-marostegui.json
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 10:57 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 10:57 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64613 and previous config saved to /var/cache/conftool/dbconfig/20240611-104908-marostegui.json
  • 10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364069
  • 10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364299
  • 10:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
  • 10:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
  • 10:45 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
  • 10:45 claime: move 90% of traffic to mw-on-k8s - T362323
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123 T367145', diff saved to https://phabricator.wikimedia.org/P64612 and previous config saved to /var/cache/conftool/dbconfig/20240611-104336-root.json
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T367145', diff saved to https://phabricator.wikimedia.org/P64611 and previous config saved to /var/cache/conftool/dbconfig/20240611-104232-root.json
  • 10:42 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 10:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 10:42 marostegui: Starting s5 codfw failover from db2123 to db2213 - T367145
  • 10:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 10:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 10:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 10:32 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T367145', diff saved to https://phabricator.wikimedia.org/P64610 and previous config saved to /var/cache/conftool/dbconfig/20240611-102900-root.json
  • 10:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2213 with weight 0 T367145', diff saved to https://phabricator.wikimedia.org/P64609 and previous config saved to /var/cache/conftool/dbconfig/20240611-102820-root.json
  • 10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
  • 10:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64608 and previous config saved to /var/cache/conftool/dbconfig/20240611-102444-ladsgroup.json
  • 10:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64607 and previous config saved to /var/cache/conftool/dbconfig/20240611-102125-ladsgroup.json
  • 10:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 10:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 10:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 10:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 10:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 10:16 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 10:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1014.eqiad.wmnet
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 10:14 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
  • 10:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64606 and previous config saved to /var/cache/conftool/dbconfig/20240611-101400-arnaudb.json
  • 10:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 10:10 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 10:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 10:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
  • 10:08 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 10:08 jayme@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 10:07 brouberol@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:06 brouberol@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
  • 10:04 brouberol@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:04 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1014.eqiad.wmnet
  • 10:03 brouberol@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:02 brouberol@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
  • 10:01 brouberol@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:01 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:00 brouberol@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:00 sukhe: [end] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru: T346722
  • 09:59 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:59 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:59 sukhe: [start] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru
  • 09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64605 and previous config saved to /var/cache/conftool/dbconfig/20240611-095853-arnaudb.json
  • 09:58 brouberol@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:58 brouberol@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:57 brouberol@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:57 brouberol@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 09:45 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 09:44 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 09:44 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 09:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64604 and previous config saved to /var/cache/conftool/dbconfig/20240611-094347-arnaudb.json
  • 09:43 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 09:42 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 09:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 09:42 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
  • 09:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 09:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet
  • 09:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 09:35 moritzm: rebalance ganeti clusters in codfw following reboots
  • 09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64603 and previous config saved to /var/cache/conftool/dbconfig/20240611-092839-arnaudb.json
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64602 and previous config saved to /var/cache/conftool/dbconfig/20240611-092504-arnaudb.json
  • 09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 09:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
  • 09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 09:16 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
  • 09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 08:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:53 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 08:47 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:46 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 08:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date
  • 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 08:33 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.ulsfo.wmnet
  • 08:32 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
  • 08:31 marostegui: Install 10.11 on db1153 (non used x2 replica) T365805
  • 08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
  • 08:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
  • 08:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 08:31 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
  • 08:30 marostegui: Install 10.11 on db1153 (non used x2 replica)
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64600 and previous config saved to /var/cache/conftool/dbconfig/20240611-081314-root.json
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 08:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64599 and previous config saved to /var/cache/conftool/dbconfig/20240611-075809-root.json
  • 07:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 07:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64598 and previous config saved to /var/cache/conftool/dbconfig/20240611-074304-root.json
  • 07:40 kart_: Updated MinT to 2024-06-11-052620-production (T364122, T346226, T357548)
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64597 and previous config saved to /var/cache/conftool/dbconfig/20240611-074009-root.json
  • 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:36 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
  • 07:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64596 and previous config saved to /var/cache/conftool/dbconfig/20240611-072758-root.json
  • 07:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64595 and previous config saved to /var/cache/conftool/dbconfig/20240611-072504-root.json
  • 07:18 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64594 and previous config saved to /var/cache/conftool/dbconfig/20240611-071253-root.json
  • 07:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64593 and previous config saved to /var/cache/conftool/dbconfig/20240611-070958-root.json
  • 07:05 arnaudb@deploy1002: Finished scap: Backport for Revert "dbconfig: temporary disable writes on es6" (duration: 11m 36s)
  • 07:02 moritzm: failover ganeti master in codfw to ganeti2020
  • 06:57 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:56 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64592 and previous config saved to /var/cache/conftool/dbconfig/20240611-065453-root.json
  • 06:54 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es6"
  • 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64591 and previous config saved to /var/cache/conftool/dbconfig/20240611-064041-arnaudb.json
  • 06:40 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe (duration: 15m 33s)
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64590 and previous config saved to /var/cache/conftool/dbconfig/20240611-063947-root.json
  • 06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64589 and previous config saved to /var/cache/conftool/dbconfig/20240611-063903-arnaudb.json
  • 06:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1037 to es6 primary T367055', diff saved to https://phabricator.wikimedia.org/P64588 and previous config saved to /var/cache/conftool/dbconfig/20240611-063109-arnaudb.json
  • 06:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:30 arnaudb: Starting es6 eqiad failover from es1038 to es1037 - T367055
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64587 and previous config saved to /var/cache/conftool/dbconfig/20240611-062441-root.json
  • 06:24 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe
  • 06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1037 with weight 0 T367055', diff saved to https://phabricator.wikimedia.org/P64586 and previous config saved to /var/cache/conftool/dbconfig/20240611-062353-arnaudb.json
  • 06:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
  • 06:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
  • 06:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64585 and previous config saved to /var/cache/conftool/dbconfig/20240611-061413-root.json
  • 06:12 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64584 and previous config saved to /var/cache/conftool/dbconfig/20240611-060935-root.json
  • 06:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:07 arnaudb@deploy1002: Finished scap: Backport for dbconfig: temporary disable writes on es6 (T367055) (duration: 15m 42s)
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64583 and previous config saved to /var/cache/conftool/dbconfig/20240611-055907-root.json
  • 05:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
  • 05:58 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 05:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
  • 05:58 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1233', diff saved to https://phabricator.wikimedia.org/P64582 and previous config saved to /var/cache/conftool/dbconfig/20240611-055816-arnaudb.json
  • 05:56 arnaudb@deploy1002: arnaudb: Backport for dbconfig: temporary disable writes on es6 (T367055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:51 arnaudb@deploy1002: Started scap: Backport for dbconfig: temporary disable writes on es6 (T367055)
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64581 and previous config saved to /var/cache/conftool/dbconfig/20240611-054401-root.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64580 and previous config saved to /var/cache/conftool/dbconfig/20240611-052856-root.json
  • 05:24 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364069
  • 05:22 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364299
  • 05:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
  • 05:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T367140', diff saved to https://phabricator.wikimedia.org/P64579 and previous config saved to /var/cache/conftool/dbconfig/20240611-052101-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write T367140', diff saved to https://phabricator.wikimedia.org/P64578 and previous config saved to /var/cache/conftool/dbconfig/20240611-052000-root.json
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T367140', diff saved to https://phabricator.wikimedia.org/P64577 and previous config saved to /var/cache/conftool/dbconfig/20240611-051941-root.json
  • 05:19 marostegui: Starting s3 eqiad failover from db1223 to db1157 - T367140
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64576 and previous config saved to /var/cache/conftool/dbconfig/20240611-051351-root.json
  • 05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1157 with weight 0 T367140', diff saved to https://phabricator.wikimedia.org/P64575 and previous config saved to /var/cache/conftool/dbconfig/20240611-050351-root.json
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64574 and previous config saved to /var/cache/conftool/dbconfig/20240611-045845-root.json
  • 04:57 marostegui: dbmaint eqiad s2 deploy schema change on db1222 T364299
  • 04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
  • 04:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
  • 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1222 T366687', diff saved to https://phabricator.wikimedia.org/P64573 and previous config saved to /var/cache/conftool/dbconfig/20240611-045447-root.json
  • 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1162 to s2 primary and set section read-write T366687', diff saved to https://phabricator.wikimedia.org/P64572 and previous config saved to /var/cache/conftool/dbconfig/20240611-045359-root.json
  • 04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T366687', diff saved to https://phabricator.wikimedia.org/P64571 and previous config saved to /var/cache/conftool/dbconfig/20240611-045341-root.json
  • 04:53 marostegui: Starting s2 eqiad failover from db1222 to db1162 - T366687
  • 04:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64570 and previous config saved to /var/cache/conftool/dbconfig/20240611-044616-marostegui.json
  • 04:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 04:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64569 and previous config saved to /var/cache/conftool/dbconfig/20240611-044339-root.json
  • 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1162 with weight 0 T366687', diff saved to https://phabricator.wikimedia.org/P64568 and previous config saved to /var/cache/conftool/dbconfig/20240611-043333-marostegui.json
  • 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
  • 04:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64567 and previous config saved to /var/cache/conftool/dbconfig/20240611-041938-ladsgroup.json
  • 04:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64566 and previous config saved to /var/cache/conftool/dbconfig/20240611-040432-ladsgroup.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.6 (duration: 01m 05s)
  • 04:00 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.9 refs T361403 (duration: 57m 19s)
  • 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64565 and previous config saved to /var/cache/conftool/dbconfig/20240611-034925-ladsgroup.json
  • 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64564 and previous config saved to /var/cache/conftool/dbconfig/20240611-033418-ladsgroup.json
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.9 refs T361403
  • 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad

2024-06-10

  • 23:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 23:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:28 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:27 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 reedy@deploy1002: Synchronized wmf-config/: sync interwiki lists (duration: 10m 07s)
  • 22:14 reedy@deploy1002: Synchronized langlist-labs: Add fr and bn (duration: 14m 29s)
  • 21:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64563 and previous config saved to /var/cache/conftool/dbconfig/20240610-215622-marostegui.json
  • 21:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64562 and previous config saved to /var/cache/conftool/dbconfig/20240610-214115-marostegui.json
  • 21:27 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
  • 21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64561 and previous config saved to /var/cache/conftool/dbconfig/20240610-212608-marostegui.json
  • 21:19 ejegg: fundraising python tools upgraded from 8c98b674 to c51f6e62
  • 21:19 ejegg: Standalone SmashPig upgraded from edf573bb to 1d1b770c
  • 21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64560 and previous config saved to /var/cache/conftool/dbconfig/20240610-211101-marostegui.json
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64559 and previous config saved to /var/cache/conftool/dbconfig/20240610-204622-ladsgroup.json
  • 20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64558 and previous config saved to /var/cache/conftool/dbconfig/20240610-204600-ladsgroup.json
  • 20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64557 and previous config saved to /var/cache/conftool/dbconfig/20240610-203053-ladsgroup.json
  • 20:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 20:30 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64556 and previous config saved to /var/cache/conftool/dbconfig/20240610-201546-ladsgroup.json
  • 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64555 and previous config saved to /var/cache/conftool/dbconfig/20240610-200039-ladsgroup.json
  • 19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64554 and previous config saved to /var/cache/conftool/dbconfig/20240610-195826-marostegui.json
  • 19:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64553 and previous config saved to /var/cache/conftool/dbconfig/20240610-195804-marostegui.json
  • 19:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64552 and previous config saved to /var/cache/conftool/dbconfig/20240610-194256-marostegui.json
  • 19:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64551 and previous config saved to /var/cache/conftool/dbconfig/20240610-192749-marostegui.json
  • 19:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 19:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64550 and previous config saved to /var/cache/conftool/dbconfig/20240610-191242-marostegui.json
  • 19:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 19:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 18:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:50 amastilovic@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:50 amastilovic@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:47 amastilovic@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:46 amastilovic@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64547 and previous config saved to /var/cache/conftool/dbconfig/20240610-174349-marostegui.json
  • 17:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64546 and previous config saved to /var/cache/conftool/dbconfig/20240610-174327-marostegui.json
  • 17:37 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:36 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:30 dancy@deploy1002: Installation of scap version "4.87.0" completed for 285 hosts
  • 17:29 amastilovic@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:29 amastilovic@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64545 and previous config saved to /var/cache/conftool/dbconfig/20240610-172820-marostegui.json
  • 17:25 dancy@deploy1002: Installing scap version "4.87.0" for 285 hosts
  • 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64544 and previous config saved to /var/cache/conftool/dbconfig/20240610-171313-marostegui.json
  • 17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64543 and previous config saved to /var/cache/conftool/dbconfig/20240610-165806-marostegui.json
  • 16:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:21 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:20 marostegui: Drop flaggedpage_pending from s1 T365568
  • 16:05 cdanis: sudo cumin -b 8 '*.codfw.wmnet and C:geoip::data::puppet%fetch_ipinfo_dbs=true' 'sha512sum /usr/share/GeoIPInfo/GeoLite2-ASN.mmdb || run-puppet-agent'
  • 16:01 cdanis: sudo systemctl restart sync-puppet-volatile
  • 16:00 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:47 marostegui: Drop flaggedpage_pending from s3 T365568
  • 15:46 marostegui: Drop flaggedpage_pending from s5 T365568
  • 15:43 marostegui: Drop flaggedpage_pending from s2 T365568
  • 15:42 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:41 godog: bounce benthos@mw_accesslog_metrics.service on centrallog hosts
  • 15:41 marostegui: Drop flaggedpage_pending from s7 T365568
  • 15:40 marostegui: Drop flaggedpage_pending from s6 T365568
  • 15:34 ladsgroup@deploy1002: Synchronized portals: (no justification provided) (duration: 11m 20s)
  • 15:31 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 15:31 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:29 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:22 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: (no justification provided) (duration: 10m 28s)
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 15:05 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4046.ulsfo.wmnet
  • 15:04 ladsgroup@deploy1002: Finished scap: Backport for errorpages: Add dark mode support (duration: 17m 15s)
  • 15:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:01 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4046.ulsfo.wmnet
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 14:59 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:58 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 14:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 14:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 14:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 14:55 ladsgroup@deploy1002: ladsgroup and ebrahim: Continuing with sync
  • 14:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:54 ladsgroup@deploy1002: ladsgroup and ebrahim: Backport for errorpages: Add dark mode support synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 14:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:53 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 14:52 moritzm: powercycling ganeti1019, stuck on reboot
  • 14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 14:52 ChrisDobbins901_: sudo -i cookbook sre.hosts.reboot-single -r 'Kernel upgrade' 'P{cp4046.*}'
  • 14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 14:51 cdobbins@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4046.ulsfo.wmnet
  • 14:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:51 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 14:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:50 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 14:50 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 14:48 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 14:47 urandom: aqs1010: restarting cassandra to apply upgrade to Java 11 — T350567
  • 14:47 ladsgroup@deploy1002: Started scap: Backport for errorpages: Add dark mode support
  • 14:46 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4046.ulsfo.wmnet
  • 14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:45 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 14:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64539 and previous config saved to /var/cache/conftool/dbconfig/20240610-144501-marostegui.json
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 14:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64538 and previous config saved to /var/cache/conftool/dbconfig/20240610-144439-marostegui.json
  • 14:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
  • 14:43 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:43 swfrench@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
  • 14:42 swfrench@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 swfrench@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 14:41 swfrench@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 14:39 swfrench@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:38 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 14:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 14:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 14:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 14:31 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2023.codfw.wmnet
  • 14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64537 and previous config saved to /var/cache/conftool/dbconfig/20240610-142931-marostegui.json
  • 14:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
  • 14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
  • 14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config-next: apply
  • 14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config-next: apply
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64536 and previous config saved to /var/cache/conftool/dbconfig/20240610-141422-marostegui.json
  • 14:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64535 and previous config saved to /var/cache/conftool/dbconfig/20240610-135914-marostegui.json
  • 13:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
  • 13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
  • 13:57 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1107 for T348977 - bking@cumin2002
  • 13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107 for T348977 - bking@cumin2002
  • 13:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
  • 13:49 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 13:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:47 taavi@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:46 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
  • 13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
  • 13:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 13:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 13:36 elukey: move recommendation-api on wikikube to prometheus metrics (offboarded from statsd) - T205870
  • 13:36 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 13:35 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 13:30 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364069
  • 13:29 taavi: taavi@mw1447 ~ $ sudo /usr/local/sbin/restart-php-fpm-all php7.4-fpm 9223372 # leftover from me restarting LVS during deployment
  • 13:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 13:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 13:27 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 13:26 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64534 and previous config saved to /var/cache/conftool/dbconfig/20240610-132619-ladsgroup.json
  • 13:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 13:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 13:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 13:20 ladsgroup@deploy1002: Finished scap: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438) (duration: 15m 05s)
  • 13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4047.ulsfo.wmnet
  • 13:18 taavi@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=99) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:18 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 13:11 taavi: restarting eqiad low-traffic LVS for https://gerrit.wikimedia.org/r/c/operations/puppet/+/941459
  • 13:11 ladsgroup@deploy1002: ladsgroup and gergesshamon: Continuing with sync
  • 13:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
  • 13:10 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4047.ulsfo.wmnet
  • 13:09 fabfur: rebooting cp4047 (T366555)
  • 13:09 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:08 ladsgroup@deploy1002: ladsgroup and gergesshamon: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 13:05 ladsgroup@deploy1002: Started scap: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438)
  • 13:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:04 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 13:01 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:58 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:58 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:55 fabfur: repooling text@drmrs (IPIP encapsulation enabled) (T366466)
  • 12:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 12:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 12:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:48 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
  • 12:46 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 12:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
  • 12:44 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 12:43 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 12:41 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:40 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 12:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
  • 12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64532 and previous config saved to /var/cache/conftool/dbconfig/20240610-122847-arnaudb.json
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
  • 12:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
  • 12:15 oblivian@deploy1002: Finished scap: Deploying change to base mediawiki image (take 2) (duration: 22m 39s)
  • 12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64531 and previous config saved to /var/cache/conftool/dbconfig/20240610-121341-arnaudb.json
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 11:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64530 and previous config saved to /var/cache/conftool/dbconfig/20240610-115834-arnaudb.json
  • 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 11:53 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image (take 2)
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64528 and previous config saved to /var/cache/conftool/dbconfig/20240610-114957-marostegui.json
  • 11:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64527 and previous config saved to /var/cache/conftool/dbconfig/20240610-114934-marostegui.json
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 11:44 oblivian@deploy1002: sync-world aborted: Deploying change to base mediawiki image (duration: 10m 21s)
  • 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
  • 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64526 and previous config saved to /var/cache/conftool/dbconfig/20240610-114329-arnaudb.json
  • 11:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 11:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 11:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 11:34 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64525 and previous config saved to /var/cache/conftool/dbconfig/20240610-113426-marostegui.json
  • 11:34 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments (duration: 10m 22s)
  • 11:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 11:29 fabfur: restarting pybal on lvs6003,lvs6001 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
  • 11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
  • 11:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64524 and previous config saved to /var/cache/conftool/dbconfig/20240610-112821-arnaudb.json
  • 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:26 fabfur: enabling && running puppet on A:lvs-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 11:23 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments
  • 11:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64523 and previous config saved to /var/cache/conftool/dbconfig/20240610-111917-marostegui.json
  • 11:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64522 and previous config saved to /var/cache/conftool/dbconfig/20240610-111315-arnaudb.json
  • 10:47 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 1%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64519 and previous config saved to /var/cache/conftool/dbconfig/20240610-104303-arnaudb.json
  • 10:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 10:41 fabfur: depooling text@drmrs to apply IPIP encapsulation patches (T366466)
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 10:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 10:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 10:25 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2204 T367019', diff saved to https://phabricator.wikimedia.org/P64518 and previous config saved to /var/cache/conftool/dbconfig/20240610-102511-arnaudb.json
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 10:21 claime: repooled all active/active mediawiki services from codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=api-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-ro,name=codfw
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 10:08 claime: depooled all active/active mediawiki services from codfw
  • 10:08 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw
  • 10:07 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw
  • 10:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
  • 10:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:02 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-int-ro,name=codfw
  • 10:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=codfw
  • 10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-web-ro,name=codfw
  • 10:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 26 hosts with reason: Issue from T367019
  • 09:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 26 hosts with reason: Issue from T367019
  • 09:54 arnaudb@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 5:00:00 on 870 hosts with reason: Issue from T367019
  • 09:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 870 hosts with reason: Issue from T367019
  • 09:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 09:53 jayme@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
  • 09:37 godog: roll upgrade prometheus-statsd-exporter to baremetal - T302373
  • 09:34 taavi@deploy1002: Finished scap: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 17s)
  • 09:25 taavi@deploy1002: taavi: Continuing with sync
  • 09:25 taavi@deploy1002: taavi: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:24 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:24 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:22 taavi@deploy1002: Started scap: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook"
  • 09:22 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 09:22 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64517 and previous config saved to /var/cache/conftool/dbconfig/20240610-091631-marostegui.json
  • 09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64516 and previous config saved to /var/cache/conftool/dbconfig/20240610-091606-marostegui.json
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2207 to s2 primary T367019', diff saved to https://phabricator.wikimedia.org/P64515 and previous config saved to /var/cache/conftool/dbconfig/20240610-091506-arnaudb.json
  • 09:14 arnaudb: Starting s2 codfw failover from db2204 to db2207 - T367019
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 09:01 godog: upload prometheus-statsd-exporter 0.26.1-1 to apt - T302373
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64514 and previous config saved to /var/cache/conftool/dbconfig/20240610-090058-marostegui.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64513 and previous config saved to /var/cache/conftool/dbconfig/20240610-085721-arnaudb.json
  • 08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 08:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64512 and previous config saved to /var/cache/conftool/dbconfig/20240610-085548-arnaudb.json
  • 08:54 godog: upgrade prometheus-statsd-exporter on webperf - T302373
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 08:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 08:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet
  • 08:47 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64511 and previous config saved to /var/cache/conftool/dbconfig/20240610-084550-marostegui.json
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64510 and previous config saved to /var/cache/conftool/dbconfig/20240610-084042-arnaudb.json
  • 08:39 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4048.ulsfo.wmnet
  • 08:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping1004.eqiad.wmnet with OS bookworm
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping1004.eqiad.wmnet with reason: host reimage
  • 08:14 kostajh: UTC morning deploys done
  • 08:13 kharlan@deploy1002: Finished scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884) (duration: 14m 07s)
  • 08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64507 and previous config saved to /var/cache/conftool/dbconfig/20240610-081030-arnaudb.json
  • 08:04 kharlan@deploy1002: kharlan: Continuing with sync
  • 08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
  • 08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
  • 08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
  • 08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
  • 08:02 kharlan@deploy1002: kharlan: Backport for IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:59 kharlan@deploy1002: Started scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884)
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
  • 07:57 kharlan@deploy1002: kharlan: Backport for IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64506 and previous config saved to /var/cache/conftool/dbconfig/20240610-075524-arnaudb.json
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 07:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:53 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64505 and previous config saved to /var/cache/conftool/dbconfig/20240610-075056-root.json
  • 07:50 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 07:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2207.codfw.wmnet
  • 07:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 07:43 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2207.codfw.wmnet
  • 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2207 maintenance', diff saved to https://phabricator.wikimedia.org/P64504 and previous config saved to /var/cache/conftool/dbconfig/20240610-074157-arnaudb.json
  • 07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
  • 07:41 kharlan@deploy1002: Started scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884)
  • 07:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
  • 07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Revert db2207 with weight 500 T367019', diff saved to https://phabricator.wikimedia.org/P64503 and previous config saved to /var/cache/conftool/dbconfig/20240610-073838-arnaudb.json
  • 07:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64502 and previous config saved to /var/cache/conftool/dbconfig/20240610-073549-root.json
  • 07:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:33 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 07:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 07:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 07:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 07:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 07:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 07:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64501 and previous config saved to /var/cache/conftool/dbconfig/20240610-072043-root.json
  • 07:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64500 and previous config saved to /var/cache/conftool/dbconfig/20240610-070537-root.json
  • 07:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64499 and previous config saved to /var/cache/conftool/dbconfig/20240610-070249-marostegui.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64498 and previous config saved to /var/cache/conftool/dbconfig/20240610-070224-marostegui.json
  • 07:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 06:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64497 and previous config saved to /var/cache/conftool/dbconfig/20240610-065640-ladsgroup.json
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64496 and previous config saved to /var/cache/conftool/dbconfig/20240610-065031-root.json
  • 06:47 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364299
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64495 and previous config saved to /var/cache/conftool/dbconfig/20240610-064716-marostegui.json
  • 06:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64494 and previous config saved to /var/cache/conftool/dbconfig/20240610-064132-ladsgroup.json
  • 06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64493 and previous config saved to /var/cache/conftool/dbconfig/20240610-063912-arnaudb.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2140 T367017', diff saved to https://phabricator.wikimedia.org/P64492 and previous config saved to /var/cache/conftool/dbconfig/20240610-063904-root.json
  • 06:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2179 to s4 primary T367017', diff saved to https://phabricator.wikimedia.org/P64491 and previous config saved to /var/cache/conftool/dbconfig/20240610-063830-root.json
  • 06:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 06:38 marostegui: Starting s4 codfw failover from db2140 to db2179 - T367017
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64490 and previous config saved to /var/cache/conftool/dbconfig/20240610-063524-root.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64489 and previous config saved to /var/cache/conftool/dbconfig/20240610-063208-marostegui.json
  • 06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64488 and previous config saved to /var/cache/conftool/dbconfig/20240610-062624-ladsgroup.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64487 and previous config saved to /var/cache/conftool/dbconfig/20240610-062017-root.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2179 from API/vslow/dump T367017', diff saved to https://phabricator.wikimedia.org/P64486 and previous config saved to /var/cache/conftool/dbconfig/20240610-061939-root.json
  • 06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2179 with weight 0 T367017', diff saved to https://phabricator.wikimedia.org/P64485 and previous config saved to /var/cache/conftool/dbconfig/20240610-061849-root.json
  • 06:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64484 and previous config saved to /var/cache/conftool/dbconfig/20240610-061658-marostegui.json
  • 06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64483 and previous config saved to /var/cache/conftool/dbconfig/20240610-061116-ladsgroup.json
  • 05:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64482 and previous config saved to /var/cache/conftool/dbconfig/20240610-052941-ladsgroup.json
  • 05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64481 and previous config saved to /var/cache/conftool/dbconfig/20240610-051432-ladsgroup.json
  • 05:13 marostegui: dbmaint codfw s7 deploy schema change on db2218 T364299
  • 05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
  • 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T366875', diff saved to https://phabricator.wikimedia.org/P64480 and previous config saved to /var/cache/conftool/dbconfig/20240610-050738-root.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2121 to s7 primary T366875', diff saved to https://phabricator.wikimedia.org/P64479 and previous config saved to /var/cache/conftool/dbconfig/20240610-050637-marostegui.json
  • 05:06 marostegui: Starting s7 codfw failover from db2218 to db2121 - T366875
  • 04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64478 and previous config saved to /var/cache/conftool/dbconfig/20240610-045922-ladsgroup.json
  • 04:52 kart_: Updated Apertium to 2024-06-07-143238-production (T356252)
  • 04:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 04:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 04:44 marostegui: Rename flaggedpage_pending in s5 T365568
  • 04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64477 and previous config saved to /var/cache/conftool/dbconfig/20240610-044414-ladsgroup.json
  • 04:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 04:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 04:37 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2121 from API/vslow/dump T366875', diff saved to https://phabricator.wikimedia.org/P64476 and previous config saved to /var/cache/conftool/dbconfig/20240610-043741-root.json
  • 04:37 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2121 with weight 0 T366875', diff saved to https://phabricator.wikimedia.org/P64475 and previous config saved to /var/cache/conftool/dbconfig/20240610-043649-root.json
  • 04:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64474 and previous config saved to /var/cache/conftool/dbconfig/20240610-043615-marostegui.json
  • 04:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-09

  • 23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64473 and previous config saved to /var/cache/conftool/dbconfig/20240609-234110-ladsgroup.json
  • 23:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64472 and previous config saved to /var/cache/conftool/dbconfig/20240609-234047-ladsgroup.json
  • 23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64471 and previous config saved to /var/cache/conftool/dbconfig/20240609-232921-ladsgroup.json
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64470 and previous config saved to /var/cache/conftool/dbconfig/20240609-232539-ladsgroup.json
  • 23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64469 and previous config saved to /var/cache/conftool/dbconfig/20240609-231413-ladsgroup.json
  • 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64468 and previous config saved to /var/cache/conftool/dbconfig/20240609-231031-ladsgroup.json
  • 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64467 and previous config saved to /var/cache/conftool/dbconfig/20240609-225905-ladsgroup.json
  • 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64466 and previous config saved to /var/cache/conftool/dbconfig/20240609-225523-ladsgroup.json
  • 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64465 and previous config saved to /var/cache/conftool/dbconfig/20240609-224357-ladsgroup.json
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64464 and previous config saved to /var/cache/conftool/dbconfig/20240609-192428-ladsgroup.json
  • 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64463 and previous config saved to /var/cache/conftool/dbconfig/20240609-192404-ladsgroup.json
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64462 and previous config saved to /var/cache/conftool/dbconfig/20240609-190856-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64461 and previous config saved to /var/cache/conftool/dbconfig/20240609-185347-ladsgroup.json
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64460 and previous config saved to /var/cache/conftool/dbconfig/20240609-183839-ladsgroup.json
  • 16:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64459 and previous config saved to /var/cache/conftool/dbconfig/20240609-160621-marostegui.json
  • 15:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64458 and previous config saved to /var/cache/conftool/dbconfig/20240609-155113-marostegui.json
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64457 and previous config saved to /var/cache/conftool/dbconfig/20240609-153605-marostegui.json
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64456 and previous config saved to /var/cache/conftool/dbconfig/20240609-152057-marostegui.json
  • 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64455 and previous config saved to /var/cache/conftool/dbconfig/20240609-152020-ladsgroup.json
  • 15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64454 and previous config saved to /var/cache/conftool/dbconfig/20240609-151956-ladsgroup.json
  • 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64453 and previous config saved to /var/cache/conftool/dbconfig/20240609-150448-ladsgroup.json
  • 14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64452 and previous config saved to /var/cache/conftool/dbconfig/20240609-144940-ladsgroup.json
  • 14:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64451 and previous config saved to /var/cache/conftool/dbconfig/20240609-143432-ladsgroup.json
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64450 and previous config saved to /var/cache/conftool/dbconfig/20240609-143128-ladsgroup.json
  • 14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64449 and previous config saved to /var/cache/conftool/dbconfig/20240609-143105-ladsgroup.json
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64448 and previous config saved to /var/cache/conftool/dbconfig/20240609-143032-marostegui.json
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64447 and previous config saved to /var/cache/conftool/dbconfig/20240609-141557-ladsgroup.json
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64446 and previous config saved to /var/cache/conftool/dbconfig/20240609-141524-marostegui.json
  • 14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64445 and previous config saved to /var/cache/conftool/dbconfig/20240609-140049-ladsgroup.json
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64444 and previous config saved to /var/cache/conftool/dbconfig/20240609-140016-marostegui.json
  • 13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64443 and previous config saved to /var/cache/conftool/dbconfig/20240609-134541-ladsgroup.json
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64442 and previous config saved to /var/cache/conftool/dbconfig/20240609-134508-marostegui.json
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64441 and previous config saved to /var/cache/conftool/dbconfig/20240609-120817-marostegui.json
  • 12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64440 and previous config saved to /var/cache/conftool/dbconfig/20240609-120753-marostegui.json
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64439 and previous config saved to /var/cache/conftool/dbconfig/20240609-120400-marostegui.json
  • 12:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 12:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64438 and previous config saved to /var/cache/conftool/dbconfig/20240609-115245-marostegui.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64437 and previous config saved to /var/cache/conftool/dbconfig/20240609-113737-marostegui.json
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64436 and previous config saved to /var/cache/conftool/dbconfig/20240609-112229-marostegui.json
  • 11:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64435 and previous config saved to /var/cache/conftool/dbconfig/20240609-111945-ladsgroup.json
  • 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64434 and previous config saved to /var/cache/conftool/dbconfig/20240609-110437-ladsgroup.json
  • 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64433 and previous config saved to /var/cache/conftool/dbconfig/20240609-104929-ladsgroup.json
  • 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64432 and previous config saved to /var/cache/conftool/dbconfig/20240609-103421-ladsgroup.json
  • 09:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64431 and previous config saved to /var/cache/conftool/dbconfig/20240609-095854-marostegui.json
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64430 and previous config saved to /var/cache/conftool/dbconfig/20240609-094346-marostegui.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64429 and previous config saved to /var/cache/conftool/dbconfig/20240609-092837-marostegui.json
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64428 and previous config saved to /var/cache/conftool/dbconfig/20240609-091329-marostegui.json
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64427 and previous config saved to /var/cache/conftool/dbconfig/20240609-080149-marostegui.json
  • 08:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 08:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64426 and previous config saved to /var/cache/conftool/dbconfig/20240609-080125-marostegui.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64425 and previous config saved to /var/cache/conftool/dbconfig/20240609-075533-marostegui.json
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64424 and previous config saved to /var/cache/conftool/dbconfig/20240609-074617-marostegui.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64423 and previous config saved to /var/cache/conftool/dbconfig/20240609-073109-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64422 and previous config saved to /var/cache/conftool/dbconfig/20240609-071601-marostegui.json
  • 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64421 and previous config saved to /var/cache/conftool/dbconfig/20240609-064733-ladsgroup.json
  • 06:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 06:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64420 and previous config saved to /var/cache/conftool/dbconfig/20240609-064709-ladsgroup.json
  • 06:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64419 and previous config saved to /var/cache/conftool/dbconfig/20240609-063607-ladsgroup.json
  • 06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64418 and previous config saved to /var/cache/conftool/dbconfig/20240609-063543-ladsgroup.json
  • 06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64417 and previous config saved to /var/cache/conftool/dbconfig/20240609-063201-ladsgroup.json
  • 06:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64416 and previous config saved to /var/cache/conftool/dbconfig/20240609-062033-ladsgroup.json
  • 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64415 and previous config saved to /var/cache/conftool/dbconfig/20240609-061653-ladsgroup.json
  • 06:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64414 and previous config saved to /var/cache/conftool/dbconfig/20240609-060525-ladsgroup.json
  • 06:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64413 and previous config saved to /var/cache/conftool/dbconfig/20240609-060146-ladsgroup.json
  • 05:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64412 and previous config saved to /var/cache/conftool/dbconfig/20240609-055017-ladsgroup.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64411 and previous config saved to /var/cache/conftool/dbconfig/20240609-054833-marostegui.json
  • 05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64410 and previous config saved to /var/cache/conftool/dbconfig/20240609-054809-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64409 and previous config saved to /var/cache/conftool/dbconfig/20240609-053301-marostegui.json
  • 05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64408 and previous config saved to /var/cache/conftool/dbconfig/20240609-052358-ladsgroup.json
  • 05:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64407 and previous config saved to /var/cache/conftool/dbconfig/20240609-052334-ladsgroup.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64406 and previous config saved to /var/cache/conftool/dbconfig/20240609-051753-marostegui.json
  • 05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64405 and previous config saved to /var/cache/conftool/dbconfig/20240609-050826-ladsgroup.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64404 and previous config saved to /var/cache/conftool/dbconfig/20240609-050245-marostegui.json
  • 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64403 and previous config saved to /var/cache/conftool/dbconfig/20240609-045319-ladsgroup.json
  • 04:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64402 and previous config saved to /var/cache/conftool/dbconfig/20240609-043811-ladsgroup.json
  • 02:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64401 and previous config saved to /var/cache/conftool/dbconfig/20240609-025921-marostegui.json
  • 02:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 02:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64400 and previous config saved to /var/cache/conftool/dbconfig/20240609-025856-marostegui.json
  • 02:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64399 and previous config saved to /var/cache/conftool/dbconfig/20240609-024349-marostegui.json
  • 02:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64398 and previous config saved to /var/cache/conftool/dbconfig/20240609-022840-marostegui.json
  • 02:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64397 and previous config saved to /var/cache/conftool/dbconfig/20240609-021333-marostegui.json
  • 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64396 and previous config saved to /var/cache/conftool/dbconfig/20240609-020120-ladsgroup.json
  • 02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64395 and previous config saved to /var/cache/conftool/dbconfig/20240609-012432-marostegui.json
  • 01:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64394 and previous config saved to /var/cache/conftool/dbconfig/20240609-010922-marostegui.json
  • 00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64393 and previous config saved to /var/cache/conftool/dbconfig/20240609-005414-marostegui.json
  • 00:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64392 and previous config saved to /var/cache/conftool/dbconfig/20240609-003906-marostegui.json
  • 00:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64391 and previous config saved to /var/cache/conftool/dbconfig/20240609-000718-marostegui.json
  • 00:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64390 and previous config saved to /var/cache/conftool/dbconfig/20240609-000640-marostegui.json

2024-06-08

  • 23:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64389 and previous config saved to /var/cache/conftool/dbconfig/20240608-235132-marostegui.json
  • 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64388 and previous config saved to /var/cache/conftool/dbconfig/20240608-233623-marostegui.json
  • 23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64387 and previous config saved to /var/cache/conftool/dbconfig/20240608-232115-marostegui.json
  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64386 and previous config saved to /var/cache/conftool/dbconfig/20240608-222832-ladsgroup.json
  • 22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64385 and previous config saved to /var/cache/conftool/dbconfig/20240608-222808-ladsgroup.json
  • 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64384 and previous config saved to /var/cache/conftool/dbconfig/20240608-221259-ladsgroup.json
  • 21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64383 and previous config saved to /var/cache/conftool/dbconfig/20240608-215751-ladsgroup.json
  • 21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64382 and previous config saved to /var/cache/conftool/dbconfig/20240608-214243-ladsgroup.json
  • 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64381 and previous config saved to /var/cache/conftool/dbconfig/20240608-212701-marostegui.json
  • 21:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64380 and previous config saved to /var/cache/conftool/dbconfig/20240608-212637-marostegui.json
  • 21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64379 and previous config saved to /var/cache/conftool/dbconfig/20240608-211527-marostegui.json
  • 21:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 21:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64378 and previous config saved to /var/cache/conftool/dbconfig/20240608-211503-marostegui.json
  • 21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64377 and previous config saved to /var/cache/conftool/dbconfig/20240608-211128-marostegui.json
  • 20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64376 and previous config saved to /var/cache/conftool/dbconfig/20240608-205955-marostegui.json
  • 20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64375 and previous config saved to /var/cache/conftool/dbconfig/20240608-205618-marostegui.json
  • 20:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64374 and previous config saved to /var/cache/conftool/dbconfig/20240608-204447-marostegui.json
  • 20:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64373 and previous config saved to /var/cache/conftool/dbconfig/20240608-204106-marostegui.json
  • 20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64372 and previous config saved to /var/cache/conftool/dbconfig/20240608-202939-marostegui.json
  • 20:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64371 and previous config saved to /var/cache/conftool/dbconfig/20240608-202016-ladsgroup.json
  • 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64370 and previous config saved to /var/cache/conftool/dbconfig/20240608-201948-ladsgroup.json
  • 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64369 and previous config saved to /var/cache/conftool/dbconfig/20240608-200440-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64368 and previous config saved to /var/cache/conftool/dbconfig/20240608-194932-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64367 and previous config saved to /var/cache/conftool/dbconfig/20240608-193424-ladsgroup.json
  • 18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64366 and previous config saved to /var/cache/conftool/dbconfig/20240608-182811-ladsgroup.json
  • 18:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 18:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64365 and previous config saved to /var/cache/conftool/dbconfig/20240608-182747-ladsgroup.json
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64364 and previous config saved to /var/cache/conftool/dbconfig/20240608-181559-marostegui.json
  • 18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64363 and previous config saved to /var/cache/conftool/dbconfig/20240608-181536-marostegui.json
  • 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64362 and previous config saved to /var/cache/conftool/dbconfig/20240608-181238-ladsgroup.json
  • 18:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64361 and previous config saved to /var/cache/conftool/dbconfig/20240608-180027-marostegui.json
  • 17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64360 and previous config saved to /var/cache/conftool/dbconfig/20240608-175730-ladsgroup.json
  • 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64359 and previous config saved to /var/cache/conftool/dbconfig/20240608-174519-marostegui.json
  • 17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64358 and previous config saved to /var/cache/conftool/dbconfig/20240608-174222-ladsgroup.json
  • 17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64357 and previous config saved to /var/cache/conftool/dbconfig/20240608-173011-marostegui.json
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64356 and previous config saved to /var/cache/conftool/dbconfig/20240608-171628-marostegui.json
  • 17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64355 and previous config saved to /var/cache/conftool/dbconfig/20240608-152142-marostegui.json
  • 15:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64354 and previous config saved to /var/cache/conftool/dbconfig/20240608-144229-marostegui.json
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64353 and previous config saved to /var/cache/conftool/dbconfig/20240608-142721-marostegui.json
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64352 and previous config saved to /var/cache/conftool/dbconfig/20240608-141514-ladsgroup.json
  • 14:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64351 and previous config saved to /var/cache/conftool/dbconfig/20240608-141450-ladsgroup.json
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64350 and previous config saved to /var/cache/conftool/dbconfig/20240608-141212-marostegui.json
  • 13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64349 and previous config saved to /var/cache/conftool/dbconfig/20240608-135942-ladsgroup.json
  • 13:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64348 and previous config saved to /var/cache/conftool/dbconfig/20240608-135704-marostegui.json
  • 13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64347 and previous config saved to /var/cache/conftool/dbconfig/20240608-134434-ladsgroup.json
  • 13:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64346 and previous config saved to /var/cache/conftool/dbconfig/20240608-134110-ladsgroup.json
  • 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64345 and previous config saved to /var/cache/conftool/dbconfig/20240608-132926-ladsgroup.json
  • 13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64344 and previous config saved to /var/cache/conftool/dbconfig/20240608-132602-ladsgroup.json
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64343 and previous config saved to /var/cache/conftool/dbconfig/20240608-131054-ladsgroup.json
  • 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64342 and previous config saved to /var/cache/conftool/dbconfig/20240608-125546-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64341 and previous config saved to /var/cache/conftool/dbconfig/20240608-113928-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64340 and previous config saved to /var/cache/conftool/dbconfig/20240608-113905-ladsgroup.json
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64339 and previous config saved to /var/cache/conftool/dbconfig/20240608-112357-ladsgroup.json
  • 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64338 and previous config saved to /var/cache/conftool/dbconfig/20240608-110849-ladsgroup.json
  • 10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64337 and previous config saved to /var/cache/conftool/dbconfig/20240608-105341-ladsgroup.json
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64336 and previous config saved to /var/cache/conftool/dbconfig/20240608-105032-marostegui.json
  • 10:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64335 and previous config saved to /var/cache/conftool/dbconfig/20240608-105008-marostegui.json
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64334 and previous config saved to /var/cache/conftool/dbconfig/20240608-103501-marostegui.json
  • 10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64333 and previous config saved to /var/cache/conftool/dbconfig/20240608-101953-marostegui.json
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64332 and previous config saved to /var/cache/conftool/dbconfig/20240608-100443-marostegui.json
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64331 and previous config saved to /var/cache/conftool/dbconfig/20240608-064353-marostegui.json
  • 06:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64330 and previous config saved to /var/cache/conftool/dbconfig/20240608-064328-marostegui.json
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64329 and previous config saved to /var/cache/conftool/dbconfig/20240608-062820-marostegui.json
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64328 and previous config saved to /var/cache/conftool/dbconfig/20240608-061313-marostegui.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64327 and previous config saved to /var/cache/conftool/dbconfig/20240608-055804-marostegui.json
  • 05:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64326 and previous config saved to /var/cache/conftool/dbconfig/20240608-054609-ladsgroup.json
  • 05:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 05:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64325 and previous config saved to /var/cache/conftool/dbconfig/20240608-054545-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64324 and previous config saved to /var/cache/conftool/dbconfig/20240608-053037-ladsgroup.json
  • 05:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64323 and previous config saved to /var/cache/conftool/dbconfig/20240608-052817-ladsgroup.json
  • 05:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64322 and previous config saved to /var/cache/conftool/dbconfig/20240608-052753-ladsgroup.json
  • 05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64321 and previous config saved to /var/cache/conftool/dbconfig/20240608-051529-ladsgroup.json
  • 05:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64320 and previous config saved to /var/cache/conftool/dbconfig/20240608-051244-ladsgroup.json
  • 05:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64319 and previous config saved to /var/cache/conftool/dbconfig/20240608-050021-ladsgroup.json
  • 04:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64318 and previous config saved to /var/cache/conftool/dbconfig/20240608-045736-ladsgroup.json
  • 04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64317 and previous config saved to /var/cache/conftool/dbconfig/20240608-044228-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64316 and previous config saved to /var/cache/conftool/dbconfig/20240608-024534-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64315 and previous config saved to /var/cache/conftool/dbconfig/20240608-024511-ladsgroup.json
  • 02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64314 and previous config saved to /var/cache/conftool/dbconfig/20240608-024455-marostegui.json
  • 02:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64313 and previous config saved to /var/cache/conftool/dbconfig/20240608-024431-marostegui.json
  • 02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64312 and previous config saved to /var/cache/conftool/dbconfig/20240608-023735-ladsgroup.json
  • 02:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64311 and previous config saved to /var/cache/conftool/dbconfig/20240608-023711-ladsgroup.json
  • 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64310 and previous config saved to /var/cache/conftool/dbconfig/20240608-023003-ladsgroup.json
  • 02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64309 and previous config saved to /var/cache/conftool/dbconfig/20240608-022923-marostegui.json
  • 02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64308 and previous config saved to /var/cache/conftool/dbconfig/20240608-022203-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64307 and previous config saved to /var/cache/conftool/dbconfig/20240608-021455-ladsgroup.json
  • 02:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64306 and previous config saved to /var/cache/conftool/dbconfig/20240608-021415-marostegui.json
  • 02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64305 and previous config saved to /var/cache/conftool/dbconfig/20240608-020655-ladsgroup.json
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64304 and previous config saved to /var/cache/conftool/dbconfig/20240608-015947-ladsgroup.json
  • 01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64303 and previous config saved to /var/cache/conftool/dbconfig/20240608-015906-marostegui.json
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64302 and previous config saved to /var/cache/conftool/dbconfig/20240608-015147-ladsgroup.json

2024-06-07

  • 22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64301 and previous config saved to /var/cache/conftool/dbconfig/20240607-224306-marostegui.json
  • 22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64300 and previous config saved to /var/cache/conftool/dbconfig/20240607-224242-marostegui.json
  • 22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64299 and previous config saved to /var/cache/conftool/dbconfig/20240607-223300-marostegui.json
  • 22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64298 and previous config saved to /var/cache/conftool/dbconfig/20240607-222734-marostegui.json
  • 22:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64297 and previous config saved to /var/cache/conftool/dbconfig/20240607-221752-marostegui.json
  • 22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64296 and previous config saved to /var/cache/conftool/dbconfig/20240607-221224-marostegui.json
  • 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64295 and previous config saved to /var/cache/conftool/dbconfig/20240607-220244-marostegui.json
  • 21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64294 and previous config saved to /var/cache/conftool/dbconfig/20240607-215716-marostegui.json
  • 21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64293 and previous config saved to /var/cache/conftool/dbconfig/20240607-214736-marostegui.json
  • 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64292 and previous config saved to /var/cache/conftool/dbconfig/20240607-211842-ladsgroup.json
  • 21:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64291 and previous config saved to /var/cache/conftool/dbconfig/20240607-211818-ladsgroup.json
  • 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64290 and previous config saved to /var/cache/conftool/dbconfig/20240607-210310-ladsgroup.json
  • 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64289 and previous config saved to /var/cache/conftool/dbconfig/20240607-204801-ladsgroup.json
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64288 and previous config saved to /var/cache/conftool/dbconfig/20240607-203253-ladsgroup.json
  • 19:42 dduvall@deploy1002: Finished scap: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845) (duration: 16m 10s)
  • 19:33 dduvall@deploy1002: dduvall: Continuing with sync
  • 19:28 dduvall@deploy1002: dduvall: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:26 dduvall@deploy1002: Started scap: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845)
  • 19:25 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 19:25 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 19:07 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 19:06 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64287 and previous config saved to /var/cache/conftool/dbconfig/20240607-184232-marostegui.json
  • 18:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 18:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64286 and previous config saved to /var/cache/conftool/dbconfig/20240607-184208-marostegui.json
  • 18:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64285 and previous config saved to /var/cache/conftool/dbconfig/20240607-182700-marostegui.json
  • 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64284 and previous config saved to /var/cache/conftool/dbconfig/20240607-181151-marostegui.json
  • 18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64283 and previous config saved to /var/cache/conftool/dbconfig/20240607-181021-ladsgroup.json
  • 18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64282 and previous config saved to /var/cache/conftool/dbconfig/20240607-180958-ladsgroup.json
  • 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64281 and previous config saved to /var/cache/conftool/dbconfig/20240607-175643-marostegui.json
  • 17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64280 and previous config saved to /var/cache/conftool/dbconfig/20240607-175450-ladsgroup.json
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64279 and previous config saved to /var/cache/conftool/dbconfig/20240607-173942-ladsgroup.json
  • 17:31 topranks: resetting line card 1/0 on cr2-codfw to enable new 100G link to ssw1-d8-codfw T364095
  • 17:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:24 topranks: re-route traffic from cr2-eqord away from circuit to cr2-codfw to allow for line card reset T364095
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64278 and previous config saved to /var/cache/conftool/dbconfig/20240607-172432-ladsgroup.json
  • 17:23 topranks: disable IP transit to Lumen AS3356 from cr2-eqiad to allow line card reset T364095
  • 17:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64277 and previous config saved to /var/cache/conftool/dbconfig/20240607-170634-ladsgroup.json
  • 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64276 and previous config saved to /var/cache/conftool/dbconfig/20240607-170555-ladsgroup.json
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64275 and previous config saved to /var/cache/conftool/dbconfig/20240607-165616-marostegui.json
  • 16:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64274 and previous config saved to /var/cache/conftool/dbconfig/20240607-165533-marostegui.json
  • 16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64273 and previous config saved to /var/cache/conftool/dbconfig/20240607-165047-ladsgroup.json
  • 16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64272 and previous config saved to /var/cache/conftool/dbconfig/20240607-164025-marostegui.json
  • 16:38 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
  • 16:36 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
  • 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64271 and previous config saved to /var/cache/conftool/dbconfig/20240607-163539-ladsgroup.json
  • 16:32 topranks: enabling new transport circuit from cr1-drmrs to cr2-eqiad T343385
  • 16:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64270 and previous config saved to /var/cache/conftool/dbconfig/20240607-162516-marostegui.json
  • 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64269 and previous config saved to /var/cache/conftool/dbconfig/20240607-162031-ladsgroup.json
  • 16:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64268 and previous config saved to /var/cache/conftool/dbconfig/20240607-161007-marostegui.json
  • 16:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transport eqiad drmrs - cmooney@cumin1002"
  • 16:06 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transport eqiad drmrs - cmooney@cumin1002"
  • 16:05 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 16:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:59 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
  • 15:52 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
  • 15:45 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 15:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:35 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:23 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:14 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
  • 15:10 topranks: disabling netbox service on primary netbox server netbox1001 to restore db from backup
  • 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64267 and previous config saved to /var/cache/conftool/dbconfig/20240607-145937-ladsgroup.json
  • 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64266 and previous config saved to /var/cache/conftool/dbconfig/20240607-145913-ladsgroup.json
  • 14:55 topranks: enabling port et-1/0/2 for 100G mode on cr2-codfw T364095
  • 14:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 14:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 14:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64265 and previous config saved to /var/cache/conftool/dbconfig/20240607-144404-ladsgroup.json
  • 14:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:39 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:39 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:38 jhathaway@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:38 jhathaway@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:37 jhathaway@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:37 jhathaway@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64264 and previous config saved to /var/cache/conftool/dbconfig/20240607-142856-ladsgroup.json
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64263 and previous config saved to /var/cache/conftool/dbconfig/20240607-141349-ladsgroup.json
  • 14:02 Emperor: restart swift-proxy on ms-fe1009 ms-fe1011 ms-fe1012 ms-fe1014 T360913
  • 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64262 and previous config saved to /var/cache/conftool/dbconfig/20240607-132342-marostegui.json
  • 13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64261 and previous config saved to /var/cache/conftool/dbconfig/20240607-132319-marostegui.json
  • 13:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64260 and previous config saved to /var/cache/conftool/dbconfig/20240607-130811-marostegui.json
  • 13:05 moritzm: uploaded wmf-laptop 1.0.0 to component/wmf-laptop for bookworm-wikimedia
  • 13:04 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:04 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:01 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:01 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64259 and previous config saved to /var/cache/conftool/dbconfig/20240607-125303-marostegui.json
  • 12:49 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64258 and previous config saved to /var/cache/conftool/dbconfig/20240607-124641-ladsgroup.json
  • 12:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64257 and previous config saved to /var/cache/conftool/dbconfig/20240607-124616-ladsgroup.json
  • 12:44 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:41 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:40 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:38 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:38 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64256 and previous config saved to /var/cache/conftool/dbconfig/20240607-123754-marostegui.json
  • 12:33 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:31 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64255 and previous config saved to /var/cache/conftool/dbconfig/20240607-123108-ladsgroup.json
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64254 and previous config saved to /var/cache/conftool/dbconfig/20240607-122413-marostegui.json
  • 12:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64253 and previous config saved to /var/cache/conftool/dbconfig/20240607-122349-marostegui.json
  • 12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64252 and previous config saved to /var/cache/conftool/dbconfig/20240607-121559-ladsgroup.json
  • 12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64251 and previous config saved to /var/cache/conftool/dbconfig/20240607-120841-marostegui.json
  • 12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 12:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 12:07 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
  • 12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 12:07 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 12:07 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 12:01 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64250 and previous config saved to /var/cache/conftool/dbconfig/20240607-120051-ladsgroup.json
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64249 and previous config saved to /var/cache/conftool/dbconfig/20240607-115333-marostegui.json
  • 11:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 11:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1013.eqiad.wmnet
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64248 and previous config saved to /var/cache/conftool/dbconfig/20240607-113824-marostegui.json
  • 11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
  • 11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
  • 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 11:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 11:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 11:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 11:05 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 11:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 11:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 11:00 jelto@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64246 and previous config saved to /var/cache/conftool/dbconfig/20240607-110025-ladsgroup.json
  • 11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64245 and previous config saved to /var/cache/conftool/dbconfig/20240607-110000-ladsgroup.json
  • 10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:50 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64244 and previous config saved to /var/cache/conftool/dbconfig/20240607-104452-ladsgroup.json
  • 10:33 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64243 and previous config saved to /var/cache/conftool/dbconfig/20240607-102944-ladsgroup.json
  • 10:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64242 and previous config saved to /var/cache/conftool/dbconfig/20240607-101436-ladsgroup.json
  • 10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
  • 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 09:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 09:54 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:54 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 09:54 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:53 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:52 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:52 moritzm: powercycle pki1002
  • 09:43 jynus: upgrading and restarting db1239 T360751
  • 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 09:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 09:30 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:25 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:24 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 09:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64241 and previous config saved to /var/cache/conftool/dbconfig/20240607-091849-ladsgroup.json
  • 09:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 09:11 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
  • 09:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 09:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:03 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 08:51 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2099.codfw.wmnet
  • 08:51 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:50 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:49 taavi: import opentofu 1.7.2 to apt.wikimedia.org T365696
  • 08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 08:48 jynus: reboot dbprov1001,1002,2001,2002
  • 08:46 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 08:41 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2099.codfw.wmnet
  • 08:40 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2098.codfw.wmnet
  • 08:40 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 08:39 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2097.codfw.wmnet
  • 08:39 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:37 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:35 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 08:19 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet
  • 08:19 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:15 jynus: deleted from zarcillo db2097, db2098, db2099 T362802 T366877 T362883
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64239 and previous config saved to /var/cache/conftool/dbconfig/20240607-075742-marostegui.json
  • 07:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
  • 07:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
  • 07:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
  • 07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
  • 07:44 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bookworm
  • 07:19 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
  • 07:19 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
  • 07:12 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bookworm
  • 06:51 moritzm: reimaging bast1003 to bookworm
  • 06:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 06:34 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 06:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 05:15 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64238 and previous config saved to /var/cache/conftool/dbconfig/20240607-043343-ladsgroup.json
  • 04:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64237 and previous config saved to /var/cache/conftool/dbconfig/20240607-043320-ladsgroup.json
  • 04:23 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64236 and previous config saved to /var/cache/conftool/dbconfig/20240607-041812-ladsgroup.json
  • 04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64235 and previous config saved to /var/cache/conftool/dbconfig/20240607-040302-ladsgroup.json
  • 04:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:01 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64234 and previous config saved to /var/cache/conftool/dbconfig/20240607-034755-ladsgroup.json
  • 03:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64233 and previous config saved to /var/cache/conftool/dbconfig/20240607-033141-ladsgroup.json
  • 03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64232 and previous config saved to /var/cache/conftool/dbconfig/20240607-033118-ladsgroup.json
  • 03:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64231 and previous config saved to /var/cache/conftool/dbconfig/20240607-032809-marostegui.json
  • 03:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 03:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64230 and previous config saved to /var/cache/conftool/dbconfig/20240607-032746-marostegui.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64229 and previous config saved to /var/cache/conftool/dbconfig/20240607-031610-ladsgroup.json
  • 03:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64228 and previous config saved to /var/cache/conftool/dbconfig/20240607-031238-marostegui.json
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64227 and previous config saved to /var/cache/conftool/dbconfig/20240607-030102-ladsgroup.json
  • 02:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64226 and previous config saved to /var/cache/conftool/dbconfig/20240607-025729-marostegui.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64225 and previous config saved to /var/cache/conftool/dbconfig/20240607-024554-ladsgroup.json
  • 02:44 ejegg: fundraising civicrm upgraded from 757f8528 to ebfbad86
  • 02:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64224 and previous config saved to /var/cache/conftool/dbconfig/20240607-024221-marostegui.json
  • 02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64223 and previous config saved to /var/cache/conftool/dbconfig/20240607-021501-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64222 and previous config saved to /var/cache/conftool/dbconfig/20240607-021418-ladsgroup.json
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64221 and previous config saved to /var/cache/conftool/dbconfig/20240607-015910-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64220 and previous config saved to /var/cache/conftool/dbconfig/20240607-014403-ladsgroup.json
  • afk: fundraising civicrm upgraded from 286bd2b8 to 757f8528
  • 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64219 and previous config saved to /var/cache/conftool/dbconfig/20240607-012855-ladsgroup.json
  • 01:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 01:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64218 and previous config saved to /var/cache/conftool/dbconfig/20240607-011438-ladsgroup.json
  • 00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64217 and previous config saved to /var/cache/conftool/dbconfig/20240607-005930-ladsgroup.json
  • 00:55 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 00:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64216 and previous config saved to /var/cache/conftool/dbconfig/20240607-004423-ladsgroup.json
  • 00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64215 and previous config saved to /var/cache/conftool/dbconfig/20240607-002915-ladsgroup.json
  • 00:23 bd808@deploy1002: Finished scap: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 24s)
  • 00:15 bd808@deploy1002: bd808 and trainbranchbot: Continuing with sync
  • 00:14 bd808@deploy1002: bd808 and trainbranchbot: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:12 bd808@deploy1002: Started scap: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook"

2024-06-06

  • 23:32 bd808@deploy1002: Finished scap: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553) (duration: 11m 24s)
  • 23:23 bd808@deploy1002: taavi and bd808: Continuing with sync
  • 23:23 bd808@deploy1002: taavi and bd808: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:20 bd808@deploy1002: Started scap: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553)
  • 23:16 bd808@deploy1002: Finished scap: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) (duration: 12m 01s)
  • 23:07 bd808@deploy1002: bd808: Continuing with sync
  • 23:06 bd808@deploy1002: bd808: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:04 bd808@deploy1002: Started scap: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587)
  • 21:46 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 21:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 21:10 jdrewniak@deploy1002: Finished scap: Backport for Disable font size options on specified pages for all wikis (T366625) (duration: 12m 50s)
  • 21:01 jdrewniak@deploy1002: jdrewniak and toyofuku: Continuing with sync
  • 21:00 jdrewniak@deploy1002: jdrewniak and toyofuku: Backport for Disable font size options on specified pages for all wikis (T366625) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:57 jdrewniak@deploy1002: Started scap: Backport for Disable font size options on specified pages for all wikis (T366625)
  • 20:54 urbanecm@deploy1002: Finished scap: Backport for testwiki: Enable CommunityConfiguration (T360954) (duration: 12m 09s)
  • 20:50 urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=testwiki # T360954
  • 20:46 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 20:44 urbanecm@deploy1002: urbanecm: Backport for testwiki: Enable CommunityConfiguration (T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 urbanecm@deploy1002: Started scap: Backport for testwiki: Enable CommunityConfiguration (T360954)
  • 20:41 urbanecm@deploy1002: Finished scap: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954) (duration: 19m 42s)
  • 20:33 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Continuing with sync
  • 20:32 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:31 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:29 ejegg: fundraising civicrm upgraded from 71ed6bed to 286bd2b8
  • 20:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:24 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki
  • 20:22 urbanecm@deploy1002: Started scap: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954)
  • 20:21 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:20 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:20 urbanecm@deploy1002: Finished scap: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) (duration: 14m 10s)
  • 20:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:18 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:13 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:11 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Continuing with sync
  • 20:08 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki
  • 20:06 urbanecm@deploy1002: Started scap: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969)
  • 20:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 19:31 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707. (duration: 00m 26s)
  • 19:30 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707.
  • 18:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.8 refs T361402
  • 18:17 thcipriani@deploy1002: Finished deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided) (duration: 00m 43s)
  • 18:17 thcipriani@deploy1002: Started deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided)
  • 17:57 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:57 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
  • 17:56 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
  • 17:48 topranks: re-enabling pybal on lvs1017 after cable move T366361
  • 17:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64211 and previous config saved to /var/cache/conftool/dbconfig/20240606-173121-marostegui.json
  • 17:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:26 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
  • 17:26 topranks: disabling pybal on lvs1017 to move traffic to lvs1020 in advance of cable move T366361
  • 17:26 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
  • 17:23 topranks: re-enabling pybal on lvs1018 after cable move T366361
  • 17:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:15 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64210 and previous config saved to /var/cache/conftool/dbconfig/20240606-171359-ladsgroup.json
  • 17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64209 and previous config saved to /var/cache/conftool/dbconfig/20240606-171336-ladsgroup.json
  • 17:11 topranks: disabling pybal on lvs1018 to move traffic to lvs1020 in advance of cable move T366361
  • 17:11 topranks: re-enabling pybal on lvs1019 after cable move T366361
  • 16:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64208 and previous config saved to /var/cache/conftool/dbconfig/20240606-165828-ladsgroup.json
  • 16:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
  • 16:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
  • 16:50 topranks: disabling pybal on lvs1019 to move traffic to lvs1020 in advance of cable move T366361
  • 16:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64207 and previous config saved to /var/cache/conftool/dbconfig/20240606-164320-ladsgroup.json
  • 16:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64206 and previous config saved to /var/cache/conftool/dbconfig/20240606-162812-ladsgroup.json
  • 16:28 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: (no justification provided) (duration: 00m 05s)
  • 16:28 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: (no justification provided)
  • 16:25 dancy@deploy1002: Installation of scap version "4.86.1" completed for 285 hosts
  • 16:25 dancy@deploy1002: Installing scap version "4.86.1" for 285 hosts
  • 16:24 dancy@deploy1002: Installing scap version "4.86.1" for 286 hosts
  • 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64205 and previous config saved to /var/cache/conftool/dbconfig/20240606-161338-ladsgroup.json
  • 16:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64204 and previous config saved to /var/cache/conftool/dbconfig/20240606-161312-ladsgroup.json
  • 16:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
  • 16:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64203 and previous config saved to /var/cache/conftool/dbconfig/20240606-160028-ladsgroup.json
  • 16:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64202 and previous config saved to /var/cache/conftool/dbconfig/20240606-160004-ladsgroup.json
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64201 and previous config saved to /var/cache/conftool/dbconfig/20240606-155804-ladsgroup.json
  • 15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64199 and previous config saved to /var/cache/conftool/dbconfig/20240606-154457-ladsgroup.json
  • 15:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 15:42 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64198 and previous config saved to /var/cache/conftool/dbconfig/20240606-154255-ladsgroup.json
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64197 and previous config saved to /var/cache/conftool/dbconfig/20240606-154028-ladsgroup.json
  • 15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 15:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64196 and previous config saved to /var/cache/conftool/dbconfig/20240606-154004-ladsgroup.json
  • 15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 15:38 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 15:37 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64195 and previous config saved to /var/cache/conftool/dbconfig/20240606-153730-arnaudb.json
  • 15:29 topranks: rebooting ssw1-f1-eqiad to install new JunOS release T366361
  • 15:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64194 and previous config saved to /var/cache/conftool/dbconfig/20240606-152949-ladsgroup.json
  • 15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64193 and previous config saved to /var/cache/conftool/dbconfig/20240606-152747-ladsgroup.json
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64192 and previous config saved to /var/cache/conftool/dbconfig/20240606-152456-ladsgroup.json
  • 15:23 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
  • 15:23 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:23 jforrester@deploy1002: Finished scap: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) (duration: 13m 58s)
  • 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64191 and previous config saved to /var/cache/conftool/dbconfig/20240606-152222-arnaudb.json
  • 15:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:18 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
  • 15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64190 and previous config saved to /var/cache/conftool/dbconfig/20240606-151440-ladsgroup.json
  • 15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:12 jforrester@deploy1002: jforrester: Continuing with sync
  • 15:11 jforrester@deploy1002: jforrester: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64189 and previous config saved to /var/cache/conftool/dbconfig/20240606-150948-ladsgroup.json
  • 15:09 jforrester@deploy1002: Started scap: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809)
  • 15:08 jforrester@deploy1002: Finished scap: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) (duration: 12m 05s)
  • 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64188 and previous config saved to /var/cache/conftool/dbconfig/20240606-150714-arnaudb.json
  • 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 14:59 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
  • 14:59 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
  • 14:58 jforrester@deploy1002: jforrester: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:58 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:56 topranks: disable ssw1-f1-eqiad leaf-facing ports in advance of upgrade T366361
  • 14:56 jforrester@deploy1002: Started scap: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782)
  • 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64187 and previous config saved to /var/cache/conftool/dbconfig/20240606-145440-ladsgroup.json
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64186 and previous config saved to /var/cache/conftool/dbconfig/20240606-145205-arnaudb.json
  • 14:51 elukey: kill sessionstore pod running on mw1390.eqiad.wmnet (no dedicated='kask' taint)
  • 14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64185 and previous config saved to /var/cache/conftool/dbconfig/20240606-144943-arnaudb.json
  • 14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 14:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 14:43 sukhe: sudo cumin -b1 -s60 'A:cp and A:eqsin' 'run-puppet-agent --enable "merging CR 1038881"'
  • 14:25 TheresNoTime: close UTC afternoon backport window
  • 14:18 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 10s)
  • 14:18 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
  • 14:17 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 09s)
  • 14:17 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
  • 14:17 samtar@deploy1002: Finished scap: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966) (duration: 12m 58s)
  • 14:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 14:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 14:14 topranks: disabling BGP on cr2-eqiad towards ssw1-f1-eqiad prior to upgrade of ssw later T366361
  • 14:14 ChrisDobbins901_: sudo cumin 'A:cp and A:eqsin' 'disable-puppet "merging CR 1038881"'
  • 14:08 samtar@deploy1002: samtar and anzx and nmw03: Continuing with sync
  • 14:07 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
  • 14:06 samtar@deploy1002: samtar and anzx and nmw03: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:06 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
  • 14:05 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
  • 14:04 samtar@deploy1002: Started scap: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966)
  • 14:02 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
  • 14:00 kartik@deploy1002: Finished scap: Backport for CX: Fix translation container max width for large screens (T366374) (duration: 13m 11s)
  • 13:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
  • 13:56 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4050.ulsfo.wmnet
  • 13:52 kartik@deploy1002: kartik: Continuing with sync
  • 13:50 kartik@deploy1002: kartik: Backport for CX: Fix translation container max width for large screens (T366374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 13:47 kartik@deploy1002: Started scap: Backport for CX: Fix translation container max width for large screens (T366374)
  • 13:46 samtar@deploy1002: Finished scap: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549) (duration: 16m 05s)
  • 13:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
  • 13:44 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
  • 13:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
  • 13:37 samtar@deploy1002: samtar and gergesshamon: Continuing with sync
  • 13:32 samtar@deploy1002: samtar and gergesshamon: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 samtar@deploy1002: Started scap: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549)
  • 13:28 samtar@deploy1002: Finished scap: Backport for Activate campaignEvents extension on Igbo wiki. (T363199) (duration: 14m 07s)
  • 13:19 samtar@deploy1002: mhorsey and samtar: Continuing with sync
  • 13:16 samtar@deploy1002: mhorsey and samtar: Backport for Activate campaignEvents extension on Igbo wiki. (T363199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 samtar@deploy1002: Started scap: Backport for Activate campaignEvents extension on Igbo wiki. (T363199)
  • 13:11 taavi: taavi@deploy1002 ~ $ sudo kill 32174 # kill forgotten scap sync-world process
  • 13:08 klausman@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 12:57 vgutierrez: repool text@codfw with IPIP encapsulation enabled - T366466
  • 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 12:56 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2011 - T366466
  • 12:44 topranks: disabling PyBal on lvs1019 to allow for cable move T366361
  • 12:40 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
  • 12:39 topranks: rebooting ssw1-e1-eqiad to upgrade JunOS
  • 12:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4051.ulsfo.wmnet
  • 12:33 topranks: disabling BGP to ssw1-e1-eqiad from cr1-eqiad in advance of upgrade T366361
  • 12:33 vgutierrez: depool text@codfw before enabling IPIP encapsulation - T366466
  • 12:29 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet
  • 12:28 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4051.ulsfo.wmnet
  • 12:25 topranks: disabling PyBal on lvs1018 to allow for cable move T366361
  • 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
  • 12:25 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
  • 12:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1017.eqiad.wmnet
  • 12:24 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1017.eqiad.wmnet
  • 12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
  • 12:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
  • 11:56 topranks: disabling PyBal on lvs1017 to allow for cable move T366361
  • 11:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
  • 11:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 11:27 effie: kicking off k8s eqiad restarts - T366555
  • 11:25 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 11:09 klausman@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 11:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 10:47 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 10:45 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 10:45 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 10:43 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 10:41 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 10:40 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 10:40 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 10:38 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 10:37 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:35 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:27 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:26 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:11 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64181 and previous config saved to /var/cache/conftool/dbconfig/20240606-100747-arnaudb.json
  • 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64180 and previous config saved to /var/cache/conftool/dbconfig/20240606-095240-arnaudb.json
  • 09:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64179 and previous config saved to /var/cache/conftool/dbconfig/20240606-095053-marostegui.json
  • 09:47 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2004.codfw.wmnet
  • 09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64178 and previous config saved to /var/cache/conftool/dbconfig/20240606-093734-arnaudb.json
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64177 and previous config saved to /var/cache/conftool/dbconfig/20240606-093545-marostegui.json
  • 09:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2003.codfw.wmnet
  • 09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64176 and previous config saved to /var/cache/conftool/dbconfig/20240606-092228-arnaudb.json
  • 09:22 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:20 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64175 and previous config saved to /var/cache/conftool/dbconfig/20240606-092037-marostegui.json
  • 09:20 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 09:17 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 09:17 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:15 stevemunene@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:13 stevemunene@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:12 stevemunene@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:11 stevemunene@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:08 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1004.eqiad.wmnet
  • 09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64174 and previous config saved to /var/cache/conftool/dbconfig/20240606-090722-arnaudb.json
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64173 and previous config saved to /var/cache/conftool/dbconfig/20240606-090529-marostegui.json
  • 09:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet
  • 09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 08:57 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:56 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 08:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64172 and previous config saved to /var/cache/conftool/dbconfig/20240606-085216-arnaudb.json
  • 08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 08:50 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1031.eqiad.wmnet
  • 08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1003.eqiad.wmnet
  • 08:44 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
  • 08:44 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 08:40 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet
  • 08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64171 and previous config saved to /var/cache/conftool/dbconfig/20240606-083710-arnaudb.json
  • 08:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 08:35 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:35 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:19 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64167 and previous config saved to /var/cache/conftool/dbconfig/20240606-081753-marostegui.json
  • 08:14 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:14 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64166 and previous config saved to /var/cache/conftool/dbconfig/20240606-081412-ladsgroup.json
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64165 and previous config saved to /var/cache/conftool/dbconfig/20240606-080245-marostegui.json
  • 08:02 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 08:01 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1001.eqiad.wmnet
  • 08:00 urbanecm@deploy1002: Started scap: Backport for Add throttle exception for an upcoming workshop (T366748)
  • 07:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64164 and previous config saved to /var/cache/conftool/dbconfig/20240606-075904-ladsgroup.json
  • 07:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64163 and previous config saved to /var/cache/conftool/dbconfig/20240606-074737-marostegui.json
  • 07:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64162 and previous config saved to /var/cache/conftool/dbconfig/20240606-074356-ladsgroup.json
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64161 and previous config saved to /var/cache/conftool/dbconfig/20240606-073229-marostegui.json
  • 07:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 07:06 hashar: Restarting Gerrit
  • 07:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64160 and previous config saved to /var/cache/conftool/dbconfig/20240606-070558-ladsgroup.json
  • 07:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:56 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1034.eqiad.wmnet
  • 06:49 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1034.eqiad.wmnet
  • 05:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 05:21 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 05:20 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 05:04 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 05:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64159 and previous config saved to /var/cache/conftool/dbconfig/20240606-041714-marostegui.json
  • 04:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 04:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 04:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64158 and previous config saved to /var/cache/conftool/dbconfig/20240606-041650-marostegui.json
  • 04:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64157 and previous config saved to /var/cache/conftool/dbconfig/20240606-040142-marostegui.json
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64156 and previous config saved to /var/cache/conftool/dbconfig/20240606-034732-ladsgroup.json
  • 03:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64155 and previous config saved to /var/cache/conftool/dbconfig/20240606-034709-ladsgroup.json
  • 03:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64154 and previous config saved to /var/cache/conftool/dbconfig/20240606-034635-marostegui.json
  • 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64153 and previous config saved to /var/cache/conftool/dbconfig/20240606-033201-ladsgroup.json
  • 03:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64152 and previous config saved to /var/cache/conftool/dbconfig/20240606-033125-marostegui.json
  • 03:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64151 and previous config saved to /var/cache/conftool/dbconfig/20240606-032907-ladsgroup.json
  • 03:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64150 and previous config saved to /var/cache/conftool/dbconfig/20240606-032844-ladsgroup.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64149 and previous config saved to /var/cache/conftool/dbconfig/20240606-031653-ladsgroup.json
  • 03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64148 and previous config saved to /var/cache/conftool/dbconfig/20240606-031336-ladsgroup.json
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64147 and previous config saved to /var/cache/conftool/dbconfig/20240606-030145-ladsgroup.json
  • 02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64146 and previous config saved to /var/cache/conftool/dbconfig/20240606-025828-ladsgroup.json
  • 02:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64145 and previous config saved to /var/cache/conftool/dbconfig/20240606-024321-ladsgroup.json
  • 01:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64144 and previous config saved to /var/cache/conftool/dbconfig/20240606-012208-marostegui.json
  • 01:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 01:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64143 and previous config saved to /var/cache/conftool/dbconfig/20240606-012144-marostegui.json
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64142 and previous config saved to /var/cache/conftool/dbconfig/20240606-010636-marostegui.json
  • 00:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64141 and previous config saved to /var/cache/conftool/dbconfig/20240606-005128-marostegui.json
  • 00:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64140 and previous config saved to /var/cache/conftool/dbconfig/20240606-003620-marostegui.json
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64139 and previous config saved to /var/cache/conftool/dbconfig/20240606-003232-marostegui.json
  • 00:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 00:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64138 and previous config saved to /var/cache/conftool/dbconfig/20240606-003208-marostegui.json
  • 00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64137 and previous config saved to /var/cache/conftool/dbconfig/20240606-001700-marostegui.json
  • 00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64136 and previous config saved to /var/cache/conftool/dbconfig/20240606-000151-marostegui.json

2024-06-05

  • 23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64135 and previous config saved to /var/cache/conftool/dbconfig/20240605-234643-marostegui.json
  • 23:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64134 and previous config saved to /var/cache/conftool/dbconfig/20240605-232926-ladsgroup.json
  • 23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 22:54 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:50 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 22:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:03 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
  • 21:43 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
  • 21:42 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 21:42 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 21:36 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 21:18 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 21:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in2001.wikimedia.org
  • 21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in2001.wikimedia.org with OS bookworm
  • 20:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
  • 20:42 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
  • 20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64133 and previous config saved to /var/cache/conftool/dbconfig/20240605-202949-marostegui.json
  • 20:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 20:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 20:26 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in2001.wikimedia.org with OS bookworm
  • 20:26 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:25 urbanecm@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892) (duration: 22m 04s)
  • 20:25 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in2001.wikimedia.org on all recursors
  • 20:25 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in2001.wikimedia.org on all recursors
  • 20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:24 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:21 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
  • 20:21 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in2001.wikimedia.org
  • 20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in1001.wikimedia.org
  • 20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in1001.wikimedia.org with OS bookworm
  • 20:16 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Continuing with sync
  • 20:12 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:06 ejegg: payments-wiki upgraded from c255fda8 to 82a5e588
  • 20:06 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892) synced to the testservers (https://wikitech.wikimedia.org/wiki/M
  • 20:03 urbanecm@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892)
  • 20:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
  • 19:57 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
  • 19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in1001.wikimedia.org with OS bookworm
  • 19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:44 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in1001.wikimedia.org on all recursors
  • 19:43 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in1001.wikimedia.org on all recursors
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:38 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:36 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
  • 19:36 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in1001.wikimedia.org
  • 19:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 19:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 18:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 18:53 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.8 refs T361402
  • 18:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64132 and previous config saved to /var/cache/conftool/dbconfig/20240605-184250-ladsgroup.json
  • 18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64131 and previous config saved to /var/cache/conftool/dbconfig/20240605-182742-ladsgroup.json
  • 18:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64130 and previous config saved to /var/cache/conftool/dbconfig/20240605-181234-ladsgroup.json
  • 18:12 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 18:11 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 18:07 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 18:06 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64129 and previous config saved to /var/cache/conftool/dbconfig/20240605-175725-ladsgroup.json
  • 17:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64128 and previous config saved to /var/cache/conftool/dbconfig/20240605-175503-ladsgroup.json
  • 17:50 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
  • 17:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 17:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 17:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64127 and previous config saved to /var/cache/conftool/dbconfig/20240605-174724-marostegui.json
  • 17:42 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to pagelinks old columns in enwiki (T352010) (duration: 12m 19s)
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64126 and previous config saved to /var/cache/conftool/dbconfig/20240605-173954-ladsgroup.json
  • 17:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 17:32 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to pagelinks old columns in enwiki (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json
  • 17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 17:29 ladsgroup@deploy1002: Started scap: Backport for Stop writing to pagelinks old columns in enwiki (T352010)
  • 17:27 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 17:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json
  • 17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json
  • 17:13 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 17:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:10 jhathaway: phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox
  • 17:10 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet
  • 17:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json
  • 17:06 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
  • 17:06 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet
  • 17:06 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
  • 17:05 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
  • 17:05 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
  • 17:04 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json
  • 16:56 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 16:56 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
  • 16:56 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
  • 16:54 mutante: downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785
  • 16:53 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
  • 16:53 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
  • 16:52 ladsgroup@deploy1002: Finished scap: Backport for Bump XML dump schema to version 0.11 (T365155) (duration: 18m 23s)
  • 16:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64120 and previous config saved to /var/cache/conftool/dbconfig/20240605-164635-ladsgroup.json
  • 16:46 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 16:45 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:43 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Continuing with sync
  • 16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 16:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:36 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Backport for Bump XML dump schema to version 0.11 (T365155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:34 ladsgroup@deploy1002: Started scap: Backport for Bump XML dump schema to version 0.11 (T365155)
  • 16:32 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
  • 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64119 and previous config saved to /var/cache/conftool/dbconfig/20240605-163129-ladsgroup.json
  • 16:20 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:18 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1032.eqiad.wmnet
  • 16:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64118 and previous config saved to /var/cache/conftool/dbconfig/20240605-161622-ladsgroup.json
  • 16:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:14 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:12 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1032.eqiad.wmnet
  • 16:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:10 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:08 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:01 aokoth@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:01 aokoth@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:01 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64117 and previous config saved to /var/cache/conftool/dbconfig/20240605-160116-ladsgroup.json
  • 15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64116 and previous config saved to /var/cache/conftool/dbconfig/20240605-155955-ladsgroup.json
  • 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:59 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
  • 15:58 aokoth@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:58 aokoth@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:57 aokoth@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:56 aokoth@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
  • 15:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1081.eqiad.wmnet
  • 15:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64115 and previous config saved to /var/cache/conftool/dbconfig/20240605-155023-ladsgroup.json
  • 15:46 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
  • 15:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
  • 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
  • 15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 15:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
  • 15:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 15:37 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
  • 15:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
  • 15:34 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 15:32 moritzm: rebalancing drmrs Ganeti clusters
  • 15:30 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:29 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
  • 15:28 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
  • 15:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
  • 15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2078.codfw.wmnet
  • 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
  • 15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ping1004.eqiad.wmnet
  • 15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ping1004.eqiad.wmnet with OS bookworm
  • 15:24 sukhe@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
  • 15:21 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
  • 15:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
  • 15:19 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2078.codfw.wmnet
  • 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 15:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 15:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
  • 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
  • 15:10 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:10 kamila@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:09 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:09 jnuche@deploy1002: Installation of scap version "4.86.0" completed for 285 hosts
  • 15:08 jnuche@deploy1002: Installing scap version "4.86.0" for 285 hosts
  • 15:07 jnuche@deploy1002: Installing scap version "4.86.0" for 286 hosts
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64114 and previous config saved to /var/cache/conftool/dbconfig/20240605-150605-marostegui.json
  • 15:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64113 and previous config saved to /var/cache/conftool/dbconfig/20240605-150542-marostegui.json
  • 15:05 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:04 vgutierrez: repool text@eqsin with IPIP encapsulation enabled - T366466
  • 15:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 15:01 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:59 cwhite@deploy1002: Finished scap: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657) (duration: 12m 32s)
  • 14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64112 and previous config saved to /var/cache/conftool/dbconfig/20240605-145757-ladsgroup.json
  • 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64111 and previous config saved to /var/cache/conftool/dbconfig/20240605-145735-ladsgroup.json
  • 14:55 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 14:55 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 14:55 vgutierrez: rolling restart of pybal on lvs5006 and lvs5004 - T366466
  • 14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64110 and previous config saved to /var/cache/conftool/dbconfig/20240605-145034-marostegui.json
  • 14:50 cwhite@deploy1002: matmarex and cwhite: Continuing with sync
  • 14:49 cwhite@deploy1002: matmarex and cwhite: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
  • 14:46 cwhite@deploy1002: Started scap: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657)
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
  • 14:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64109 and previous config saved to /var/cache/conftool/dbconfig/20240605-144227-ladsgroup.json
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64108 and previous config saved to /var/cache/conftool/dbconfig/20240605-143526-marostegui.json
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 14:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 14:28 vgutierrez: depool text@eqsin before enabling IPIP encapsulation - T366466
  • 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64107 and previous config saved to /var/cache/conftool/dbconfig/20240605-142718-ladsgroup.json
  • 14:23 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
  • 14:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
  • 14:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64106 and previous config saved to /var/cache/conftool/dbconfig/20240605-142018-marostegui.json
  • 14:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
  • 14:15 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
  • 14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
  • 14:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
  • 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64105 and previous config saved to /var/cache/conftool/dbconfig/20240605-141210-ladsgroup.json
  • 14:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
  • 14:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
  • 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 14:02 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:02 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:00 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 14:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1004.eqiad.wmnet on all recursors
  • 14:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1004.eqiad.wmnet on all recursors
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 13:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 13:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
  • 13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 13:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
  • 13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
  • 13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
  • 13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 13:51 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 13:48 inflatador: bking@an-db1001 install python3-psycopg2 pkg T363001
  • 13:48 daniel@deploy1002: Finished scap: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013) (duration: 17m 50s)
  • 13:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:48 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1004.eqiad.wmnet
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
  • 13:46 elukey: factory reset for sretest1001 to test the new provision cookbook - T365372
  • 13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
  • 13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 13:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
  • 13:45 inflatador: bking@an-db1001 install acl pkg T363001
  • 13:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
  • 13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
  • 13:40 daniel@deploy1002: daniel: Continuing with sync
  • 13:39 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 13:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 13:37 filippo@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1005.eqiad.wmnet
  • 13:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
  • 13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
  • 13:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
  • 13:35 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
  • 13:34 daniel@deploy1002: daniel: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:30 daniel@deploy1002: Started scap: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013)
  • 13:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
  • 13:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
  • 13:27 elukey: systemctl reset-failed [email protected] redis-instance-tcp_6380.service on netbox[12]002 + apt-get purge of redis-server and prometheus-redis-exporter packages to clean up stale configs (no local redis is used)
  • 13:27 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
  • 13:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 13:26 dreamyjazz@deploy1002: Finished scap: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493) (duration: 11m 39s)
  • 13:25 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 13:21 fabfur: enable magru DC after applying IPIP encapsulation patches (T366466)
  • 13:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
  • 13:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
  • 13:17 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 13:17 dreamyjazz@deploy1002: dreamyjazz: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64104 and previous config saved to /var/cache/conftool/dbconfig/20240605-131647-marostegui.json
  • 13:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64103 and previous config saved to /var/cache/conftool/dbconfig/20240605-131623-marostegui.json
  • 13:14 dreamyjazz@deploy1002: Started scap: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493)
  • 13:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
  • 13:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 13:13 dreamyjazz@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686) (duration: 19m 13s)
  • 13:10 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker
  • 13:06 fabfur: restarting pybal on lvs7001/lvs7003 to apply IPIP conf (T366466)
  • 13:04 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 13:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 13:02 dreamyjazz@deploy1002: dreamyjazz: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64102 and previous config saved to /var/cache/conftool/dbconfig/20240605-130115-marostegui.json
  • 12:56 elukey@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker
  • 12:55 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 12:53 dreamyjazz@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686)
  • 12:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2004.codfw.wmnet
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping2004.codfw.wmnet with OS bookworm
  • 12:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 12:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 12:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 12:49 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1246 T363119', diff saved to https://phabricator.wikimedia.org/P64101 and previous config saved to /var/cache/conftool/dbconfig/20240605-124918-arnaudb.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64100 and previous config saved to /var/cache/conftool/dbconfig/20240605-124607-marostegui.json
  • 12:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 12:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 12:43 moritzm: failover ganeti masters in drmrs
  • 12:40 cgoubert@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-codfw
  • 12:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 12:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 12:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
  • 12:35 fabfur: disabling puppet on A:cp-text to test IPIP encapsulation on magru (T366466)
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
  • 12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 12:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64099 and previous config saved to /var/cache/conftool/dbconfig/20240605-123059-marostegui.json
  • 12:29 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 12:26 fabfur: disabling magru DC to apply IPIP encapsulation patches (T366466)
  • 12:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 12:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
  • 12:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
  • 12:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 12:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 12:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 12:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 12:15 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping2004.codfw.wmnet with OS bookworm
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 12:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2004.codfw.wmnet on all recursors
  • 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2004.codfw.wmnet on all recursors
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 12:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 12:08 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 12:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2004.codfw.wmnet
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 12:04 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 12:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 12:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 11:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 11:52 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 11:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 11:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 11:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 11:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 11:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 11:39 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1008.eqiad.wmnet|wikikube-worker1009.eqiad.wmnet|wikikube-worker1010.eqiad.wmnet|wikikube-worker1011.eqiad.wmnet|wikikube-worker1012.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 11:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 11:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 11:37 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 11:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 11:31 hnowlan: running homer to configure bgp on 5 new k8s workers
  • 11:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 11:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 11:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 11:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 11:17 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 11:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
  • 11:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
  • 11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
  • 11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
  • 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 11:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 11:03 claime: restarted send_tile_invalidations.service on maps1009
  • 11:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64098 and previous config saved to /var/cache/conftool/dbconfig/20240605-110303-ladsgroup.json
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 10:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64097 and previous config saved to /var/cache/conftool/dbconfig/20240605-105400-root.json
  • 10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 10:53 hnowlan@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 10:52 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 10:52 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 10:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64096 and previous config saved to /var/cache/conftool/dbconfig/20240605-104757-ladsgroup.json
  • 10:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 10:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 10:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 10:39 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64094 and previous config saved to /var/cache/conftool/dbconfig/20240605-103854-root.json
  • 10:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
  • 10:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1012.eqiad.wmnet with OS bullseye
  • 10:35 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 10:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1010.eqiad.wmnet with OS bullseye
  • 10:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64093 and previous config saved to /var/cache/conftool/dbconfig/20240605-103251-ladsgroup.json
  • 10:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 10:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1008.eqiad.wmnet with OS bullseye
  • 10:30 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64091 and previous config saved to /var/cache/conftool/dbconfig/20240605-102348-root.json
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64090 and previous config saved to /var/cache/conftool/dbconfig/20240605-102252-ladsgroup.json
  • 10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 10:22 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 10:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 10:21 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 10:18 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
  • 10:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64088 and previous config saved to /var/cache/conftool/dbconfig/20240605-101744-ladsgroup.json
  • 10:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bookworm
  • 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64087 and previous config saved to /var/cache/conftool/dbconfig/20240605-101521-ladsgroup.json
  • 10:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 10:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
  • 10:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 10:13 dcaro@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1031.eqiad.wmnet
  • 10:13 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
  • 10:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1152 back to x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64086 and previous config saved to /var/cache/conftool/dbconfig/20240605-101019-root.json
  • 10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
  • 10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64085 and previous config saved to /var/cache/conftool/dbconfig/20240605-100842-root.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64084 and previous config saved to /var/cache/conftool/dbconfig/20240605-100810-root.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64083 and previous config saved to /var/cache/conftool/dbconfig/20240605-100117-root.json
  • 10:00 fabfur: disabling puppet on cp4037 to test Benthos performance (T358109)
  • 10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1012.eqiad.wmnet with OS bullseye
  • 10:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 10:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 09:59 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc
  • 09:58 claime: pooling and uncordoning wikikube-worker1001 - T351074
  • 09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1456 to wikikube-worker1012
  • 09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1012
  • 09:56 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1010.eqiad.wmnet with OS bullseye
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1008.eqiad.wmnet with OS bullseye
  • 09:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
  • 09:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1012
  • 09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 09:54 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 09:53 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64082 and previous config saved to /var/cache/conftool/dbconfig/20240605-095336-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64081 and previous config saved to /var/cache/conftool/dbconfig/20240605-095303-root.json
  • 09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1428 to wikikube-worker1011
  • 09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1011
  • 09:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
  • 09:51 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
  • 09:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
  • 09:50 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1011
  • 09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
  • 09:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
  • 09:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64080 and previous config saved to /var/cache/conftool/dbconfig/20240605-094611-root.json
  • 09:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1428 to wikikube-worker1011
  • 09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1456 to wikikube-worker1012
  • 09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1410 to wikikube-worker1010
  • 09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
  • 09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1010
  • 09:44 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:44 claime: homer 'cr*eqiad*' commit 'T351074'
  • 09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
  • 09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1001.eqiad.wmnet with OS bullseye
  • 09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1401 to wikikube-worker1009
  • 09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1009
  • 09:42 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1010
  • 09:42 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1009
  • 09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
  • 09:41 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1400 to wikikube-worker1008
  • 09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1008
  • 09:39 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
  • 09:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 09:38 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1008
  • 09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64079 and previous config saved to /var/cache/conftool/dbconfig/20240605-093830-root.json
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64078 and previous config saved to /var/cache/conftool/dbconfig/20240605-093757-root.json
  • 09:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bookworm
  • 09:35 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1151 to temp x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64077 and previous config saved to /var/cache/conftool/dbconfig/20240605-093507-root.json
  • 09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
  • 09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
  • 09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 09:33 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010
  • 09:33 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
  • 09:31 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1410 to wikikube-worker1010.eqiad.wmnet
  • 09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010.eqiad.wmnet
  • 09:31 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1030.eqiad.wmnet
  • 09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1401 to wikikube-worker1009
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64076 and previous config saved to /var/cache/conftool/dbconfig/20240605-093105-root.json
  • 09:30 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:30 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008
  • 09:29 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1400 to wikikube-worker1008.eqiad.wmnet
  • 09:29 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008.eqiad.wmnet
  • 09:26 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1030.eqiad.wmnet
  • 09:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 09:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 09:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 09:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bookworm
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64075 and previous config saved to /var/cache/conftool/dbconfig/20240605-092324-root.json
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64074 and previous config saved to /var/cache/conftool/dbconfig/20240605-092251-root.json
  • 09:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
  • 09:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 09:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 09:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64073 and previous config saved to /var/cache/conftool/dbconfig/20240605-091559-root.json
  • 09:15 brouberol@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 09:11 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 09:11 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64072 and previous config saved to /var/cache/conftool/dbconfig/20240605-090745-root.json
  • 09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
  • 09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 09:06 brouberol@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 09:02 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1001.eqiad.wmnet with OS bullseye
  • 09:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 09:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1001.eqiad.wmnet on all recursors
  • 09:01 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1001.eqiad.wmnet on all recursors
  • 09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4052.ulsfo.wmnet
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64071 and previous config saved to /var/cache/conftool/dbconfig/20240605-090053-root.json
  • 09:00 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4044.ulsfo.wmnet
  • 08:58 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 08:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 08:54 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 08:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 08:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 08:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64070 and previous config saved to /var/cache/conftool/dbconfig/20240605-085239-root.json
  • 08:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
  • 08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4052.ulsfo.wmnet
  • 08:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
  • 08:50 fabfur@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp4044.ulsfo.wmnet
  • 08:50 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 08:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64069 and previous config saved to /var/cache/conftool/dbconfig/20240605-084547-root.json
  • 08:45 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1029.eqiad.wmnet
  • 08:45 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
  • 08:44 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
  • 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bookworm
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1227', diff saved to https://phabricator.wikimedia.org/P64068 and previous config saved to /var/cache/conftool/dbconfig/20240605-084211-root.json
  • 08:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
  • 08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 08:37 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64067 and previous config saved to /var/cache/conftool/dbconfig/20240605-083733-root.json
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1358 to wikikube-worker1001
  • 08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1001
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 08:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1186.eqiad.wmnet with OS bookworm
  • 08:18 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64063 and previous config saved to /var/cache/conftool/dbconfig/20240605-081755-marostegui.json
  • 08:14 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
  • 08:08 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1027.eqiad.wmnet
  • 08:07 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64062 and previous config saved to /var/cache/conftool/dbconfig/20240605-080247-marostegui.json
  • 08:01 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1026.eqiad.wmnet
  • 08:00 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1025.eqiad.wmnet
  • 08:00 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
  • 07:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
  • 07:54 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1025.eqiad.wmnet
  • 07:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64061 and previous config saved to /var/cache/conftool/dbconfig/20240605-074739-marostegui.json
  • 07:45 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
  • 07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
  • 07:38 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 07:38 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
  • 07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
  • 07:37 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
  • 07:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
  • 07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1186', diff saved to https://phabricator.wikimedia.org/P64060 and previous config saved to /var/cache/conftool/dbconfig/20240605-073024-root.json
  • 07:28 marostegui: dbmaint codfw s2 deploy schema change on db2207 T364299
  • 07:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 07:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T366038', diff saved to https://phabricator.wikimedia.org/P64059 and previous config saved to /var/cache/conftool/dbconfig/20240605-072509-root.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T366038', diff saved to https://phabricator.wikimedia.org/P64058 and previous config saved to /var/cache/conftool/dbconfig/20240605-072427-marostegui.json
  • 07:24 marostegui: Starting s2 codfw failover from db2207 to db2204 - T366038
  • 07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T366038', diff saved to https://phabricator.wikimedia.org/P64057 and previous config saved to /var/cache/conftool/dbconfig/20240605-070758-root.json
  • 07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64056 and previous config saved to /var/cache/conftool/dbconfig/20240605-044418-marostegui.json
  • 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64055 and previous config saved to /var/cache/conftool/dbconfig/20240605-044355-marostegui.json
  • 04:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64054 and previous config saved to /var/cache/conftool/dbconfig/20240605-042847-marostegui.json
  • 04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64053 and previous config saved to /var/cache/conftool/dbconfig/20240605-041339-marostegui.json
  • 04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64052 and previous config saved to /var/cache/conftool/dbconfig/20240605-041306-marostegui.json
  • 04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64051 and previous config saved to /var/cache/conftool/dbconfig/20240605-041227-marostegui.json
  • 03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64050 and previous config saved to /var/cache/conftool/dbconfig/20240605-035855-ladsgroup.json
  • 03:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64049 and previous config saved to /var/cache/conftool/dbconfig/20240605-035832-ladsgroup.json
  • 03:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64048 and previous config saved to /var/cache/conftool/dbconfig/20240605-035831-marostegui.json
  • 03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64047 and previous config saved to /var/cache/conftool/dbconfig/20240605-035719-marostegui.json
  • 03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64046 and previous config saved to /var/cache/conftool/dbconfig/20240605-034326-ladsgroup.json
  • 03:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64045 and previous config saved to /var/cache/conftool/dbconfig/20240605-034212-marostegui.json
  • 03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64044 and previous config saved to /var/cache/conftool/dbconfig/20240605-032817-ladsgroup.json
  • 03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64043 and previous config saved to /var/cache/conftool/dbconfig/20240605-032704-marostegui.json
  • 03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64042 and previous config saved to /var/cache/conftool/dbconfig/20240605-031310-ladsgroup.json
  • 02:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64041 and previous config saved to /var/cache/conftool/dbconfig/20240605-023423-ladsgroup.json
  • 02:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance

2024-06-04

  • 23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64040 and previous config saved to /var/cache/conftool/dbconfig/20240604-234228-marostegui.json
  • 23:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 23:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 23:15 tzatziki: removing one file for legal compliance
  • 23:09 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
  • 23:09 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
  • 22:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:47 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:47 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:47 tzatziki: removing one file for legal compliance
  • 22:46 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:46 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:36 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:36 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:36 mutante: CI - (integration.wikimedia.org) short downtime for maintenance
  • 22:35 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:35 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:29 tzatziki: removing two files for legal compliance
  • 22:16 tzatziki: removing three files for legal compliance
  • 22:08 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:00 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 21:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:41 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:33 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:33 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:33 urbanecm@deploy1002: Finished scap: Backport for Disable font size options on specified pages for most wikis (T366334) (duration: 15m 10s)
  • 21:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:24 urbanecm@deploy1002: toyofuku and urbanecm: Continuing with sync
  • 21:21 urbanecm@deploy1002: toyofuku and urbanecm: Backport for Disable font size options on specified pages for most wikis (T366334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:18 urbanecm@deploy1002: Started scap: Backport for Disable font size options on specified pages for most wikis (T366334)
  • 21:10 tgr@deploy1002: Finished scap: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki() (duration: 16m 33s)
  • 21:07 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 21:06 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 21:01 tgr@deploy1002: tgr: Continuing with sync
  • 20:58 tgr@deploy1002: tgr: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki() synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:56 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 20:53 tgr@deploy1002: Started scap: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki()
  • 20:52 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 20:47 tgr@deploy1002: Finished scap: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) (duration: 13m 12s)
  • 20:39 tgr@deploy1002: tgr and pmiazga: Continuing with sync
  • 20:37 tgr@deploy1002: tgr and pmiazga: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 tgr@deploy1002: Started scap: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281)
  • 20:28 ladsgroup@deploy1002: Finished scap: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434) (duration: 13m 24s)
  • 20:27 jhathaway: vacuuming pcc db
  • 20:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64039 and previous config saved to /var/cache/conftool/dbconfig/20240604-202554-marostegui.json
  • 20:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:22 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:21 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:19 ladsgroup@deploy1002: pppery and ladsgroup: Continuing with sync
  • 20:17 ladsgroup@deploy1002: pppery and ladsgroup: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:15 ladsgroup@deploy1002: Started scap: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434)
  • 20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64038 and previous config saved to /var/cache/conftool/dbconfig/20240604-201047-marostegui.json
  • 20:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:59 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64037 and previous config saved to /var/cache/conftool/dbconfig/20240604-195539-marostegui.json
  • 19:49 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:49 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:47 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 19:44 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64036 and previous config saved to /var/cache/conftool/dbconfig/20240604-194031-marostegui.json
  • 19:38 mutante: https://gerrit-replica.wikimedia.org - short downtime for maintenance
  • 19:38 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
  • 19:38 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
  • 19:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:37 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
  • 19:37 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:37 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
  • 19:36 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:33 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
  • 19:32 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
  • 19:16 mutante: releases.wikimedia.org - short downtime for maintenance
  • 19:14 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
  • 19:13 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
  • 19:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 19:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64035 and previous config saved to /var/cache/conftool/dbconfig/20240604-190931-marostegui.json
  • 19:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64034 and previous config saved to /var/cache/conftool/dbconfig/20240604-190906-marostegui.json
  • 19:06 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 19:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: 0.3.142 (duration: 12m 53s)
  • 18:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64033 and previous config saved to /var/cache/conftool/dbconfig/20240604-185358-marostegui.json
  • 18:48 ryankemper: [WDQS Deploy] Forgot to run the command to set git hash to tip of origin/master so deploy was a partial no-op. Re-rolling...
  • 18:47 ryankemper@deploy1002: Started deploy [wdqs/wdqs@43b966f]: 0.3.142
  • 18:46 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@143ca33]: 0.3.142 (duration: 02m 02s)
  • 18:45 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.142` on canary `wdqs1016`; proceeding to rest of fleet
  • 18:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@143ca33]: 0.3.142
  • 18:41 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.142`. Pre-deploy tests passing on canary `wdqs1016`
  • 18:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64032 and previous config saved to /var/cache/conftool/dbconfig/20240604-183850-marostegui.json
  • 18:35 mutante: aphlict - (phab realtime notifications) - reboots
  • 18:30 mutante: doc.wikimedia.org - very short downtime for maintenance
  • 18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
  • 18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
  • 18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
  • 18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
  • 18:26 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.8 refs T361402
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64031 and previous config saved to /var/cache/conftool/dbconfig/20240604-182342-marostegui.json
  • 18:15 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7014*} and A:cp
  • 17:54 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7014*} and A:cp
  • 17:53 sukhe: sudo cumin 'A:cp-upload and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
  • 17:51 sukhe: sudo cumin 'A:cp-text and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
  • 17:49 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7002*} and A:cp
  • 17:39 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7002*} and A:cp
  • 17:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:22 sukhe: sudo cumin 'A:cp and A:magru' 'run-puppet-agent'
  • 17:15 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 17:14 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 17:11 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 16:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp700[12].magru.wmnet,service=(cdn|ats-be)
  • 16:52 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:51 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:41 elukey: delete other 2 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
  • 16:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
  • 16:31 elukey: delete 3 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
  • 16:29 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1010.eqiad.wmnet
  • 16:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64028 and previous config saved to /var/cache/conftool/dbconfig/20240604-162241-root.json
  • 16:22 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7002.magru.wmnet
  • 16:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7001.magru.wmnet
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64025 and previous config saved to /var/cache/conftool/dbconfig/20240604-161233-marostegui.json
  • 16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64024 and previous config saved to /var/cache/conftool/dbconfig/20240604-161210-marostegui.json
  • 16:11 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7002.magru.wmnet
  • 16:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7001.magru.wmnet
  • 16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 16:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1013.eqiad.wmnet
  • 16:09 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
  • 16:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 16:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64023 and previous config saved to /var/cache/conftool/dbconfig/20240604-160735-root.json
  • 16:05 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 16:05 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 16:04 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 16:04 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab2002.codfw.wmnet
  • 16:04 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 16:02 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1005.eqiad.wmnet
  • 16:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 16:00 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:59 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:58 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab2002.codfw.wmnet
  • 15:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1004.eqiad.wmnet
  • 15:57 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1003.eqiad.wmnet
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64022 and previous config saved to /var/cache/conftool/dbconfig/20240604-155701-marostegui.json
  • 15:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64021 and previous config saved to /var/cache/conftool/dbconfig/20240604-155629-ladsgroup.json
  • 15:55 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1013.eqiad.wmnet
  • 15:53 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1003.eqiad.wmnet
  • 15:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 15:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 15:52 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1004.eqiad.wmnet
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64020 and previous config saved to /var/cache/conftool/dbconfig/20240604-155228-root.json
  • 15:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1003.eqiad.wmnet
  • 15:52 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 15:51 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1002.eqiad.wmnet
  • 15:51 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 15:48 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2003.codfw.wmnet
  • 15:47 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1002.eqiad.wmnet
  • 15:47 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1003.eqiad.wmnet
  • 15:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 15:47 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 15:44 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host miscweb2003.codfw.wmnet
  • 15:43 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:43 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:42 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:42 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2001.codfw.wmnet
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64019 and previous config saved to /var/cache/conftool/dbconfig/20240604-154153-marostegui.json
  • 15:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_magru
  • 15:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2001.codfw.wmnet
  • 15:37 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet
  • 15:37 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(0|3|5).codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64018 and previous config saved to /var/cache/conftool/dbconfig/20240604-153722-root.json
  • 15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2030,2033,2035].codfw.wmnet
  • 15:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1002.eqiad.wmnet
  • 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes[2030,2033,2035].codfw.wmnet
  • 15:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:31 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1002.eqiad.wmnet
  • 15:31 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet
  • 15:29 tchin@deploy1002: Finished deploy [airflow-dags/analytics_test@a279784]: (no justification provided) (duration: 00m 10s)
  • 15:29 tchin@deploy1002: Started deploy [airflow-dags/analytics_test@a279784]: (no justification provided)
  • 15:29 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
  • 15:27 tchin@deploy1002: Finished deploy [airflow-dags/analytics@a279784]: (no justification provided) (duration: 00m 27s)
  • 15:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps (duration: 00m 27s)
  • 15:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:27 tchin@deploy1002: Started deploy [airflow-dags/analytics@a279784]: (no justification provided)
  • 15:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:27 dcausse@deploy1002: Started deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps
  • 15:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64017 and previous config saved to /var/cache/conftool/dbconfig/20240604-152644-marostegui.json
  • 15:25 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64015 and previous config saved to /var/cache/conftool/dbconfig/20240604-152216-root.json
  • 15:22 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1001.eqiad.wmnet
  • 15:21 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
  • 15:21 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
  • 15:19 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:18 elukey@cumin1002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:18 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:18 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:16 ejegg: fundraising civicrm upgraded from 44900b8c to 71ed6bed
  • 15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 15:15 ejegg: payments-wiki upgraded from 0174d89c to c255fda8
  • 15:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:12 dancy@deploy1002: Installation of scap version "4.85.0" completed for 294 hosts
  • 15:11 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 15:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
  • 15:11 dancy@deploy1002: Installing scap version "4.85.0" for 294 hosts
  • 15:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:09 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64014 and previous config saved to /var/cache/conftool/dbconfig/20240604-150835-ladsgroup.json
  • 15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:08 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64013 and previous config saved to /var/cache/conftool/dbconfig/20240604-150710-root.json
  • 15:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3066*} and A:cp
  • 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605 (duration: 00m 32s)
  • 15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605
  • 15:03 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
  • 15:03 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
  • 15:03 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605 (duration: 00m 33s)
  • 15:02 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605
  • 15:02 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
  • 15:02 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
  • 14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
  • 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:55 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
  • 14:55 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3066*} and A:cp
  • 14:53 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:52 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64012 and previous config saved to /var/cache/conftool/dbconfig/20240604-145203-root.json
  • 14:49 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
  • 14:48 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp4045*} and A:cp
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
  • 14:48 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:46 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(1|4).codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 14:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 14:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 14:38 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp4045*} and A:cp
  • 14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7003.magru.wmnet
  • 14:27 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7003.magru.wmnet
  • 14:22 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-codfw
  • 14:14 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
  • 14:14 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:10 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:06 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:02 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7002.magru.wmnet
  • 14:00 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
  • 13:59 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7002.magru.wmnet
  • 13:59 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
  • 13:46 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:42 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 13:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 13:35 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64009 and previous config saved to /var/cache/conftool/dbconfig/20240604-133250-ladsgroup.json
  • 13:29 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:29 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 13:24 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:23 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
  • 13:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:20 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:17 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
  • 13:17 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
  • 13:14 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7001.magru.wmnet
  • 13:12 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
  • 13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
  • 13:11 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7001.magru.wmnet
  • 13:10 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 13:08 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 13:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 13:05 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 13:05 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 13:03 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 13:02 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
  • 13:00 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
  • 12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
  • 12:57 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
  • 12:56 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 12:53 brouberol@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 12:53 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 12:48 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 12:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 12:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64008 and previous config saved to /var/cache/conftool/dbconfig/20240604-124432-ladsgroup.json
  • 12:43 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
  • 12:32 brouberol@cumin2002: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
  • 12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64007 and previous config saved to /var/cache/conftool/dbconfig/20240604-122924-ladsgroup.json
  • 12:29 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 12:28 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database dtpwiki (T365229)
  • 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64006 and previous config saved to /var/cache/conftool/dbconfig/20240604-122602-root.json
  • 12:22 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 12:17 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-eqiad
  • 12:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 12:15 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 12:14 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64005 and previous config saved to /var/cache/conftool/dbconfig/20240604-121415-ladsgroup.json
  • 12:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 12:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 12:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64004 and previous config saved to /var/cache/conftool/dbconfig/20240604-121056-root.json
  • 12:08 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-codfw
  • 12:02 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database dtpwiki (T365229)
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64003 and previous config saved to /var/cache/conftool/dbconfig/20240604-115907-ladsgroup.json
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64002 and previous config saved to /var/cache/conftool/dbconfig/20240604-115549-root.json
  • 11:54 hnowlan: depooling 3 api appservers and 2 appservers in advance of reimaging
  • 11:50 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-eqiad
  • 11:44 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-codfw
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64001 and previous config saved to /var/cache/conftool/dbconfig/20240604-114157-marostegui.json
  • 11:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64000 and previous config saved to /var/cache/conftool/dbconfig/20240604-114043-root.json
  • 11:39 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 11:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63999 and previous config saved to /var/cache/conftool/dbconfig/20240604-112537-root.json
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 11:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63998 and previous config saved to /var/cache/conftool/dbconfig/20240604-111031-root.json
  • 11:06 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
  • 11:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 10:59 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 10:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
  • 10:57 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
  • 10:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63996 and previous config saved to /var/cache/conftool/dbconfig/20240604-105525-root.json
  • 10:54 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 10:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
  • 10:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
  • 10:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 10:50 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 10:49 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
  • 10:48 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 10:45 marostegui: dbmaint codfw s1 deploy schema change on db2203 T364299
  • 10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
  • 10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
  • 10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
  • 10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T366552', diff saved to https://phabricator.wikimedia.org/P63995 and previous config saved to /var/cache/conftool/dbconfig/20240604-104337-root.json
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T366552', diff saved to https://phabricator.wikimedia.org/P63994 and previous config saved to /var/cache/conftool/dbconfig/20240604-104241-root.json
  • 10:42 marostegui: Starting s1 codfw failover from db2203 to db2212 - T366552
  • 10:42 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper
  • 10:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS bookworm
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 10:28 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided) (duration: 01m 12s)
  • 10:27 hashar: Upgrading releases Jenkins instances # T366008
  • 10:27 hashar@deploy1002: Started deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided)
  • 10:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper
  • 10:23 claime: Migrating votewiki to mw-on-k8s - T362323
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 10:20 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 10:16 hashar: Upgrading CI Jenkins # T366008
  • 10:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
  • 10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 10:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 10:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2002-dev.codfw.wmnet
  • 10:09 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 10:08 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T364299
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper_monitor
  • 10:07 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 10:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 10:04 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb2002-dev.codfw.wmnet
  • 10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T366552', diff saved to https://phabricator.wikimedia.org/P63993 and previous config saved to /var/cache/conftool/dbconfig/20240604-100024-root.json
  • 10:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
  • 09:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
  • 09:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS bookworm
  • 09:58 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 09:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
  • 09:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
  • 09:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 09:53 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 09:48 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
  • 09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 09:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
  • 09:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3003.wikimedia.org
  • 09:42 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 09:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
  • 09:40 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet
  • 09:40 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 09:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper_monitor
  • 09:38 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 09:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 09:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
  • 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3003.wikimedia.org
  • 09:36 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
  • 09:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
  • 09:34 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2008-dev.codfw.wmnet
  • 09:34 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet
  • 09:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 09:33 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
  • 09:30 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
  • 09:29 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
  • 09:29 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
  • 09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2007-dev.codfw.wmnet
  • 09:27 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
  • 09:27 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 09:27 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
  • 09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2001.codfw.wmnet
  • 09:22 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
  • 09:22 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 09:21 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 09:21 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:21 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
  • 09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2001.codfw.wmnet
  • 09:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1001.eqiad.wmnet
  • 09:15 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:15 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 09:14 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
  • 09:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1001.eqiad.wmnet
  • 09:08 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 09:08 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
  • 09:01 moritzm: imported python3-xapian-haystack 2.1.1-1+deb12u1 to bookworm-wikimedia (already lined up for the next Bookworm point release to address https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1066136 ) and needed for the update of the Mailman servers T331706
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7001.wikimedia.org
  • 08:54 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P63992 and previous config saved to /var/cache/conftool/dbconfig/20240604-085205-marostegui.json
  • 08:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7001.wikimedia.org
  • 08:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63991 and previous config saved to /var/cache/conftool/dbconfig/20240604-085141-marostegui.json
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1003.wikimedia.org
  • 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1003.wikimedia.org
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P63990 and previous config saved to /var/cache/conftool/dbconfig/20240604-084428-root.json
  • 08:40 kostajh: UTC morning deploys done
  • 08:38 kharlan@deploy1002: Finished scap: Backport for IPReputationHooks: Bump schema version (T354597) (duration: 15m 45s)
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P63989 and previous config saved to /var/cache/conftool/dbconfig/20240604-083633-marostegui.json
  • 08:19 kharlan@deploy1002: Finished scap: Backport for IPReputationHooks: Bump schema version (T354597) (duration: 14m 08s)
  • 08:10 kharlan@deploy1002: kharlan: Continuing with sync
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63986 and previous config saved to /var/cache/conftool/dbconfig/20240604-080846-marostegui.json
  • 08:08 kharlan@deploy1002: kharlan: Backport for IPReputationHooks: Bump schema version (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63985 and previous config saved to /var/cache/conftool/dbconfig/20240604-080617-marostegui.json
  • 08:06 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
  • 08:05 kharlan@deploy1002: Started scap: Backport for IPReputationHooks: Bump schema version (T354597)
  • 08:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 08:01 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
  • 07:57 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 07:56 hashar: Restarting Gerrit for Java 17 upgrade # T364342
  • 07:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17- T364342 (duration: 00m 03s)
  • 07:56 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17- T364342
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63984 and previous config saved to /var/cache/conftool/dbconfig/20240604-075338-marostegui.json
  • 07:47 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17- T364342 (duration: 00m 05s)
  • 07:46 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17- T364342
  • 07:42 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2002.codfw.wmnet with OS bookworm
  • 07:42 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bookworm
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63983 and previous config saved to /var/cache/conftool/dbconfig/20240604-073830-marostegui.json
  • 07:27 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T356166
  • 07:15 moritzm: installing intel-microcode updates on bullseye
  • 07:10 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T355609
  • 07:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1184.eqiad.wmnet with OS bookworm
  • 06:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
  • 06:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
  • 06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1184.eqiad.wmnet with OS bookworm
  • 06:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
  • 06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
  • 06:14 marostegui: Rename table flaggedpage_pending on db1185 (s5 eqiad dbmaint) - T365568
  • 06:09 arnaudb@cumin1002: dbctl commit (dc=all): ' fix api db1163 vs db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63982 and previous config saved to /var/cache/conftool/dbconfig/20240604-060925-arnaudb.json
  • 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'API db1163 T366259', diff saved to https://phabricator.wikimedia.org/P63981 and previous config saved to /var/cache/conftool/dbconfig/20240604-060747-arnaudb.json
  • 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63980 and previous config saved to /var/cache/conftool/dbconfig/20240604-060703-arnaudb.json
  • 06:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T366259', diff saved to https://phabricator.wikimedia.org/P63979 and previous config saved to /var/cache/conftool/dbconfig/20240604-060324-arnaudb.json
  • 06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T366259', diff saved to https://phabricator.wikimedia.org/P63978 and previous config saved to /var/cache/conftool/dbconfig/20240604-060208-arnaudb.json
  • 06:01 arnaudb: Starting s1 eqiad failover from db1184 to db1163 - T366259
  • 05:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T366259', diff saved to https://phabricator.wikimedia.org/P63977 and previous config saved to /var/cache/conftool/dbconfig/20240604-052803-arnaudb.json
  • 05:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
  • 05:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
  • 04:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P63976 and previous config saved to /var/cache/conftool/dbconfig/20240604-042011-ladsgroup.json
  • 04:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.5 (duration: 00m 57s)
  • 03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63975 and previous config saved to /var/cache/conftool/dbconfig/20240604-035703-marostegui.json
  • 03:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.8 refs T361402 (duration: 53m 47s)
  • 03:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63974 and previous config saved to /var/cache/conftool/dbconfig/20240604-035640-marostegui.json
  • 03:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63973 and previous config saved to /var/cache/conftool/dbconfig/20240604-034132-marostegui.json
  • 03:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63972 and previous config saved to /var/cache/conftool/dbconfig/20240604-032625-marostegui.json
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63971 and previous config saved to /var/cache/conftool/dbconfig/20240604-031117-marostegui.json
  • 03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63970 and previous config saved to /var/cache/conftool/dbconfig/20240604-030906-marostegui.json
  • 03:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 03:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.8 refs T361402
  • 00:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 00:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 00:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63969 and previous config saved to /var/cache/conftool/dbconfig/20240604-002119-ladsgroup.json
  • 00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63968 and previous config saved to /var/cache/conftool/dbconfig/20240604-000612-ladsgroup.json

2024-06-03

  • 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63967 and previous config saved to /var/cache/conftool/dbconfig/20240603-235104-ladsgroup.json
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63966 and previous config saved to /var/cache/conftool/dbconfig/20240603-233555-ladsgroup.json
  • 23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Extension:DynamicPageList (Wikimedia)" "Extension:DynamicPageList" "Zabe" --reason "per request T366488"
  • 23:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 23:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 23:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63965 and previous config saved to /var/cache/conftool/dbconfig/20240603-231424-marostegui.json
  • 22:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63963 and previous config saved to /var/cache/conftool/dbconfig/20240603-225916-marostegui.json
  • 22:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63962 and previous config saved to /var/cache/conftool/dbconfig/20240603-224408-marostegui.json
  • 22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63961 and previous config saved to /var/cache/conftool/dbconfig/20240603-222900-marostegui.json
  • 22:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63960 and previous config saved to /var/cache/conftool/dbconfig/20240603-222607-marostegui.json
  • 22:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63959 and previous config saved to /var/cache/conftool/dbconfig/20240603-222524-marostegui.json
  • 22:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63958 and previous config saved to /var/cache/conftool/dbconfig/20240603-221016-marostegui.json
  • 21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63957 and previous config saved to /var/cache/conftool/dbconfig/20240603-215508-marostegui.json
  • 21:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63956 and previous config saved to /var/cache/conftool/dbconfig/20240603-214000-marostegui.json
  • 21:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63955 and previous config saved to /var/cache/conftool/dbconfig/20240603-212040-ladsgroup.json
  • 21:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63954 and previous config saved to /var/cache/conftool/dbconfig/20240603-211312-root.json
  • 21:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63953 and previous config saved to /var/cache/conftool/dbconfig/20240603-210532-ladsgroup.json
  • 20:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63952 and previous config saved to /var/cache/conftool/dbconfig/20240603-205806-root.json
  • 20:51 urbanecm@deploy1002: Finished scap: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370) (duration: 14m 57s)
  • 20:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63951 and previous config saved to /var/cache/conftool/dbconfig/20240603-205024-ladsgroup.json
  • 20:43 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
  • 20:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63950 and previous config saved to /var/cache/conftool/dbconfig/20240603-204300-root.json
  • 20:39 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:36 urbanecm@deploy1002: Started scap: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370)
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811) (duration: 30m 14s)
  • 20:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63949 and previous config saved to /var/cache/conftool/dbconfig/20240603-203514-ladsgroup.json
  • 20:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63948 and previous config saved to /var/cache/conftool/dbconfig/20240603-202754-root.json
  • 20:27 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Continuing with sync
  • 20:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63947 and previous config saved to /var/cache/conftool/dbconfig/20240603-201248-root.json
  • 20:10 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 urbanecm@deploy1002: Started scap: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811)
  • 19:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63946 and previous config saved to /var/cache/conftool/dbconfig/20240603-195742-root.json
  • 19:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63945 and previous config saved to /var/cache/conftool/dbconfig/20240603-194236-root.json
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63944 and previous config saved to /var/cache/conftool/dbconfig/20240603-183029-marostegui.json
  • 18:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63943 and previous config saved to /var/cache/conftool/dbconfig/20240603-183006-marostegui.json
  • 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63942 and previous config saved to /var/cache/conftool/dbconfig/20240603-181459-marostegui.json
  • 17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63941 and previous config saved to /var/cache/conftool/dbconfig/20240603-175951-marostegui.json
  • 17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63940 and previous config saved to /var/cache/conftool/dbconfig/20240603-174442-marostegui.json
  • 17:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1002.eqiad.wmnet|wikikube-worker1003.eqiad.wmnet|wikikube-worker1007.eqiad.wmnet|wikikube-worker1004.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 17:27 claime: Pooling and uncordoning wikikube-worker1002.eqiad.wmnet,wikikube-worker1003.eqiad.wmnet,wikikube-worker1007.eqiad.wmnet,wikikube-worker1004.eqiad.wmnet - T351074
  • 17:19 claime: homer 'cr*eqiad*' commit 'T351074'
  • 17:18 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T351074'
  • 17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T35107
  • 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:15 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
  • 16:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
  • 16:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 16:18 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1003.eqiad.wmnet with OS bullseye
  • 15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1004.eqiad.wmnet with OS bullseye
  • 15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1002.eqiad.wmnet with OS bullseye
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P63939 and previous config saved to /var/cache/conftool/dbconfig/20240603-155048-root.json
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
  • 15:43 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342 (duration: 00m 05s)
  • 15:43 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342
  • 15:42 jhathaway: deploying more restrictive SPF & DMARC settings for wikipedia.org
  • 15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
  • 15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
  • 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
  • 15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-c2-codfw.mgmt.codfw.wmnet
  • 15:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
  • 15:30 dancy@deploy1002: sync-world aborted: testing (duration: 00m 00s)
  • 15:30 dancy@deploy1002: Started scap: testing
  • 15:27 dancy@mwmaint1002: scap failed: FileNotFoundError [Errno 2] No such file or directory: '/etc/helmfile-defaults/mediawiki-deployments.yaml' (duration: 00m 00s)
  • 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1004.eqiad.wmnet with OS bullseye
  • 15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1003.eqiad.wmnet with OS bullseye
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1002.eqiad.wmnet with OS bullseye
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
  • 15:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
  • 15:03 dancy@deploy1002: Installing scap version "4.84.0" for 297 hosts
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1490 to wikikube-worker1007
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1007
  • 15:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-c2-codfw.mgmt.codfw.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1007
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
  • 14:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
  • 14:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
  • 14:57 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1490 to wikikube-worker1007
  • 14:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 08s)
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1443 to wikikube-worker1004
  • 14:54 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1004
  • 14:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1004
  • 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
  • 14:53 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
  • 14:53 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342
  • 14:52 Dreamy_Jazz: Afternoon UTC backport window done
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
  • 14:51 dreamyjazz@deploy1002: Finished scap: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) (duration: 12m 04s)
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1443 to wikikube-worker1004
  • 14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1427 to wikikube-worker1003
  • 14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1003
  • 14:43 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 14:42 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1003
  • 14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
  • 14:41 dreamyjazz@deploy1002: dreamyjazz: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
  • 14:39 dreamyjazz@deploy1002: Started scap: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473)
  • 14:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1427 to wikikube-worker1003
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1426 to wikikube-worker1002
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1002
  • 14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1002
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
  • 14:34 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:33 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:33 vgutierrez: repool text@ulsfo with IPIP encapsulation enabled - T366466
  • 14:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1012.eqiad.wmnet with OS bullseye
  • 14:31 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:31 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:30 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:30 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2001.codfw.wmnet with OS bookworm
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1426 to wikikube-worker1002
  • 14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1010.eqiad.wmnet with OS bullseye
  • 14:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bookworm
  • 14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1358 to wikikube-worker1001
  • 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1358 to wikikube-worker1001
  • 14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:12 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
  • 14:09 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
  • 14:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 14:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
  • 14:05 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 14:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
  • 14:01 tgr@deploy1002: Finished scap: Backport for [trwiki] Create translator group (T356440) (duration: 23m 15s)
  • 13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
  • 13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
  • 13:58 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008 - T366466
  • 13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63937 and previous config saved to /var/cache/conftool/dbconfig/20240603-135634-ladsgroup.json
  • 13:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63936 and previous config saved to /var/cache/conftool/dbconfig/20240603-135612-ladsgroup.json
  • 13:54 vgutierrez: re-enable puppet on "A:cp-text_ulsfo" - T366466
  • 13:50 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2001.codfw.wmnet with OS bookworm
  • 13:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bookworm
  • 13:49 vgutierrez: re-enable puppet on "A:cp-text and not A:cp-text_ulsfo" - T366466
  • 13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1012.eqiad.wmnet with OS bullseye
  • 13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1010.eqiad.wmnet with OS bullseye
  • 13:44 tgr@deploy1002: gergesshamon and tgr: Continuing with sync
  • 13:41 vgutierrez: disable puppet on A:cp-text before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1038294/ - T366466
  • 13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63935 and previous config saved to /var/cache/conftool/dbconfig/20240603-134104-ladsgroup.json
  • 13:40 tgr@deploy1002: gergesshamon and tgr: Backport for [trwiki] Create translator group (T356440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 tgr@deploy1002: Started scap: Backport for [trwiki] Create translator group (T356440)
  • 13:36 vgutierrez: depool text@ulsfo before enabling IPIP encapsulation - T366466
  • 13:32 tgr@deploy1002: Finished scap: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404) (duration: 19m 07s)
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63934 and previous config saved to /var/cache/conftool/dbconfig/20240603-132556-ladsgroup.json
  • 13:23 tgr@deploy1002: sgimeno and gergesshamon and tgr: Continuing with sync
  • 13:20 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS bookworm
  • 13:16 tgr@deploy1002: sgimeno and gergesshamon and tgr: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 tgr@deploy1002: Started scap: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404)
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63933 and previous config saved to /var/cache/conftool/dbconfig/20240603-131048-ladsgroup.json
  • 13:08 moritzm: uploaded intel-microcode 3.20240312.1~deb11u1 to apt.wikimedia.org (import from bullseye-proposed-updates, to be coupled with forthcoming reboots)
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 13:03 Emperor: depool moss-fe2001 with a view to returning it to apus T279621
  • 13:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 13:02 Emperor: depool moss-fe1001 with a view to returning it to apus T279621
  • 13:00 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 12:55 Emperor: depool/restart swift-proxy/repool ms-fe10{09,11,12,14} due to rising connection failures T360913
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63932 and previous config saved to /var/cache/conftool/dbconfig/20240603-124628-marostegui.json
  • 12:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63931 and previous config saved to /var/cache/conftool/dbconfig/20240603-124605-marostegui.json
  • 12:45 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bookworm
  • 12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1002.eqiad.wmnet with OS bookworm
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63930 and previous config saved to /var/cache/conftool/dbconfig/20240603-123057-marostegui.json
  • 12:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 12:20 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63929 and previous config saved to /var/cache/conftool/dbconfig/20240603-121549-marostegui.json
  • 12:06 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1002.eqiad.wmnet with OS bookworm
  • 12:03 ladsgroup@deploy1002: Finished scap: Backport for Enable numeric sorting for Persian (T329440) (duration: 12m 07s)
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63928 and previous config saved to /var/cache/conftool/dbconfig/20240603-120041-marostegui.json
  • 11:54 ladsgroup@deploy1002: ebrahim and ladsgroup: Continuing with sync
  • 11:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:53 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:53 ladsgroup@deploy1002: ebrahim and ladsgroup: Backport for Enable numeric sorting for Persian (T329440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:51 ladsgroup@deploy1002: Started scap: Backport for Enable numeric sorting for Persian (T329440)
  • 11:35 effie: restart memcached on mc1050 and mc2050
  • 11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63927 and previous config saved to /var/cache/conftool/dbconfig/20240603-113447-ladsgroup.json
  • 11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:27 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:26 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bookworm
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host snapshot1013.eqiad.wmnet
  • 11:07 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 11:04 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63926 and previous config saved to /var/cache/conftool/dbconfig/20240603-105416-marostegui.json
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host snapshot1013.eqiad.wmnet
  • 10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63925 and previous config saved to /var/cache/conftool/dbconfig/20240603-105352-marostegui.json
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bookworm
  • 10:41 moritzm: installing linux 5.10.218 security updates
  • 10:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1038.eqiad.wmnet with OS bookworm
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63924 and previous config saved to /var/cache/conftool/dbconfig/20240603-103844-marostegui.json
  • 10:29 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1013.eqiad.wmnet with OS bullseye
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63923 and previous config saved to /var/cache/conftool/dbconfig/20240603-102335-marostegui.json
  • 10:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 10:18 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63922 and previous config saved to /var/cache/conftool/dbconfig/20240603-100827-marostegui.json
  • 10:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bookworm
  • 10:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
  • 09:58 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old pagelinks columns in s8 (T352010) (duration: 18m 39s)
  • 09:57 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 09:56 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
  • 09:49 jiji@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host mc-gp2001.codfw.wmnet with OS bookworm
  • 09:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 09:43 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1013.eqiad.wmnet with OS bullseye
  • 09:42 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old pagelinks columns in s8 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bookworm
  • 09:40 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old pagelinks columns in s8 (T352010)
  • 09:31 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 09:29 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 09:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 09:22 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 09:10 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2001.codfw.wmnet with OS bookworm
  • 09:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bookworm
  • 09:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:08 jiji@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc1039.eqiad.wmnet']
  • 08:49 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2002.codfw.wmnet with OS bookworm
  • 08:45 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1003.eqiad.wmnet with OS bookworm
  • 08:15 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887 (duration: 00m 05s)
  • 08:15 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887
  • 08:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bookworm
  • 08:09 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2002.codfw.wmnet with OS bookworm
  • 08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887 (duration: 00m 05s)
  • 08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887
  • 08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887 (duration: 00m 08s)
  • 08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887
  • 08:04 jiji@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc1039.eqiad.wmnet']
  • 07:32 kartik@deploy1002: Finished scap: Backport for testwiki: Fix language for nan in Section Translation (duration: 28m 37s)
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63920 and previous config saved to /var/cache/conftool/dbconfig/20240603-072513-marostegui.json
  • 07:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63919 and previous config saved to /var/cache/conftool/dbconfig/20240603-072450-marostegui.json
  • 07:22 kartik@deploy1002: kartik: Continuing with sync
  • 07:18 kartik@deploy1002: kartik: Backport for testwiki: Fix language for nan in Section Translation synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63918 and previous config saved to /var/cache/conftool/dbconfig/20240603-070942-marostegui.json
  • 07:04 kartik@deploy1002: Started scap: Backport for testwiki: Fix language for nan in Section Translation
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63917 and previous config saved to /var/cache/conftool/dbconfig/20240603-065434-marostegui.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63916 and previous config saved to /var/cache/conftool/dbconfig/20240603-063925-marostegui.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63915 and previous config saved to /var/cache/conftool/dbconfig/20240603-063814-marostegui.json
  • 06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63914 and previous config saved to /var/cache/conftool/dbconfig/20240603-063735-marostegui.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63913 and previous config saved to /var/cache/conftool/dbconfig/20240603-062227-marostegui.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63912 and previous config saved to /var/cache/conftool/dbconfig/20240603-061956-root.json
  • 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63911 and previous config saved to /var/cache/conftool/dbconfig/20240603-060719-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63910 and previous config saved to /var/cache/conftool/dbconfig/20240603-060450-root.json
  • 05:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63909 and previous config saved to /var/cache/conftool/dbconfig/20240603-055210-marostegui.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63908 and previous config saved to /var/cache/conftool/dbconfig/20240603-054944-root.json
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63907 and previous config saved to /var/cache/conftool/dbconfig/20240603-053438-root.json
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63906 and previous config saved to /var/cache/conftool/dbconfig/20240603-051932-root.json
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63905 and previous config saved to /var/cache/conftool/dbconfig/20240603-050424-root.json
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63904 and previous config saved to /var/cache/conftool/dbconfig/20240603-044918-root.json
  • 04:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63903 and previous config saved to /var/cache/conftool/dbconfig/20240603-011839-marostegui.json
  • 01:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63902 and previous config saved to /var/cache/conftool/dbconfig/20240603-011813-marostegui.json
  • 01:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63901 and previous config saved to /var/cache/conftool/dbconfig/20240603-010925-ladsgroup.json
  • 01:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63900 and previous config saved to /var/cache/conftool/dbconfig/20240603-010305-marostegui.json
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63899 and previous config saved to /var/cache/conftool/dbconfig/20240603-005415-ladsgroup.json
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63898 and previous config saved to /var/cache/conftool/dbconfig/20240603-004757-marostegui.json
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63897 and previous config saved to /var/cache/conftool/dbconfig/20240603-003907-ladsgroup.json
  • 00:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63896 and previous config saved to /var/cache/conftool/dbconfig/20240603-003247-marostegui.json
  • 00:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63895 and previous config saved to /var/cache/conftool/dbconfig/20240603-002359-ladsgroup.json
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-02

  • 23:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63894 and previous config saved to /var/cache/conftool/dbconfig/20240602-232847-marostegui.json
  • 23:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 23:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 23:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
  • 20:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:47 taavi@cumin1002: dbctl commit (dc=all): 'depool db1213', diff saved to https://phabricator.wikimedia.org/P63893 and previous config saved to /var/cache/conftool/dbconfig/20240602-204719-taavi.json
  • 20:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63892 and previous config saved to /var/cache/conftool/dbconfig/20240602-200046-marostegui.json
  • 20:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63891 and previous config saved to /var/cache/conftool/dbconfig/20240602-200021-marostegui.json
  • 20:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63890 and previous config saved to /var/cache/conftool/dbconfig/20240602-194514-marostegui.json
  • 19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63889 and previous config saved to /var/cache/conftool/dbconfig/20240602-193006-marostegui.json
  • 19:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63888 and previous config saved to /var/cache/conftool/dbconfig/20240602-191458-marostegui.json
  • 19:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63887 and previous config saved to /var/cache/conftool/dbconfig/20240602-185215-ladsgroup.json
  • 18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63886 and previous config saved to /var/cache/conftool/dbconfig/20240602-185151-ladsgroup.json
  • 18:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63885 and previous config saved to /var/cache/conftool/dbconfig/20240602-183643-ladsgroup.json
  • 18:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63884 and previous config saved to /var/cache/conftool/dbconfig/20240602-182135-ladsgroup.json
  • 18:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63883 and previous config saved to /var/cache/conftool/dbconfig/20240602-180627-ladsgroup.json
  • 18:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63882 and previous config saved to /var/cache/conftool/dbconfig/20240602-144924-marostegui.json
  • 14:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63881 and previous config saved to /var/cache/conftool/dbconfig/20240602-144900-marostegui.json
  • 14:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63880 and previous config saved to /var/cache/conftool/dbconfig/20240602-143352-marostegui.json
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63879 and previous config saved to /var/cache/conftool/dbconfig/20240602-141843-marostegui.json
  • 14:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63878 and previous config saved to /var/cache/conftool/dbconfig/20240602-141139-root.json
  • 14:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63877 and previous config saved to /var/cache/conftool/dbconfig/20240602-140334-marostegui.json
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63876 and previous config saved to /var/cache/conftool/dbconfig/20240602-135632-root.json
  • 13:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63875 and previous config saved to /var/cache/conftool/dbconfig/20240602-134126-root.json
  • 13:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63874 and previous config saved to /var/cache/conftool/dbconfig/20240602-132620-root.json
  • 13:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63873 and previous config saved to /var/cache/conftool/dbconfig/20240602-131114-root.json
  • 13:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63872 and previous config saved to /var/cache/conftool/dbconfig/20240602-125608-root.json
  • 12:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63871 and previous config saved to /var/cache/conftool/dbconfig/20240602-124102-root.json
  • 12:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63870 and previous config saved to /var/cache/conftool/dbconfig/20240602-120033-ladsgroup.json
  • 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63869 and previous config saved to /var/cache/conftool/dbconfig/20240602-120010-ladsgroup.json
  • 11:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63868 and previous config saved to /var/cache/conftool/dbconfig/20240602-114503-ladsgroup.json
  • 11:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63867 and previous config saved to /var/cache/conftool/dbconfig/20240602-112955-ladsgroup.json
  • 11:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63866 and previous config saved to /var/cache/conftool/dbconfig/20240602-112512-marostegui.json
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63865 and previous config saved to /var/cache/conftool/dbconfig/20240602-111447-ladsgroup.json
  • 11:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63864 and previous config saved to /var/cache/conftool/dbconfig/20240602-111004-marostegui.json
  • 10:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63863 and previous config saved to /var/cache/conftool/dbconfig/20240602-105456-marostegui.json
  • 10:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63862 and previous config saved to /var/cache/conftool/dbconfig/20240602-103948-marostegui.json
  • 10:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63861 and previous config saved to /var/cache/conftool/dbconfig/20240602-091021-marostegui.json
  • 09:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63860 and previous config saved to /var/cache/conftool/dbconfig/20240602-090941-marostegui.json
  • 09:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63859 and previous config saved to /var/cache/conftool/dbconfig/20240602-085433-marostegui.json
  • 08:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63858 and previous config saved to /var/cache/conftool/dbconfig/20240602-083925-marostegui.json
  • 08:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P63856 and previous config saved to /var/cache/conftool/dbconfig/20240602-072956-root.json
  • 07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63855 and previous config saved to /var/cache/conftool/dbconfig/20240602-033618-marostegui.json
  • 03:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63854 and previous config saved to /var/cache/conftool/dbconfig/20240602-033555-marostegui.json
  • 03:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63853 and previous config saved to /var/cache/conftool/dbconfig/20240602-032047-marostegui.json
  • 03:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63852 and previous config saved to /var/cache/conftool/dbconfig/20240602-030539-marostegui.json
  • 03:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63851 and previous config saved to /var/cache/conftool/dbconfig/20240602-025039-ladsgroup.json
  • 02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63850 and previous config saved to /var/cache/conftool/dbconfig/20240602-025031-marostegui.json
  • 02:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 02:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63849 and previous config saved to /var/cache/conftool/dbconfig/20240602-025015-ladsgroup.json
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63848 and previous config saved to /var/cache/conftool/dbconfig/20240602-023507-ladsgroup.json
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63847 and previous config saved to /var/cache/conftool/dbconfig/20240602-022710-marostegui.json
  • 02:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 02:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63846 and previous config saved to /var/cache/conftool/dbconfig/20240602-022646-marostegui.json
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63845 and previous config saved to /var/cache/conftool/dbconfig/20240602-021959-ladsgroup.json
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63844 and previous config saved to /var/cache/conftool/dbconfig/20240602-021137-marostegui.json
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63843 and previous config saved to /var/cache/conftool/dbconfig/20240602-020451-ladsgroup.json
  • 02:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63842 and previous config saved to /var/cache/conftool/dbconfig/20240602-015629-marostegui.json
  • 01:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63841 and previous config saved to /var/cache/conftool/dbconfig/20240602-014121-marostegui.json
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-01

  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63839 and previous config saved to /var/cache/conftool/dbconfig/20240601-215534-marostegui.json
  • 21:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 21:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 21:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
  • 21:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
  • 21:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63838 and previous config saved to /var/cache/conftool/dbconfig/20240601-201053-ladsgroup.json
  • 20:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63837 and previous config saved to /var/cache/conftool/dbconfig/20240601-201029-ladsgroup.json
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63836 and previous config saved to /var/cache/conftool/dbconfig/20240601-195521-ladsgroup.json
  • 19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63835 and previous config saved to /var/cache/conftool/dbconfig/20240601-194013-ladsgroup.json
  • 19:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63834 and previous config saved to /var/cache/conftool/dbconfig/20240601-192505-ladsgroup.json
  • 19:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63833 and previous config saved to /var/cache/conftool/dbconfig/20240601-174133-marostegui.json
  • 17:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63832 and previous config saved to /var/cache/conftool/dbconfig/20240601-172625-marostegui.json
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63831 and previous config saved to /var/cache/conftool/dbconfig/20240601-172455-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63830 and previous config saved to /var/cache/conftool/dbconfig/20240601-172432-marostegui.json
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63829 and previous config saved to /var/cache/conftool/dbconfig/20240601-171116-marostegui.json
  • 17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63828 and previous config saved to /var/cache/conftool/dbconfig/20240601-170924-marostegui.json
  • 17:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63827 and previous config saved to /var/cache/conftool/dbconfig/20240601-165609-marostegui.json
  • 16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63826 and previous config saved to /var/cache/conftool/dbconfig/20240601-165416-marostegui.json
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63825 and previous config saved to /var/cache/conftool/dbconfig/20240601-163907-marostegui.json
  • 16:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 13:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63824 and previous config saved to /var/cache/conftool/dbconfig/20240601-125216-marostegui.json
  • 12:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63823 and previous config saved to /var/cache/conftool/dbconfig/20240601-125152-marostegui.json
  • 12:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63822 and previous config saved to /var/cache/conftool/dbconfig/20240601-123644-marostegui.json
  • 12:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63821 and previous config saved to /var/cache/conftool/dbconfig/20240601-122136-marostegui.json
  • 12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63820 and previous config saved to /var/cache/conftool/dbconfig/20240601-120628-marostegui.json
  • 12:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63819 and previous config saved to /var/cache/conftool/dbconfig/20240601-095545-ladsgroup.json
  • 09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:36 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 07:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:20 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
  • 07:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63818 and previous config saved to /var/cache/conftool/dbconfig/20240601-071723-marostegui.json
  • 07:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 07:17 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
  • 07:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63817 and previous config saved to /var/cache/conftool/dbconfig/20240601-071700-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63816 and previous config saved to /var/cache/conftool/dbconfig/20240601-070211-marostegui.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63815 and previous config saved to /var/cache/conftool/dbconfig/20240601-070151-marostegui.json
  • 07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 06:59 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 06:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63814 and previous config saved to /var/cache/conftool/dbconfig/20240601-064643-marostegui.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63813 and previous config saved to /var/cache/conftool/dbconfig/20240601-063135-marostegui.json
  • 06:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:44 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:44 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:39 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:34 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:19 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:17 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:50 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:50 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63812 and previous config saved to /var/cache/conftool/dbconfig/20240601-021256-marostegui.json
  • 02:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 02:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63811 and previous config saved to /var/cache/conftool/dbconfig/20240601-021233-marostegui.json
  • 02:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63810 and previous config saved to /var/cache/conftool/dbconfig/20240601-015725-marostegui.json
  • 01:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:51 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63809 and previous config saved to /var/cache/conftool/dbconfig/20240601-014216-marostegui.json
  • 01:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:32 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:32 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63808 and previous config saved to /var/cache/conftool/dbconfig/20240601-012708-marostegui.json
  • 01:26 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:26 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:22 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:22 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:20 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:20 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63807 and previous config saved to /var/cache/conftool/dbconfig/20240601-010959-ladsgroup.json
  • 01:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63806 and previous config saved to /var/cache/conftool/dbconfig/20240601-005451-ladsgroup.json
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63805 and previous config saved to /var/cache/conftool/dbconfig/20240601-003943-ladsgroup.json
  • 00:38 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:38 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63804 and previous config saved to /var/cache/conftool/dbconfig/20240601-002435-ladsgroup.json
  • 00:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:09 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:09 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply

Archives

See Server Admin Log/Archives.