Server Admin Log

2024-06-21

11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) shellbox-video.discovery.wmnet on all recursors
11:37 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache shellbox-video.discovery.wmnet on all recursors
11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65303 and previous config saved to /var/cache/conftool/dbconfig/20240621-110638-marostegui.json
11:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
10:57 Emperor: restart swift-proxy on ms-fe2011 ms-fe2012 T360913
10:56 Emperor: restart swift-proxy on ms-fe1010 T360913
10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.codfw.wmnet
10:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65302 and previous config saved to /var/cache/conftool/dbconfig/20240621-100554-marostegui.json
10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65301 and previous config saved to /var/cache/conftool/dbconfig/20240621-100531-marostegui.json
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65300 and previous config saved to /var/cache/conftool/dbconfig/20240621-095024-marostegui.json
09:45 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
09:45 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
09:41 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65299 and previous config saved to /var/cache/conftool/dbconfig/20240621-093517-marostegui.json
09:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65298 and previous config saved to /var/cache/conftool/dbconfig/20240621-092009-marostegui.json
09:16 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
09:14 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
09:02 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
08:57 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
08:56 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
08:47 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1053.eqiad.wmnet
08:41 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
08:39 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1053.eqiad.wmnet
08:14 vgutierrez: restarting logrotate.service on cp[3068,3070-3071].esams.wmnet
08:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
08:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
08:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
08:03 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
08:00 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
08:00 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
07:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65297 and previous config saved to /var/cache/conftool/dbconfig/20240621-075404-arnaudb.json
07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65296 and previous config saved to /var/cache/conftool/dbconfig/20240621-073858-arnaudb.json
07:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65295 and previous config saved to /var/cache/conftool/dbconfig/20240621-072353-arnaudb.json
07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65294 and previous config saved to /var/cache/conftool/dbconfig/20240621-070847-arnaudb.json
07:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool for debugging T368098', diff saved to https://phabricator.wikimedia.org/P65293 and previous config saved to /var/cache/conftool/dbconfig/20240621-070358-arnaudb.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65292 and previous config saved to /var/cache/conftool/dbconfig/20240621-045107-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
04:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65291 and previous config saved to /var/cache/conftool/dbconfig/20240621-045044-marostegui.json
04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65290 and previous config saved to /var/cache/conftool/dbconfig/20240621-044455-marostegui.json
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65289 and previous config saved to /var/cache/conftool/dbconfig/20240621-043537-marostegui.json
04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65288 and previous config saved to /var/cache/conftool/dbconfig/20240621-042948-marostegui.json
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65287 and previous config saved to /var/cache/conftool/dbconfig/20240621-042030-marostegui.json
04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65286 and previous config saved to /var/cache/conftool/dbconfig/20240621-041441-marostegui.json
04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65285 and previous config saved to /var/cache/conftool/dbconfig/20240621-040523-marostegui.json
03:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65284 and previous config saved to /var/cache/conftool/dbconfig/20240621-035934-marostegui.json
03:04 ejegg: fundraising civicrm upgraded from 2e1db811 to 8a0b5bea
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65283 and previous config saved to /var/cache/conftool/dbconfig/20240621-014545-marostegui.json
01:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65282 and previous config saved to /var/cache/conftool/dbconfig/20240621-014523-marostegui.json
01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65281 and previous config saved to /var/cache/conftool/dbconfig/20240621-013016-marostegui.json
01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65280 and previous config saved to /var/cache/conftool/dbconfig/20240621-011509-marostegui.json
01:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65279 and previous config saved to /var/cache/conftool/dbconfig/20240621-010002-marostegui.json
00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65278 and previous config saved to /var/cache/conftool/dbconfig/20240621-005237-ladsgroup.json
00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65277 and previous config saved to /var/cache/conftool/dbconfig/20240621-003730-ladsgroup.json
00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65276 and previous config saved to /var/cache/conftool/dbconfig/20240621-002223-ladsgroup.json
00:08 mutante: [cp3072:~] $ sudo systemctl start varnishkafka-webrequest.service
00:08 mutante: [cp3067:~] $ sudo systemctl start logrotate
00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65275 and previous config saved to /var/cache/conftool/dbconfig/20240621-000716-ladsgroup.json
00:00 sukhe: restarting haproxy on cp3068 and cp3072

2024-06-20

23:47 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 10m 12s)
23:36 zabe@deploy1002: Started scap: Update interwiki cache
23:35 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=btmwiki --cluster=all 2>&1 | tee /tmp/btmwiki.UpdateSearchIndexConfig.log # T368038
23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65274 and previous config saved to /var/cache/conftool/dbconfig/20240620-233346-marostegui.json
23:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
23:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65273 and previous config saved to /var/cache/conftool/dbconfig/20240620-233324-marostegui.json
23:33 zabe@deploy1002: Finished scap: Creating btmwiki (T368038) (duration: 12m 20s)
23:20 zabe@deploy1002: Started scap: Creating btmwiki (T368038)
23:20 zabe: create Wikipedia Mandailing # T368038
23:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65272 and previous config saved to /var/cache/conftool/dbconfig/20240620-231817-marostegui.json
23:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65271 and previous config saved to /var/cache/conftool/dbconfig/20240620-230310-marostegui.json
22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65270 and previous config saved to /var/cache/conftool/dbconfig/20240620-224803-marostegui.json
22:39 mutante: aphlict1002/aphlict2001 - systemctl stop aphlict_logrotate.timer (and .service); systemctl disable aphlict_logrotate.timer (and .service); systemctl daemon-reload; systemctl reset-failed T367960
22:33 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T361041 T363825 T366649 (duration: 09m 55s)
22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65269 and previous config saved to /var/cache/conftool/dbconfig/20240620-222909-marostegui.json
22:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65268 and previous config saved to /var/cache/conftool/dbconfig/20240620-222847-marostegui.json
22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65267 and previous config saved to /var/cache/conftool/dbconfig/20240620-221340-marostegui.json
21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65266 and previous config saved to /var/cache/conftool/dbconfig/20240620-215833-marostegui.json
21:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65265 and previous config saved to /var/cache/conftool/dbconfig/20240620-214326-marostegui.json
21:12 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
21:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
21:11 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
21:10 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
21:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
21:09 brett: Include ncmonitor 1.0.0 in wikimedia-bookworm apt repo
21:09 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
21:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
21:07 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
21:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
21:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
21:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
21:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
21:03 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
20:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
20:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
20:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
20:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
20:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
20:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
20:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
20:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
20:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
20:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
20:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
20:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
20:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
20:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
20:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
20:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
19:58 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
19:58 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:56 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:55 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:54 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
19:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
19:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
19:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1105* for T348977 - bking@cumin2002
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105* for T348977 - bking@cumin2002
19:18 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1105 for T348977 - bking@cumin2002
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105 for T348977 - bking@cumin2002
19:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2088.codfw.wmnet
19:01 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
18:58 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65263 and previous config saved to /var/cache/conftool/dbconfig/20240620-181635-marostegui.json
18:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65262 and previous config saved to /var/cache/conftool/dbconfig/20240620-181613-marostegui.json
18:06 inflatador: bking@an-airflow1007 install `ripgrep` deb pkg
18:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65261 and previous config saved to /var/cache/conftool/dbconfig/20240620-180104-marostegui.json
17:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
17:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65260 and previous config saved to /var/cache/conftool/dbconfig/20240620-174557-marostegui.json
17:44 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2088.codfw.wmnet
17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65259 and previous config saved to /var/cache/conftool/dbconfig/20240620-174125-ladsgroup.json
17:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65258 and previous config saved to /var/cache/conftool/dbconfig/20240620-173050-marostegui.json
17:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
17:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
17:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
17:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
16:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
16:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json
16:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Fix Special:Notifications (T368029) (duration: 12m 21s)
16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Continuing with sync
16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Backport for Fix Special:Notifications (T368029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
16:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
16:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
16:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Fix Special:Notifications (T368029)
16:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json
16:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
16:01 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
16:00 claime: Repooling and uncordoning mw2282.codfw.wmnet following move - T361856
15:59 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
15:59 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:58 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:57 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2019.codfw.wmnet|wikikube-worker2020.codfw.wmnet|wikikube-worker2021.codfw.wmnet|wikikube-worker2022.codfw.wmnet|wikikube-worker2023.codfw.wmnet|wikikube-worker2024.codfw.wmnet),cluster=kubernetes,service=kubesvc
15:57 claime: Pooling and uncordoning wikikube-worker2019.codfw.wmnet,wikikube-worker2020.codfw.wmnet,wikikube-worker2021.codfw.wmnet,wikikube-worker2022.codfw.wmnet,wikikube-worker2023.codfw.wmnet,wikikube-worker2024.codfw.wmnet - T351074
15:55 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
15:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
15:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json
15:46 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
15:46 claime: homer 'cr*codfw*' commit 'T351074'
15:45 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl2002.codfw.wmnet
15:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2019.codfw.wmnet with OS bullseye
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2020.codfw.wmnet with OS bullseye
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2022.codfw.wmnet with OS bullseye
15:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json
15:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2024.codfw.wmnet with OS bullseye
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2023.codfw.wmnet with OS bullseye
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2405.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2405.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2404.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2404.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2403.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2403.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2400.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2400.codfw.wmnet
15:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2021.codfw.wmnet with OS bullseye
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
15:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json
15:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
15:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
15:06 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:04 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
15:02 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 04m 15s)
15:01 topranks: rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987
15:01 jhathaway@deploy1002: Started scap: (no justification provided)
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-f6-eqiad.mgmt
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lsw1-f6-eqiad.mgmt
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:56 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1018.eqiad.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1018.eqiad.wmnet
14:54 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:53 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 6 hosts
14:48 sukhe: homer "*" commit "rolling out NTP ACL change"
14:48 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2024.codfw.wmnet with OS bullseye
14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65248 and previous config saved to /var/cache/conftool/dbconfig/20240620-144750-arnaudb.json
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2023.codfw.wmnet with OS bullseye
14:47 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367511
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2022.codfw.wmnet with OS bullseye
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2021.codfw.wmnet with OS bullseye
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2020.codfw.wmnet with OS bullseye
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2364 to wikikube-worker2024
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2024
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2019.codfw.wmnet with OS bullseye
14:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2024
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65247 and previous config saved to /var/cache/conftool/dbconfig/20240620-144423-marostegui.json
14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65246 and previous config saved to /var/cache/conftool/dbconfig/20240620-144341-marostegui.json
14:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2364 to wikikube-worker2024
14:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
14:39 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2363 to wikikube-worker2023
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2023
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
14:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2023
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2324.codfw.wmnet
14:37 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2324.codfw.wmnet
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2323.codfw.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2323.codfw.wmnet
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1489.eqiad.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1489.eqiad.wmnet
14:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:35 sukhe: running authdns-update for CR 1047074
14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
14:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65245 and previous config saved to /var/cache/conftool/dbconfig/20240620-143244-arnaudb.json
14:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:32 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2363 to wikikube-worker2023
14:31 moritzm: imported python-pymysql 1.0.2-2~wmf11u2 to apt.wikimedia.org (merge of the security fix from DSA 5700 on top of our internal backport)
14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json
14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
14:30 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2362 to wikikube-worker2022
14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2022
14:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:28 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2022
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65243 and previous config saved to /var/cache/conftool/dbconfig/20240620-142834-marostegui.json
14:27 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
14:27 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:26 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
14:25 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:25 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:25 elukey@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
14:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2362 to wikikube-worker2022
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:24 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2360 to wikikube-worker2021
14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2021
14:21 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2021
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
14:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65242 and previous config saved to /var/cache/conftool/dbconfig/20240620-141739-arnaudb.json
14:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: IPIP migration
14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: IPIP migration
14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2360 to wikikube-worker2021
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2358 to wikikube-worker2020
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2020
14:15 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2020
14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65241 and previous config saved to /var/cache/conftool/dbconfig/20240620-141328-marostegui.json
14:13 elukey@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
14:13 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
14:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2358 to wikikube-worker2020
14:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
14:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P65240 and previous config saved to /var/cache/conftool/dbconfig/20240620-141010-root.json
14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2339 to wikikube-worker2019
14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2019
14:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2019
14:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
14:07 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
14:07 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2339 to wikikube-worker2019
14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65239 and previous config saved to /var/cache/conftool/dbconfig/20240620-140233-arnaudb.json
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65238 and previous config saved to /var/cache/conftool/dbconfig/20240620-135820-marostegui.json
13:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
13:56 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
13:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65237 and previous config saved to /var/cache/conftool/dbconfig/20240620-135610-marostegui.json
13:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
13:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65236 and previous config saved to /var/cache/conftool/dbconfig/20240620-135559-marostegui.json
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
13:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
13:54 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
13:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P65235 and previous config saved to /var/cache/conftool/dbconfig/20240620-135438-root.json
13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
13:53 claime: Depooling mw2339.codfw.wmnet,mw2358.codfw.wmnet,mw2360.codfw.wmnet,mw2362.codfw.wmnet,mw2363.codfw.wmnet,mw2364.codfw.wmnet for reimage to k8s - T351074
13:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
13:52 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
13:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:51 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65234 and previous config saved to /var/cache/conftool/dbconfig/20240620-134728-arnaudb.json
13:46 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65233 and previous config saved to /var/cache/conftool/dbconfig/20240620-134052-marostegui.json
13:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P65232 and previous config saved to /var/cache/conftool/dbconfig/20240620-133907-root.json
13:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
13:28 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
13:28 hashar@deploy1002: Finished deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2 (duration: 00m 06s)
13:28 hashar@deploy1002: Started deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2
13:27 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
13:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65231 and previous config saved to /var/cache/conftool/dbconfig/20240620-132545-marostegui.json
13:24 reedy@deploy1002: Synchronized wmf-config/: T368003 (duration: 10m 39s)
13:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:23 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P65230 and previous config saved to /var/cache/conftool/dbconfig/20240620-132335-root.json
13:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:22 elukey: upload dragonfly packages 1.0.6-2 to bookworm-wikimedia - T365253
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65228 and previous config saved to /var/cache/conftool/dbconfig/20240620-131038-marostegui.json
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65227 and previous config saved to /var/cache/conftool/dbconfig/20240620-131031-marostegui.json
13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65226 and previous config saved to /var/cache/conftool/dbconfig/20240620-130928-marostegui.json
13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P65225 and previous config saved to /var/cache/conftool/dbconfig/20240620-130804-root.json
13:07 sukhe: running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360
13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
13:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
13:00 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
12:54 sukhe: sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360
12:51 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: 1', diff saved to https://phabricator.wikimedia.org/P65223 and previous config saved to /var/cache/conftool/dbconfig/20240620-125139-root.json
12:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
12:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
12:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
12:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
12:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
12:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
12:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
12:04 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
11:52 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
11:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
11:47 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
11:41 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
11:38 XioNoX: merge netbox-extra CR1038869 - Fix lots of CI errors
11:33 jgiannelos@deploy1002: Finished deploy [restbase/deploy@f867c66]: (no justification provided) (duration: 30m 12s)
11:27 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
11:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
11:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
11:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
11:21 akosiaris: upgrade mathoid to 2024-06-18-233457-production T349118
11:20 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
11:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
11:03 jgiannelos@deploy1002: Started deploy [restbase/deploy@f867c66]: (no justification provided)
10:57 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) (duration: 15m 03s)
10:48 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
10:44 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:42 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)
10:41 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
10:37 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) (duration: 13m 49s)
10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
10:31 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
10:31 claime: repooling and uncordoning mw2321.codfw.wmnet - T367862
10:31 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2321.codfw.wmnet
10:30 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2321.codfw.wmnet
10:28 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
10:25 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:24 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:23 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)
10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
10:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:20 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
10:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
10:18 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:17 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:15 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
10:14 claime: Draining and depooling mw2321.codfw.wmnet to test 1047031 - T367862
10:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
10:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
10:04 claime: Running puppet on A:wikikube-worker
10:02 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
10:01 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
10:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
09:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
09:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
09:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
09:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
09:47 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
09:45 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:16 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php sysop_plwiki AramilFeraxa REDACTED --bureaucrat --sysop # T361041
08:57 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
08:50 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:49 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
08:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
08:33 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:23 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.10 refs T361404
08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
08:08 moritzm: reboot of irc1001 to nudge clients to re-connect to the new bullseye host T331702
08:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
08:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
07:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
07:53 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
07:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
07:04 moritzm: failover irc.wikimedia.org to the new Bullseye servers T331702
06:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
06:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
05:27 marostegui: Deploy schema change on old s7 eqiad master dbmaint (db1236) T364299
05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236 T367857', diff saved to https://phabricator.wikimedia.org/P65220 and previous config saved to /var/cache/conftool/dbconfig/20240620-052359-root.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T367857', diff saved to https://phabricator.wikimedia.org/P65219 and previous config saved to /var/cache/conftool/dbconfig/20240620-052253-marostegui.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T367857', diff saved to https://phabricator.wikimedia.org/P65218 and previous config saved to /var/cache/conftool/dbconfig/20240620-052230-marostegui.json
05:22 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T367857
05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T367857', diff saved to https://phabricator.wikimedia.org/P65217 and previous config saved to /var/cache/conftool/dbconfig/20240620-050428-marostegui.json
05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367856)', diff saved to https://phabricator.wikimedia.org/P65216 and previous config saved to /var/cache/conftool/dbconfig/20240620-022416-marostegui.json
02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65215 and previous config saved to /var/cache/conftool/dbconfig/20240620-022349-marostegui.json
02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65214 and previous config saved to /var/cache/conftool/dbconfig/20240620-020842-marostegui.json
01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65213 and previous config saved to /var/cache/conftool/dbconfig/20240620-015335-marostegui.json
01:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65212 and previous config saved to /var/cache/conftool/dbconfig/20240620-013827-marostegui.json

2024-06-19

23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php arbcom_itwiki Superpes15 REDACTED --bureaucrat --sysop
23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php u4cwiki Superpes15 REDACTED --bureaucrat --sysop
21:08 oblivian@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
21:08 oblivian@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
20:33 zabe@deploy1002: Finished scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) (duration: 14m 41s)
20:24 zabe@deploy1002: superpes, zabe: Continuing with sync
20:23 zabe@deploy1002: superpes, zabe: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 zabe@deploy1002: Started scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431)
19:08 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
19:05 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
18:54 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
18:51 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
18:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:48 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:35 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65211 and previous config saved to /var/cache/conftool/dbconfig/20240619-182922-marostegui.json
18:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65210 and previous config saved to /var/cache/conftool/dbconfig/20240619-182900-marostegui.json
18:21 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65209 and previous config saved to /var/cache/conftool/dbconfig/20240619-181353-marostegui.json
17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65208 and previous config saved to /var/cache/conftool/dbconfig/20240619-175846-marostegui.json
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65207 and previous config saved to /var/cache/conftool/dbconfig/20240619-174338-marostegui.json
17:21 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:21 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
17:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
17:13 kamila@cumin1002: START - Cookbook sre.dns.netbox
17:05 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1044.eqiad.wmnet with OS bookworm
17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
17:00 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:00 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
16:59 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
16:42 sukhe: sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360
16:27 claime: pooling and uncordoning mw2321.codfw.wmnet - T367702
16:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
16:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: service=(ntp-a|ntp-b|ntp-c)
16:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
16:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
16:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
16:03 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:55 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:51 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:50 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:46 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:46 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
15:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
15:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
15:32 taavi@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1042
15:32 taavi@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1042
15:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:22 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:16 sukhe: sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
15:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
15:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org,service=ntp-c
15:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:01 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
15:01 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
14:59 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for wikikube-worker2003.codfw.wmnet
14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
14:42 marostegui: Deploy schema change on s2 eqiad master dbmaint T364069
14:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
14:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
14:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bookworm
14:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
14:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
14:36 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
14:35 moritzm: installing nano security updates
14:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
14:24 moritzm: installing libvpx security updates
14:23 moritzm: installing pymysql security updates
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
14:19 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
14:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
14:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
14:12 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
14:11 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
14:10 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
14:09 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:09 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
14:09 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
14:08 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:07 taavi@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:07 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
13:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
13:54 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
13:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:53 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:51 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:50 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:49 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:48 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:42 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:41 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
13:41 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
13:35 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
13:35 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
13:32 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
13:32 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=ntp-a
13:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
13:31 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
13:28 sukhe: enable puppet on dns6001 to test CR 1046685
13:23 sukhe: sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360
13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mw2282.codfw.wmnet with reason: host move
13:21 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on mw2282.codfw.wmnet with reason: host move
13:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:17 kamila_: drained mw2282.codfw.wmnet for T361856
13:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
13:06 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
13:04 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: service=ntp-[abc]
13:04 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
12:52 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:51 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2011.codfw.wmnet|wikikube-worker2012.codfw.wmnet|wikikube-worker2013.codfw.wmnet|wikikube-worker2014.codfw.wmnet|wikikube-worker2017.codfw.wmnet|wikikube-worker2018.codfw.wmnet),cluster=kubernetes,service=kubesvc
12:40 claime: homer 'cr*codfw*' commit 'T351074'
12:38 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:38 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:37 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:37 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:35 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
12:34 klausman: Puppet management of install2004 restored, lpxelinux.0 also restored.
12:24 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:22 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:21 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:20 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:17 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
12:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
12:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
12:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
12:12 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
12:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
12:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
12:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
12:08 klausman: Will test-replace the PXE chainloader (/srv/tftpboot/lpxelinux.0) on install2003 with a newer version to see if it fixes the ldlinux.c32 error. Puppet will be disabled on that machine for the duration.
12:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
12:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
12:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:03 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65204 and previous config saved to /var/cache/conftool/dbconfig/20240619-120142-root.json
12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
12:01 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
12:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
12:00 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
11:57 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
11:57 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
11:50 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65203 and previous config saved to /var/cache/conftool/dbconfig/20240619-114636-root.json
11:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65201 and previous config saved to /var/cache/conftool/dbconfig/20240619-113131-root.json
11:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:18 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netbox-dev2003.codfw.wmnet
11:18 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netbox-dev2003.codfw.wmnet with OS bookworm
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65200 and previous config saved to /var/cache/conftool/dbconfig/20240619-111625-root.json
11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
11:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
11:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:08 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
11:07 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:07 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:06 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
11:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65199 and previous config saved to /var/cache/conftool/dbconfig/20240619-110120-root.json
10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65198 and previous config saved to /var/cache/conftool/dbconfig/20240619-104614-root.json
10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
10:40 jmm@deploy1002: Finished scap: (no justification provided) (duration: 04m 03s)
10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
10:36 jmm@deploy1002: Started scap: (no justification provided)
10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65197 and previous config saved to /var/cache/conftool/dbconfig/20240619-103109-root.json
10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65196 and previous config saved to /var/cache/conftool/dbconfig/20240619-102504-marostegui.json
10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2409 to wikikube-worker2018
10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
10:22 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
10:18 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:18 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2409 to wikikube-worker2018
10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2408 to wikikube-worker2017
10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
10:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
10:16 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65195 and previous config saved to /var/cache/conftool/dbconfig/20240619-101625-marostegui.json
10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2408 to wikikube-worker2017
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2405 to wikikube-worker2014
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
10:12 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
10:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
10:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2405 to wikikube-worker2014
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2404 to wikikube-worker2013
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
10:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
10:05 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
10:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65194 and previous config saved to /var/cache/conftool/dbconfig/20240619-100118-marostegui.json
10:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
10:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2404 to wikikube-worker2013
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2403 to wikikube-worker2012
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
09:55 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
09:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
09:51 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
09:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
09:47 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2403 to wikikube-worker2012
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2400 to wikikube-worker2011
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
09:46 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
09:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2400 to wikikube-worker2011
09:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin
09:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
09:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:22 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
09:15 claime: Depooling mw2400.codfw.wmnet,mw2403.codfw.wmnet,mw2404.codfw.wmnet,mw2405.codfw.wmnet,mw2408.codfw.wmnet,mw2409.codfw.wmnet for reimage - T351074
09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
09:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bookworm
09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5025.*} and A:cp
08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bookworm
08:58 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5025.*} and A:cp
08:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
08:54 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
08:52 fabfur: upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
08:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
08:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
08:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15830
08:38 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bookworm
08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
08:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1001.eqiad.wmnet with OS bookworm
08:18 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.10 refs T361404
08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
08:11 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
08:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:03 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:01 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox-dev2003.codfw.wmnet with OS bookworm
08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox-dev2003.codfw.wmnet on all recursors
07:59 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox-dev2003.codfw.wmnet on all recursors
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:57 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:54 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:54 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox-dev2003.codfw.wmnet
07:48 kartik@deploy1002: Finished scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) (duration: 18m 55s)
07:38 kartik@deploy1002: kartik: Continuing with sync
07:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
07:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
07:33 kartik@deploy1002: kartik: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:29 kartik@deploy1002: Started scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464)
07:22 kartik@deploy1002: Finished scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) (duration: 20m 12s)
07:20 marostegui: Deploy schema change on old s7 eqiad master db1160 dbmaint T364069
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65192 and previous config saved to /var/cache/conftool/dbconfig/20240619-071516-root.json
07:12 kartik@deploy1002: kartik: Continuing with sync
07:07 kartik@deploy1002: kartik: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852)
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65191 and previous config saved to /var/cache/conftool/dbconfig/20240619-070010-root.json
06:52 jynus: stop db1240:s1, wipe and reimport db1240:s3 T367162
06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65190 and previous config saved to /var/cache/conftool/dbconfig/20240619-064505-root.json
06:40 XioNoX: merge Puppet "Prepare for netbox-dev" CR1047081
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65189 and previous config saved to /var/cache/conftool/dbconfig/20240619-063337-root.json
06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65188 and previous config saved to /var/cache/conftool/dbconfig/20240619-062959-root.json
06:21 _joe_: upgrading conftool everywhere T367919
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65187 and previous config saved to /var/cache/conftool/dbconfig/20240619-061831-root.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P65186 and previous config saved to /var/cache/conftool/dbconfig/20240619-061721-root.json
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65185 and previous config saved to /var/cache/conftool/dbconfig/20240619-061454-root.json
06:08 _joe_: uploaded newer python-conftool packages T367919
06:05 _joe_: manually deleting thirdparty/conda repositories from reprepro T364550
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65184 and previous config saved to /var/cache/conftool/dbconfig/20240619-060326-root.json
06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P65183 and previous config saved to /var/cache/conftool/dbconfig/20240619-060216-root.json
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65182 and previous config saved to /var/cache/conftool/dbconfig/20240619-055948-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65181 and previous config saved to /var/cache/conftool/dbconfig/20240619-054820-root.json
05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P65180 and previous config saved to /var/cache/conftool/dbconfig/20240619-054710-root.json
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65179 and previous config saved to /var/cache/conftool/dbconfig/20240619-054443-root.json
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65178 and previous config saved to /var/cache/conftool/dbconfig/20240619-054259-root.json
05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65177 and previous config saved to /var/cache/conftool/dbconfig/20240619-054214-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65176 and previous config saved to /var/cache/conftool/dbconfig/20240619-053315-root.json
05:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P65175 and previous config saved to /var/cache/conftool/dbconfig/20240619-053205-root.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65174 and previous config saved to /var/cache/conftool/dbconfig/20240619-052754-root.json
05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65173 and previous config saved to /var/cache/conftool/dbconfig/20240619-051809-root.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P65172 and previous config saved to /var/cache/conftool/dbconfig/20240619-051659-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65171 and previous config saved to /var/cache/conftool/dbconfig/20240619-051248-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P65170 and previous config saved to /var/cache/conftool/dbconfig/20240619-051233-root.json
05:10 marostegui@cumin1002: dbctl commit (dc=all): 'repool db1169', diff saved to https://phabricator.wikimedia.org/P65169 and previous config saved to /var/cache/conftool/dbconfig/20240619-051014-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'test depool db1169', diff saved to https://phabricator.wikimedia.org/P65168 and previous config saved to /var/cache/conftool/dbconfig/20240619-050951-marostegui.json

2024-06-18

23:22 jforrester@deploy1002: Finished scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) (duration: 17m 16s)
23:12 jforrester@deploy1002: jforrester, kemayo: Continuing with sync
23:10 jforrester@deploy1002: jforrester, kemayo: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:05 jforrester@deploy1002: Started scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920)
22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:20 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:26 jdrewniak@deploy1002: Finished scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) (duration: 16m 33s)
21:16 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
21:14 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
21:07 jdrewniak@deploy1002: Sync cancelled.
21:07 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
20:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
20:50 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:49 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:47 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:47 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:33 urbanecm@deploy1002: Finished scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki (duration: 18m 59s)
20:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:22 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Continuing with sync
20:18 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:14 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
20:10 urbanecm@deploy1002: Sync cancelled.
20:10 urbanecm@deploy1002: urbanecm, superzerocool, kemayo: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:06 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
19:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:17 mutante: lists1001 - systemctl reset-failed - clean up systemd state due to units not found anymore after migration - disable puppet and then deploy gerrit:1047160 on lists to fix invalid unit name - T331706
18:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
18:44 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in esams for T365123
18:39 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in eqsin for T365123
18:33 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in drmrs for T365123
18:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
18:27 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in magru for T365123
18:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
18:17 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in ulsfo for T365123
18:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
18:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
17:37 swfrench-wmf: updated conftool to 3.0.0 on bullseye hosts in eqiad for T365123
17:35 swfrench-wmf: updated conftool to 3.0.0 on bookworm hosts in eqiad for T365123
17:34 swfrench-wmf: updated conftool to 3.0.0 on buster hosts in eqiad for T365123
17:21 cdanis: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323 T367894
17:16 swfrench-wmf: updated conftool to 3.0.0 on remaining bullseye hosts in codfw for T365123
17:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
17:14 swfrench-wmf: updated conftool to 3.0.0 on remaining bookworm hosts in codfw for T365123
17:12 swfrench-wmf: updated conftool to 3.0.0 on remaining buster hosts in codfw for T365123
16:42 swfrench-wmf: conftool on puppetmaster2001 updated to 3.0.0 for T365123
16:39 swfrench-wmf: validated dbctl 3.0.0 on cumin2002 (noop edit to note: on parsercache spare pc2014) for T365123
16:39 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
16:34 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
16:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
16:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
16:29 swfrench-wmf: conftool on cumin2002 updated to 3.0.0 for T365123
16:23 claime: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323
16:23 swfrench-wmf: depooled / pooled mw2441.codfw.wmnet to smoke-test python3-conftool for T365123
16:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json
16:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json
16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
15:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json
15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65164 and previous config saved to /var/cache/conftool/dbconfig/20240618-155000-marostegui.json
15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65163 and previous config saved to /var/cache/conftool/dbconfig/20240618-154938-marostegui.json
15:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2003
15:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2003
15:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:47 swfrench-wmf: included conftool 3.0.0 into buster/bullseye/bookworm-wikimedia on apt.w.o for T365123
15:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:46 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:45 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:44 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp
15:43 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:42 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp
15:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5030.*} and A:cp
15:39 fabfur: upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
15:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5030.*} and A:cp
15:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp3066.*} and A:cp
15:36 fabfur: upgrade haproxy to v2.8.10 on cp3066 (T367756)
15:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp3066.*} and A:cp
15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65161 and previous config saved to /var/cache/conftool/dbconfig/20240618-153430-marostegui.json
15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
15:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
15:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65158 and previous config saved to /var/cache/conftool/dbconfig/20240618-151923-marostegui.json
15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775 (duration: 00m 15s)
15:07 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775
15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775 (duration: 00m 47s)
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775
15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775 (duration: 00m 36s)
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775
15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65157 and previous config saved to /var/cache/conftool/dbconfig/20240618-150416-marostegui.json
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:00 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4f7d29a]: (no justification provided) (duration: 00m 28s)
15:00 topranks: rebooting lsw1-f7-eqiad to upgrade JunOS on switch T365984
15:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
15:00 mforns@deploy1002: Started deploy [airflow-dags/analytics@4f7d29a]: (no justification provided)
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
14:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
14:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
14:47 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
14:44 jynus: reenable puppet on backup2002
14:40 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
14:40 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 depool - T365984', diff saved to https://phabricator.wikimedia.org/P65156 and previous config saved to /var/cache/conftool/dbconfig/20240618-143951-arnaudb.json
14:39 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
14:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
14:36 sukhe: enabling puppet and running puppet agent on cp4037
14:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
14:24 claime: trafficserver: move 100% of traffic to mw-on-k8s - T362323
14:23 btullis@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
14:09 swfrench-wmf: included conftool 3.0.0 into buster-wikimedia on apt.w.o for T365123
14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
14:03 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:51 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:51 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
13:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1002.eqiad.wmnet with OS bookworm
13:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
13:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
13:39 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
13:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes azwiktionary --fix # T367264; 7 pages fixed, 10 links fixed
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) (duration: 16m 07s)
13:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
13:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
13:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
13:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1208.eqiad.wmnet
13:19 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1208.eqiad.wmnet
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264)
13:16 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
13:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
13:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
13:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
13:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: sync
13:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: sync
13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: sync
13:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: sync
13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
13:06 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
13:06 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
13:04 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
13:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
12:56 vgutierrez: rolling upgrade on A:cp-eqsin to fifo-log-demux 0.7.5 - T364383
12:53 vgutierrez: disable puppet on A:cp-eqsin before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047070 - T364383
12:52 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
12:51 marostegui: Deploy schema change on old s4 eqiad master db1160 dbmaint T364069
12:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P65155 and previous config saved to /var/cache/conftool/dbconfig/20240618-124945-root.json
12:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:47 fabfur: upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
12:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
12:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
12:22 moritzm: rebalance ganeti eqiad/D following reboots
12:15 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
12:15 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
12:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
12:04 topranks: adding Netbox-generated IPv6 DNS records for wikikube-worker, mw and parse hosts
12:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:59 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:58 effie: Slowly pointing mediawiki in eqiad to mw-mcrouter daemonset - T346690
11:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:50 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists.wikimedia.org on all recursors
11:50 eoghan@cumin1002: START - Cookbook sre.dns.wipe-cache lists.wikimedia.org on all recursors
11:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bookworm
11:42 marostegui: Delete ipblocks table on clouddb2002-dev (labtestwiki) T367632
11:40 marostegui: Rename ipblocks table on db1169 (enwiki) T367632
11:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
11:24 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
11:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
11:14 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
11:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
11:13 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
11:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65152 and previous config saved to /var/cache/conftool/dbconfig/20240618-111001-marostegui.json
11:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:09 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bookworm
11:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65151 and previous config saved to /var/cache/conftool/dbconfig/20240618-110939-marostegui.json
11:08 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
11:05 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
11:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
10:58 fabfur: cp3066 repooled and puppet enabled (T367756)
10:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65150 and previous config saved to /var/cache/conftool/dbconfig/20240618-105432-marostegui.json
10:48 marostegui: dbmaint codfw s2 deploy schema change T364069
10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65149 and previous config saved to /var/cache/conftool/dbconfig/20240618-103925-marostegui.json
10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:30 moritzm: upload openjdk-21 21.0.3+9-2~deb12u2 for bookworm/wikimedia (secondary rebuild on build2001 following the initial bootstrap build) https://phabricator.wikimedia.org/T367487
10:30 cgoubert@deploy1002: Finished scap: Deploy statsd exporter - T365265 (duration: 03m 39s)
10:27 cgoubert@deploy1002: Started scap: Deploy statsd exporter - T365265
10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65148 and previous config saved to /var/cache/conftool/dbconfig/20240618-102418-marostegui.json
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65147 and previous config saved to /var/cache/conftool/dbconfig/20240618-102130-root.json
10:14 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
10:14 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65146 and previous config saved to /var/cache/conftool/dbconfig/20240618-100624-root.json
10:05 fabfur: cp3066 currently depooled and puppet disabled for T367756
10:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
09:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1019.eqiad.wmnet|wikikube-worker1020.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet),cluster=kubernetes,service=kubesvc
09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65145 and previous config saved to /var/cache/conftool/dbconfig/20240618-095119-root.json
09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65144 and previous config saved to /var/cache/conftool/dbconfig/20240618-093614-root.json
09:27 moritzm: arm keyholder on acmechief2002
09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65143 and previous config saved to /var/cache/conftool/dbconfig/20240618-092108-root.json
09:13 moritzm: rebooting ganeti2029
09:10 marostegui: dbmaint eqiad s4 deploy schema change T367261
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65142 and previous config saved to /var/cache/conftool/dbconfig/20240618-090603-root.json
09:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
08:53 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.10 refs T361404
08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 depool to troubleshoot hardware issues', diff saved to https://phabricator.wikimedia.org/P65141 and previous config saved to /var/cache/conftool/dbconfig/20240618-085254-arnaudb.json
08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
08:51 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65140 and previous config saved to /var/cache/conftool/dbconfig/20240618-085057-root.json
08:45 hashar@deploy1002: Finished deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate (duration: 00m 07s)
08:45 hashar@deploy1002: Started deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate
08:43 fabfur: cp4037 currently depooled and puppet disabled for T367756
08:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
08:40 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
08:34 marostegui: dbmaint eqiad s6 deploy schema change on eqiad master T364069
08:29 XioNoX: deploy pfw policy update 1718644831 - T367796
07:56 moritzm: uploaded python-irc 8.5.3+dfsg-4+wmf1 to apt.wikimedia.org T331702
07:40 marostegui: dbmaint codfw s7 deploy schema change on codfw master T364069
07:33 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
07:31 kart_: Updated cxserver to 2024-06-13-045621-production (T364122, T138401)
07:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
07:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
07:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
07:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
07:26 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
07:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
07:20 kartik@deploy1002: Finished scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) (duration: 16m 36s)
07:15 marostegui: dbmaint eqiad s5 deploy schema change on primary master T364069
07:12 marostegui: dbmaint codfw s4 deploy schema change T367261
07:12 marostegui: dbmaint codfw s4 deploy schema change
07:11 kartik@deploy1002: kartik: Continuing with sync
07:09 kartik@deploy1002: kartik: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 kartik@deploy1002: Started scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838)
06:52 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
06:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65139 and previous config saved to /var/cache/conftool/dbconfig/20240618-060100-marostegui.json
06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65138 and previous config saved to /var/cache/conftool/dbconfig/20240618-060038-marostegui.json
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2102.codfw.wmnet
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
05:53 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
05:50 jynus@cumin2002: START - Cookbook sre.dns.netbox
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65137 and previous config saved to /var/cache/conftool/dbconfig/20240618-054531-marostegui.json
05:44 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2102.codfw.wmnet
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65136 and previous config saved to /var/cache/conftool/dbconfig/20240618-053024-marostegui.json
05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65135 and previous config saved to /var/cache/conftool/dbconfig/20240618-051517-marostegui.json
05:00 marostegui: dbmaint codfw s5 deploy schema change on db2213 T364299
04:57 marostegui: dbmaint eqiad s2 deploy schema change on db2207 T364299
04:54 marostegui: dbmaint eqiad s4 deploy schema change on db1160 T364299
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T367378', diff saved to https://phabricator.wikimedia.org/P65134 and previous config saved to /var/cache/conftool/dbconfig/20240618-044908-root.json
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary and set section read-write T367378', diff saved to https://phabricator.wikimedia.org/P65133 and previous config saved to /var/cache/conftool/dbconfig/20240618-044806-marostegui.json
04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T367378', diff saved to https://phabricator.wikimedia.org/P65132 and previous config saved to /var/cache/conftool/dbconfig/20240618-044747-marostegui.json
04:47 marostegui: Starting s4 eqiad failover from db1160 to db1238 - T367378
04:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T367378', diff saved to https://phabricator.wikimedia.org/P65131 and previous config saved to /var/cache/conftool/dbconfig/20240618-042054-marostegui.json
04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
04:02 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.7 (duration: 02m 50s)
04:01 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.10 refs T361404 (duration: 58m 57s)
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.10 refs T361404
01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65130 and previous config saved to /var/cache/conftool/dbconfig/20240618-013639-marostegui.json
01:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
01:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65129 and previous config saved to /var/cache/conftool/dbconfig/20240618-013616-marostegui.json
01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65128 and previous config saved to /var/cache/conftool/dbconfig/20240618-012109-marostegui.json
01:10 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65127 and previous config saved to /var/cache/conftool/dbconfig/20240618-010601-marostegui.json
00:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65126 and previous config saved to /var/cache/conftool/dbconfig/20240618-005054-marostegui.json
00:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
00:31 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65125 and previous config saved to /var/cache/conftool/dbconfig/20240618-002823-ladsgroup.json
00:18 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 14m 03s)
00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65124 and previous config saved to /var/cache/conftool/dbconfig/20240618-001316-ladsgroup.json
00:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
00:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
00:05 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=u4cwiki --cluster=all 2>&1 | tee /tmp/u4c.UpdateSearchIndexConfig.log # T366649
00:04 zabe@deploy1002: Started scap: Update interwiki cache
00:02 zabe@deploy1002: Finished scap: T366649 (duration: 15m 16s)
00:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye

2024-06-17

23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65123 and previous config saved to /var/cache/conftool/dbconfig/20240617-235809-ladsgroup.json
23:52 zabe@deploy1002: zabe: Continuing with sync
23:52 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
23:51 zabe@deploy1002: zabe: T366649 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=arbcom_itwiki --cluster=all 2>&1 | tee /tmp/arbcom_it.UpdateSearchIndexConfig.log # T363825
23:47 zabe@deploy1002: Started scap: T366649
23:46 zabe: Create a 'Universal Code of Conduct Coordinating Committee (U4C)' private wiki # T366649
23:44 zabe@deploy1002: Finished scap: T363825 (duration: 15m 00s)
23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65122 and previous config saved to /var/cache/conftool/dbconfig/20240617-234302-ladsgroup.json
23:34 zabe@deploy1002: zabe: Continuing with sync
23:34 zabe@deploy1002: zabe: T363825 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:29 zabe@deploy1002: Started scap: T363825
23:29 zabe: create private wiki for itwiki arbcom # T363825
23:23 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
23:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
22:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
22:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
22:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65121 and previous config saved to /var/cache/conftool/dbconfig/20240617-223010-ladsgroup.json
22:28 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
22:26 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
22:25 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
22:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65120 and previous config saved to /var/cache/conftool/dbconfig/20240617-221503-ladsgroup.json
22:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
22:11 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
22:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65119 and previous config saved to /var/cache/conftool/dbconfig/20240617-215956-ladsgroup.json
21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
21:55 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65118 and previous config saved to /var/cache/conftool/dbconfig/20240617-214449-ladsgroup.json
21:41 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
21:05 jforrester@deploy1002: Finished scap: Backport for Fix styles for new heading HTML (T367468) (duration: 18m 57s)
20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65117 and previous config saved to /var/cache/conftool/dbconfig/20240617-205955-marostegui.json
20:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
20:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
20:55 jforrester@deploy1002: jforrester: Continuing with sync
20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
20:50 jforrester@deploy1002: jforrester: Backport for Fix styles for new heading HTML (T367468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:50 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
20:46 jforrester@deploy1002: Started scap: Backport for Fix styles for new heading HTML (T367468)
20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
20:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
20:07 jforrester@deploy1002: jforrester: Continuing with sync
20:07 jforrester@deploy1002: jforrester: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
20:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
20:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
20:02 jforrester@deploy1002: Started scap: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809)
19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65116 and previous config saved to /var/cache/conftool/dbconfig/20240617-195520-ladsgroup.json
19:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
19:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
19:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
19:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
19:15 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
18:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
18:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
18:42 ladsgroup@deploy1002: Finished scap: Backport for Change static footer icons to the new one (T256190), Remove footer override (duration: 17m 12s)
18:36 ejegg: fundraising civicrm upgraded from 66acce1f to a25a359b
18:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
18:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
18:30 ladsgroup@deploy1002: ladsgroup, jforrester: Continuing with sync
18:29 ladsgroup@deploy1002: ladsgroup, jforrester: Backport for Change static footer icons to the new one (T256190), Remove footer override synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:24 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190), Remove footer override
18:19 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190)
18:17 ejegg: standalone SmashPig upgraded from 1d1b770c to c8993ec6
18:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: sync
18:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: sync
18:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
18:10 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
18:09 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
18:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
18:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
18:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
18:07 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
18:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
18:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
18:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
18:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
18:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
18:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
18:02 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
18:01 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
18:00 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
17:58 ejegg: fundraising civicrm upgraded from aa127608 to 66acce1f
17:53 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
17:53 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
17:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
17:37 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
17:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
17:35 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
17:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
17:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
17:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
17:32 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
17:31 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
17:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
17:29 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
17:18 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet
17:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
17:16 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
17:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
17:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
17:05 claime: Pooling and uncordoning wikikube-worker1019.eqiad.wmnet,wikikube-worker1020.eqiad.wmnet,wikikube-worker1021.eqiad.wmnet - T351074
17:02 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
16:59 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
16:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1021.eqiad.wmnet with OS bullseye
16:58 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
16:58 claime: homer 'cr*eqiad*' commit 'T351074'
16:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
16:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
16:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
16:42 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 32s)
16:42 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
16:42 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
16:42 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
16:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
16:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
16:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
16:28 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
16:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
16:27 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
16:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
16:25 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
16:24 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
16:21 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
16:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
16:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
16:09 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 13s)
16:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
16:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1019.eqiad.wmnet with OS bullseye
16:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
16:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
16:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:03 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:00 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
15:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
15:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:55 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:52 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 41s)
15:50 topranks: rebooting cr2-eqdfw to upgrade JunOS T364092
15:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1021.eqiad.wmnet with OS bullseye
15:46 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1020.eqiad.wmnet with OS bullseye
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1489 to wikikube-worker1021
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1021
15:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1021
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
15:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
15:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1489 to wikikube-worker1021
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1447 to wikikube-worker1020
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1020
15:39 topranks: deactivate Transit and peering sessions on cr2-eqdfw in advance of power-supply change T366864
15:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1020
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
15:37 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:37 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:34 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1447 to wikikube-worker1020
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1444 to wikikube-worker1019
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1019
15:32 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1019
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:31 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
15:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2002.codfw.wmnet
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:28 fabfur: upgrading haproxy to 2.8.10 on cp4037 (T367756)
15:28 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.*} and A:cp
15:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.*} and A:cp
15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
15:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1444 to wikikube-worker1019
15:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
15:24 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:21 claime: Depooling mw1444.eqiad.wmnet,mw1447.eqiad.wmnet,mw1489.eqiad.wmnet for reimage - T351074
15:20 topranks: draining transport circuits in/out of eqdfw in advance of router power-supply work/upgrade T366864
15:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
15:17 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
15:16 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wikikube-ctrl2002.codfw.wmnet
15:16 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
15:10 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
15:03 claime: Repooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet pending fw upgrade - T351074
15:03 cgoubert@cumin1002: conftool action : set/weight=30:pooled=yes; selector: name=(mw1359.eqiad.wmnet|mw1364.eqiad.wmnet|mw1365.eqiad.wmnet|mw1412.eqiad.wmnet)
14:59 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:58 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
14:58 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
14:56 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
14:56 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
14:55 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
14:55 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Wikimedia Foundation/Legal/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Zabe" --reason "per request T367216"
14:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
14:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:50 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement/Short" "Wikimedia Foundation/Legal/Committee appointments/Announcement/Short" "Zabe" --reason "per request T367216"
14:48 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1412.eqiad.wmnet
14:48 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1412.eqiad.wmnet
14:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
14:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement" "Wikimedia Foundation/Legal/Committee appointments/Announcement" "Zabe" --reason "per request T367216"
14:45 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1365.eqiad.wmnet
14:45 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1365.eqiad.wmnet
14:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:44 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2001.codfw.wmnet
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:43 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
14:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments" "Wikimedia Foundation/Legal/Committee appointments" "Zabe" --reason "per request T367216"
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
14:39 joal@deploy1002: Finished deploy [airflow-dags/analytics@b682892]: (no justification provided) (duration: 00m 33s)
14:38 joal@deploy1002: Started deploy [airflow-dags/analytics@b682892]: (no justification provided)
14:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Tools and processes" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Tools and processes" "Zabe" --reason "per request T367217"
14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:34 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources/What is a conduct warning" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources/What is a conduct warning" "Zabe" --reason "per request T367217"
14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
14:31 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources" "Zabe" --reason "per request T367217"
14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
14:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
14:28 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Legal agreement" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Legal agreement" "Zabe" --reason "per request T367217"
14:27 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Brand Stewardship Report" "Wikimedia Foundation/Legal/Brand Stewardship Report" "Zabe" --reason "per request T367216"
14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1359.eqiad.wmnet
14:23 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1359.eqiad.wmnet
14:23 taavi@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:22 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.eqiad.wmnet
14:21 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2001.codfw.wmnet
14:21 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Announcement/2023 OC and CRC appointments process" "Wikimedia Foundation/Legal/Announcement/2023 OC and CRC appointments process" "Zabe" --reason "per request T367216"
14:18 claime: Depooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet for reimage - T351074
14:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
14:17 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
14:17 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) (duration: 15m 34s)
14:16 Amir1: killing updateMenteeData.php --wiki=enwiki --statsd --dbshard s1
14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
14:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
14:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/talkheader" "Wikimedia Foundation/Legal/2023 ToU updates/talkheader" "Zabe" --reason "per request T367216"
14:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:07 taavi@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudvirt-wdqs1001.eqiad.wmnet
14:06 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Proposed update" "Wikimedia Foundation/Legal/2023 ToU updates/Proposed update" "Zabe" --reason "per request T367216"
14:06 urbanecm@deploy1002: urbanecm: Continuing with sync
14:06 vgutierrez: rolling upgrade on A:cp-codfw to fifo-log-demux 0.7.5 - T364383
14:05 urbanecm@deploy1002: urbanecm: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Charter" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Charter" "Zabe" --reason "per request T367217"
14:02 vgutierrez: disable puppet on A:cp-codfw before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046681 - T364383
14:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Call for applicants" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Call for applicants" "Zabe" --reason "per request T367217"
14:01 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:01 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895)
14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
14:00 urbanecm@deploy1002: Finished scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) (duration: 16m 47s)
13:54 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
13:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
13:50 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Continuing with sync
13:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
13:48 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:48 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:45 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
13:44 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:43 urbanecm@deploy1002: Sync cancelled.
13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:43 urbanecm@deploy1002: lucaswerkmeister-wmde, urbanecm: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65112 and previous config saved to /var/cache/conftool/dbconfig/20240617-133951-ladsgroup.json
13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:37 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
13:34 claime: Drained and cordoned wikikube-ctrl2001.codfw.wmnet wikikube-ctrl2002.codfw.wmnet
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
13:33 claime: Uncordoned wikikube-ctrl2003.codfw.wmnet
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
13:25 urbanecm@deploy1002: Finished scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) (duration: 23m 07s)
13:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
13:14 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
13:13 vgutierrez: rolling upgrade on A:cp-ulsfo to fifo-log-demux 0.7.5 - T364383
13:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65111 and previous config saved to /var/cache/conftool/dbconfig/20240617-131222-ladsgroup.json
13:10 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Continuing with sync
13:07 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
13:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
13:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
13:03 vgutierrez: disable puppet on A:cp-ulsfo before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046665 - T364383
13:02 urbanecm@deploy1002: Started scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801)
12:59 joal@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: (no justification provided) (duration: 00m 03s)
12:59 joal@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: (no justification provided)
12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65110 and previous config saved to /var/cache/conftool/dbconfig/20240617-125715-ladsgroup.json
12:53 vgutierrez: upload fifo-log-demux 0.7.5 to apt.wm.o (bullseye-wikimedia)
12:47 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65109 and previous config saved to /var/cache/conftool/dbconfig/20240617-124207-ladsgroup.json
12:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
12:34 vgutierrez: upgrading HAProxy to version 2.8.10 on cp4051
12:34 vgutierrez: fetch HAProxy 2.8.10 into thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
12:28 jynus: restarting ms-backup100[12], backup1004-7,11
12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65108 and previous config saved to /var/cache/conftool/dbconfig/20240617-122700-ladsgroup.json
12:14 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2003.codfw.wmnet|wikikube-worker2004.codfw.wmnet|wikikube-worker2007.codfw.wmnet|wikikube-worker2008.codfw.wmnet|wikikube-worker2009.codfw.wmnet|wikikube-worker2010.codfw.wmnet),cluster=kubernetes,service=kubesvc
12:14 claime: pooling and uncordoning wikikube-worker2003.codfw.wmnet wikikube-worker2004.codfw.wmnet wikikube-worker2007.codfw.wmnet wikikube-worker2008.codfw.wmnet wikikube-worker2009.codfw.wmnet wikikube-worker2010.codfw.wmnet - T351074
12:09 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830
12:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
12:04 jynus: restart db1204, db1205
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
12:03 claime: homer 'cr*codfw*' commit 'T351074'
12:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: archiva
12:01 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
12:01 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2003.codfw.wmnet
11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
11:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: archiva
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2007.codfw.wmnet with OS bullseye
11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
11:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
11:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
11:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee" "Zabe" --reason "per request T367217"
11:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
11:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
11:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Reminder" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Reminder" "Zabe" --reason "per request T367216"
11:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
11:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Announcement" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Announcement" "Zabe" --reason "per request T367216"
11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:23 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
11:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours" "Zabe" --reason "per request T367216"
11:17 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/LandingCNTranslate" "Wikimedia Foundation/Legal/2023 ToU updates/LandingCNTranslate" "Zabe" --reason "per request T367216"
11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
11:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
11:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
11:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
11:13 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
11:13 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
11:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/About" "Wikimedia Foundation/Legal/2023 ToU updates/About" "Zabe" --reason "per request T367216"
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2329 to wikikube-worker2010
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
11:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
11:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
11:03 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates" "Wikimedia Foundation/Legal/2023 ToU updates" "Zabe" --reason "per request T367216"
11:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:59 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:58 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2329 to wikikube-worker2010
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2328 to wikikube-worker2009
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
10:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
10:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2328 to wikikube-worker2009
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2327 to wikikube-worker2008
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
10:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
10:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
10:48 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2327 to wikikube-worker2008
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2326 to wikikube-worker2007
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
10:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
10:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2326 to wikikube-worker2007
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2324 to wikikube-worker2004
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2004
10:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2004
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
10:38 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
10:37 jynus: restarting ms-backup200[12], backup2004-7,11
10:35 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2324 to wikikube-worker2004
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2323 to wikikube-worker2003
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
10:34 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
10:34 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2003
10:34 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2003
10:33 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
10:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:31 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2323 to wikikube-worker2003
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65107 and previous config saved to /var/cache/conftool/dbconfig/20240617-102938-marostegui.json
10:26 jynus: restarting db2183, db2184
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
10:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65106 and previous config saved to /var/cache/conftool/dbconfig/20240617-101431-marostegui.json
10:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:10 kamila@cumin1002: START - Cookbook sre.dns.netbox
10:09 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
10:08 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage
10:01 claime: draining and cordoning mw2321 - T367702
10:01 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
10:01 taavi@deploy1002: Finished scap: Backport for Stop loading OSM i18n (T161553) (duration: 34m 07s)
09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65104 and previous config saved to /var/cache/conftool/dbconfig/20240617-095924-marostegui.json
09:54 jayme@deploy1002: Finished deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1 (duration: 00m 24s)
09:53 jayme@deploy1002: Started deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1
09:52 jayme@deploy1002: Finished deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1 (duration: 00m 38s)
09:51 jayme@deploy1002: Started deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1
09:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:49 taavi@deploy1002: taavi: Continuing with sync
09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65103 and previous config saved to /var/cache/conftool/dbconfig/20240617-094926-marostegui.json
09:48 taavi@deploy1002: taavi: Backport for Stop loading OSM i18n (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65102 and previous config saved to /var/cache/conftool/dbconfig/20240617-094417-marostegui.json
09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65101 and previous config saved to /var/cache/conftool/dbconfig/20240617-094034-marostegui.json
09:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
09:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65100 and previous config saved to /var/cache/conftool/dbconfig/20240617-093427-marostegui.json
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65099 and previous config saved to /var/cache/conftool/dbconfig/20240617-093419-marostegui.json
09:26 taavi@deploy1002: Started scap: Backport for Stop loading OSM i18n (T161553)
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65098 and previous config saved to /var/cache/conftool/dbconfig/20240617-091920-marostegui.json
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65097 and previous config saved to /var/cache/conftool/dbconfig/20240617-091912-marostegui.json
09:05 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
09:04 _joe_: removed damaged AOF file for redis rdb1014-6379, resyncing with primary
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65096 and previous config saved to /var/cache/conftool/dbconfig/20240617-090413-marostegui.json
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65095 and previous config saved to /var/cache/conftool/dbconfig/20240617-090405-marostegui.json
09:01 urbanecm@deploy1002: Finished scap: Backport for throttle: Fix exemption for ongoing course (duration: 25m 05s)
08:53 claime: hardcycling rdb1014
08:49 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet
08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65094 and previous config saved to /var/cache/conftool/dbconfig/20240617-084906-marostegui.json
08:40 claime: powercycling rdb1014
08:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65093 and previous config saved to /var/cache/conftool/dbconfig/20240617-083755-marostegui.json
08:36 urbanecm@deploy1002: Started scap: Backport for throttle: Fix exemption for ongoing course
08:25 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65092 and previous config saved to /var/cache/conftool/dbconfig/20240617-082248-marostegui.json
08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65091 and previous config saved to /var/cache/conftool/dbconfig/20240617-080741-marostegui.json
07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65090 and previous config saved to /var/cache/conftool/dbconfig/20240617-075234-marostegui.json
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65089 and previous config saved to /var/cache/conftool/dbconfig/20240617-074542-marostegui.json
07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
07:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65088 and previous config saved to /var/cache/conftool/dbconfig/20240617-074530-marostegui.json
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65087 and previous config saved to /var/cache/conftool/dbconfig/20240617-073023-marostegui.json
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65086 and previous config saved to /var/cache/conftool/dbconfig/20240617-071516-marostegui.json
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65085 and previous config saved to /var/cache/conftool/dbconfig/20240617-070009-marostegui.json
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65084 and previous config saved to /var/cache/conftool/dbconfig/20240617-065647-ladsgroup.json
06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65083 and previous config saved to /var/cache/conftool/dbconfig/20240617-065625-ladsgroup.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65082 and previous config saved to /var/cache/conftool/dbconfig/20240617-065357-marostegui.json
06:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65081 and previous config saved to /var/cache/conftool/dbconfig/20240617-065335-marostegui.json
06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65080 and previous config saved to /var/cache/conftool/dbconfig/20240617-064118-ladsgroup.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65079 and previous config saved to /var/cache/conftool/dbconfig/20240617-063923-root.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65078 and previous config saved to /var/cache/conftool/dbconfig/20240617-063826-marostegui.json
06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65077 and previous config saved to /var/cache/conftool/dbconfig/20240617-062612-ladsgroup.json
06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65076 and previous config saved to /var/cache/conftool/dbconfig/20240617-062511-root.json
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65075 and previous config saved to /var/cache/conftool/dbconfig/20240617-062418-root.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65074 and previous config saved to /var/cache/conftool/dbconfig/20240617-062319-marostegui.json
06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65073 and previous config saved to /var/cache/conftool/dbconfig/20240617-061105-ladsgroup.json
06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65072 and previous config saved to /var/cache/conftool/dbconfig/20240617-061006-root.json
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65071 and previous config saved to /var/cache/conftool/dbconfig/20240617-060913-root.json
06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65070 and previous config saved to /var/cache/conftool/dbconfig/20240617-060812-marostegui.json
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65069 and previous config saved to /var/cache/conftool/dbconfig/20240617-060352-marostegui.json
06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65068 and previous config saved to /var/cache/conftool/dbconfig/20240617-060326-marostegui.json
05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65067 and previous config saved to /var/cache/conftool/dbconfig/20240617-055501-root.json
05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65066 and previous config saved to /var/cache/conftool/dbconfig/20240617-055407-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65065 and previous config saved to /var/cache/conftool/dbconfig/20240617-054819-marostegui.json
05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65064 and previous config saved to /var/cache/conftool/dbconfig/20240617-053955-root.json
05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65063 and previous config saved to /var/cache/conftool/dbconfig/20240617-053902-root.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65062 and previous config saved to /var/cache/conftool/dbconfig/20240617-053312-marostegui.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65061 and previous config saved to /var/cache/conftool/dbconfig/20240617-052450-root.json
05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65060 and previous config saved to /var/cache/conftool/dbconfig/20240617-052355-root.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65059 and previous config saved to /var/cache/conftool/dbconfig/20240617-051805-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65058 and previous config saved to /var/cache/conftool/dbconfig/20240617-050944-root.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65057 and previous config saved to /var/cache/conftool/dbconfig/20240617-050852-marostegui.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65056 and previous config saved to /var/cache/conftool/dbconfig/20240617-050849-root.json
05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65055 and previous config saved to /var/cache/conftool/dbconfig/20240617-050756-marostegui.json
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367261)', diff saved to https://phabricator.wikimedia.org/P65054 and previous config saved to /var/cache/conftool/dbconfig/20240617-050324-marostegui.json
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance

2024-06-16

22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65053 and previous config saved to /var/cache/conftool/dbconfig/20240616-221944-ladsgroup.json
22:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65052 and previous config saved to /var/cache/conftool/dbconfig/20240616-221921-ladsgroup.json
22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65051 and previous config saved to /var/cache/conftool/dbconfig/20240616-220414-ladsgroup.json
21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65050 and previous config saved to /var/cache/conftool/dbconfig/20240616-214907-ladsgroup.json
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65049 and previous config saved to /var/cache/conftool/dbconfig/20240616-213400-ladsgroup.json
14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65047 and previous config saved to /var/cache/conftool/dbconfig/20240616-140214-ladsgroup.json
14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65046 and previous config saved to /var/cache/conftool/dbconfig/20240616-140152-ladsgroup.json
13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65045 and previous config saved to /var/cache/conftool/dbconfig/20240616-134645-ladsgroup.json
13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65044 and previous config saved to /var/cache/conftool/dbconfig/20240616-133137-ladsgroup.json
13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65043 and previous config saved to /var/cache/conftool/dbconfig/20240616-131630-ladsgroup.json
05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65042 and previous config saved to /var/cache/conftool/dbconfig/20240616-055411-ladsgroup.json
05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65041 and previous config saved to /var/cache/conftool/dbconfig/20240616-055359-ladsgroup.json
05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65040 and previous config saved to /var/cache/conftool/dbconfig/20240616-053852-ladsgroup.json
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65039 and previous config saved to /var/cache/conftool/dbconfig/20240616-052345-ladsgroup.json
05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65038 and previous config saved to /var/cache/conftool/dbconfig/20240616-050838-ladsgroup.json
03:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
03:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65037 and previous config saved to /var/cache/conftool/dbconfig/20240616-032102-marostegui.json
03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65036 and previous config saved to /var/cache/conftool/dbconfig/20240616-030555-marostegui.json
02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65035 and previous config saved to /var/cache/conftool/dbconfig/20240616-025048-marostegui.json
02:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65034 and previous config saved to /var/cache/conftool/dbconfig/20240616-023541-marostegui.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65033 and previous config saved to /var/cache/conftool/dbconfig/20240616-000421-ladsgroup.json
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65032 and previous config saved to /var/cache/conftool/dbconfig/20240616-000343-ladsgroup.json

2024-06-15

23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65031 and previous config saved to /var/cache/conftool/dbconfig/20240615-234836-ladsgroup.json
23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65030 and previous config saved to /var/cache/conftool/dbconfig/20240615-233329-ladsgroup.json
23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65029 and previous config saved to /var/cache/conftool/dbconfig/20240615-231822-ladsgroup.json
21:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65028 and previous config saved to /var/cache/conftool/dbconfig/20240615-211811-marostegui.json
21:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
21:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
21:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65027 and previous config saved to /var/cache/conftool/dbconfig/20240615-211750-marostegui.json
21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65026 and previous config saved to /var/cache/conftool/dbconfig/20240615-210243-marostegui.json
20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65025 and previous config saved to /var/cache/conftool/dbconfig/20240615-204735-marostegui.json
20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65024 and previous config saved to /var/cache/conftool/dbconfig/20240615-203229-marostegui.json
16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65021 and previous config saved to /var/cache/conftool/dbconfig/20240615-163203-marostegui.json
16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65020 and previous config saved to /var/cache/conftool/dbconfig/20240615-161656-marostegui.json
16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65019 and previous config saved to /var/cache/conftool/dbconfig/20240615-160149-marostegui.json
11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65018 and previous config saved to /var/cache/conftool/dbconfig/20240615-115812-marostegui.json
11:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65017 and previous config saved to /var/cache/conftool/dbconfig/20240615-115750-marostegui.json
11:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65016 and previous config saved to /var/cache/conftool/dbconfig/20240615-114243-marostegui.json
11:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65015 and previous config saved to /var/cache/conftool/dbconfig/20240615-112736-marostegui.json
11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65014 and previous config saved to /var/cache/conftool/dbconfig/20240615-111229-marostegui.json
09:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65013 and previous config saved to /var/cache/conftool/dbconfig/20240615-092730-ladsgroup.json
09:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65012 and previous config saved to /var/cache/conftool/dbconfig/20240615-071215-marostegui.json
07:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
07:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65011 and previous config saved to /var/cache/conftool/dbconfig/20240615-071152-marostegui.json
06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65010 and previous config saved to /var/cache/conftool/dbconfig/20240615-065645-marostegui.json
06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65009 and previous config saved to /var/cache/conftool/dbconfig/20240615-064138-marostegui.json
06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65008 and previous config saved to /var/cache/conftool/dbconfig/20240615-062631-marostegui.json
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367261)', diff saved to https://phabricator.wikimedia.org/P65007 and previous config saved to /var/cache/conftool/dbconfig/20240615-061919-marostegui.json
06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65006 and previous config saved to /var/cache/conftool/dbconfig/20240615-061908-marostegui.json
06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65005 and previous config saved to /var/cache/conftool/dbconfig/20240615-060401-marostegui.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65004 and previous config saved to /var/cache/conftool/dbconfig/20240615-054854-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65003 and previous config saved to /var/cache/conftool/dbconfig/20240615-053346-marostegui.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65002 and previous config saved to /var/cache/conftool/dbconfig/20240615-050236-marostegui.json
05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P65001 and previous config saved to /var/cache/conftool/dbconfig/20240615-024019-ladsgroup.json
02:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65000 and previous config saved to /var/cache/conftool/dbconfig/20240615-023904-marostegui.json
02:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
02:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
02:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64999 and previous config saved to /var/cache/conftool/dbconfig/20240615-023842-marostegui.json
02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64998 and previous config saved to /var/cache/conftool/dbconfig/20240615-022512-ladsgroup.json
02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64997 and previous config saved to /var/cache/conftool/dbconfig/20240615-022335-marostegui.json
02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64996 and previous config saved to /var/cache/conftool/dbconfig/20240615-021005-ladsgroup.json
02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64995 and previous config saved to /var/cache/conftool/dbconfig/20240615-020827-marostegui.json
01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64994 and previous config saved to /var/cache/conftool/dbconfig/20240615-015458-ladsgroup.json
01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64993 and previous config saved to /var/cache/conftool/dbconfig/20240615-015320-marostegui.json

2024-06-14

23:09 mnz@deploy1002: Finished deploy [airflow-dags/research@ee5a291]: (no justification provided) (duration: 00m 30s)
23:09 mnz@deploy1002: Started deploy [airflow-dags/research@ee5a291]: (no justification provided)
22:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
22:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
22:33 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 31s)
22:33 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
22:27 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
22:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
22:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
22:02 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64992 and previous config saved to /var/cache/conftool/dbconfig/20240614-214910-marostegui.json
21:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
21:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
21:33 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet
21:33 Emperor: restart swift-proxy on ms-fe1010 T360913
21:31 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64991 and previous config saved to /var/cache/conftool/dbconfig/20240614-211239-ladsgroup.json
20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64990 and previous config saved to /var/cache/conftool/dbconfig/20240614-205731-ladsgroup.json
20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64989 and previous config saved to /var/cache/conftool/dbconfig/20240614-204224-ladsgroup.json
20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64988 and previous config saved to /var/cache/conftool/dbconfig/20240614-202717-ladsgroup.json
20:22 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4040.ulsfo.wmnet
20:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
19:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
19:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
19:27 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
19:27 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS bullseye
19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64987 and previous config saved to /var/cache/conftool/dbconfig/20240614-192643-ladsgroup.json
19:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
19:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
19:00 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
18:54 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4040.ulsfo.wmnet
17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
17:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
17:11 jdrewniak@deploy1002: Finished scap: Backport for "For now scope hatnote and infobox styles" (T367462) (duration: 16m 06s)
17:01 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
16:31 jan_drewniak: starting Friday backport for T367462 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/1043827
16:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
16:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
16:00 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
15:55 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
15:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
15:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64984 and previous config saved to /var/cache/conftool/dbconfig/20240614-153727-marostegui.json
15:37 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be1002.eqiad.wmnet with OS bookworm
15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:27 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4039.ulsfo.wmnet
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64982 and previous config saved to /var/cache/conftool/dbconfig/20240614-152220-marostegui.json
15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64981 and previous config saved to /var/cache/conftool/dbconfig/20240614-150713-marostegui.json
14:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
14:54 jynus: upgrade db1245 to mariadb 10.6 T360751
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64980 and previous config saved to /var/cache/conftool/dbconfig/20240614-145206-marostegui.json
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64979 and previous config saved to /var/cache/conftool/dbconfig/20240614-144925-marostegui.json
14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64978 and previous config saved to /var/cache/conftool/dbconfig/20240614-143418-marostegui.json
14:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
14:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64976 and previous config saved to /var/cache/conftool/dbconfig/20240614-141911-marostegui.json
14:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
14:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2002.codfw.wmnet with OS bookworm
14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
14:10 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64975 and previous config saved to /var/cache/conftool/dbconfig/20240614-140404-marostegui.json
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64974 and previous config saved to /var/cache/conftool/dbconfig/20240614-140125-marostegui.json
14:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
14:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
13:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
13:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64973 and previous config saved to /var/cache/conftool/dbconfig/20240614-135900-marostegui.json
13:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
13:52 jynus: restart db2139, db2141
13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
13:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
13:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
13:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64972 and previous config saved to /var/cache/conftool/dbconfig/20240614-134354-marostegui.json
13:41 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64971 and previous config saved to /var/cache/conftool/dbconfig/20240614-132847-marostegui.json
13:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2002.codfw.wmnet with OS bookworm
13:24 jynus: restart db1216, db1225, db1240, db1245
13:23 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
13:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
13:22 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
13:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2001.codfw.wmnet with OS bookworm
13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64970 and previous config saved to /var/cache/conftool/dbconfig/20240614-131339-marostegui.json
13:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64969 and previous config saved to /var/cache/conftool/dbconfig/20240614-131113-marostegui.json
13:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64968 and previous config saved to /var/cache/conftool/dbconfig/20240614-131051-marostegui.json
13:05 jynus: restart db1150, db1171
12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64967 and previous config saved to /var/cache/conftool/dbconfig/20240614-125543-marostegui.json
12:54 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
12:51 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
12:45 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
12:45 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
12:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64966 and previous config saved to /var/cache/conftool/dbconfig/20240614-124036-marostegui.json
12:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64964 and previous config saved to /var/cache/conftool/dbconfig/20240614-122530-marostegui.json
12:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64963 and previous config saved to /var/cache/conftool/dbconfig/20240614-122255-marostegui.json
12:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64962 and previous config saved to /var/cache/conftool/dbconfig/20240614-122233-marostegui.json
12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64961 and previous config saved to /var/cache/conftool/dbconfig/20240614-122210-ladsgroup.json
12:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64960 and previous config saved to /var/cache/conftool/dbconfig/20240614-120918-ladsgroup.json
12:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
12:08 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
12:08 fnegri@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host clouddb1018.eqiad.wmnet
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64959 and previous config saved to /var/cache/conftool/dbconfig/20240614-120727-marostegui.json
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64958 and previous config saved to /var/cache/conftool/dbconfig/20240614-120704-ladsgroup.json
12:01 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64957 and previous config saved to /var/cache/conftool/dbconfig/20240614-115411-ladsgroup.json
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64956 and previous config saved to /var/cache/conftool/dbconfig/20240614-115220-marostegui.json
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64955 and previous config saved to /var/cache/conftool/dbconfig/20240614-115159-ladsgroup.json
11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64954 and previous config saved to /var/cache/conftool/dbconfig/20240614-114002-ladsgroup.json
11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64953 and previous config saved to /var/cache/conftool/dbconfig/20240614-113904-ladsgroup.json
11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64952 and previous config saved to /var/cache/conftool/dbconfig/20240614-113712-marostegui.json
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint1001.eqiad.wmnet
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint1001.eqiad.wmnet with OS bookworm
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64951 and previous config saved to /var/cache/conftool/dbconfig/20240614-113654-ladsgroup.json
11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64950 and previous config saved to /var/cache/conftool/dbconfig/20240614-113325-marostegui.json
11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
11:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64949 and previous config saved to /var/cache/conftool/dbconfig/20240614-113303-marostegui.json
11:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
11:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64948 and previous config saved to /var/cache/conftool/dbconfig/20240614-112357-ladsgroup.json
11:21 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1018.eqiad.wmnet
11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
11:18 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
11:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64947 and previous config saved to /var/cache/conftool/dbconfig/20240614-111756-marostegui.json
11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
11:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:02 jynus: restart backup* hosts
11:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64946 and previous config saved to /var/cache/conftool/dbconfig/20240614-110249-marostegui.json
11:00 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
10:59 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
10:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
10:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
10:55 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
10:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
10:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
10:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
10:54 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
10:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
10:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
10:52 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64945 and previous config saved to /var/cache/conftool/dbconfig/20240614-104742-marostegui.json
10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64943 and previous config saved to /var/cache/conftool/dbconfig/20240614-104352-marostegui.json
10:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
10:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64942 and previous config saved to /var/cache/conftool/dbconfig/20240614-104330-marostegui.json
10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
10:33 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
10:30 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64941 and previous config saved to /var/cache/conftool/dbconfig/20240614-102823-marostegui.json
10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
10:25 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
10:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint1001.eqiad.wmnet with OS bookworm
10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64940 and previous config saved to /var/cache/conftool/dbconfig/20240614-101316-marostegui.json
10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64939 and previous config saved to /var/cache/conftool/dbconfig/20240614-095809-marostegui.json
09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64938 and previous config saved to /var/cache/conftool/dbconfig/20240614-095434-marostegui.json
09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64937 and previous config saved to /var/cache/conftool/dbconfig/20240614-095356-marostegui.json
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint1001.eqiad.wmnet on all recursors
09:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
09:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint1001.eqiad.wmnet on all recursors
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
09:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
09:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
09:43 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
09:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
09:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64936 and previous config saved to /var/cache/conftool/dbconfig/20240614-093849-marostegui.json
09:37 jynus: upgrade and restart dbprov[12]00[3456]
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64935 and previous config saved to /var/cache/conftool/dbconfig/20240614-093657-marostegui.json
09:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64934 and previous config saved to /var/cache/conftool/dbconfig/20240614-093634-marostegui.json
09:31 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
09:31 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint1001.eqiad.wmnet
09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
09:30 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
09:29 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
09:29 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
09:25 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
09:25 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
09:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64933 and previous config saved to /var/cache/conftool/dbconfig/20240614-092342-marostegui.json
09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
09:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
09:22 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64932 and previous config saved to /var/cache/conftool/dbconfig/20240614-092127-marostegui.json
09:14 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
09:13 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
09:10 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
09:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64931 and previous config saved to /var/cache/conftool/dbconfig/20240614-090835-marostegui.json
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint2001.codfw.wmnet
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint2001.codfw.wmnet with OS bookworm
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64930 and previous config saved to /var/cache/conftool/dbconfig/20240614-090620-marostegui.json
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64929 and previous config saved to /var/cache/conftool/dbconfig/20240614-090457-marostegui.json
09:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
09:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
09:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
09:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
09:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64928 and previous config saved to /var/cache/conftool/dbconfig/20240614-085817-marostegui.json
08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64927 and previous config saved to /var/cache/conftool/dbconfig/20240614-085113-marostegui.json
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
08:44 marostegui: dbmaint eqiad s8 deploy schema change T367261
08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64926 and previous config saved to /var/cache/conftool/dbconfig/20240614-084310-marostegui.json
08:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint2001.codfw.wmnet with OS bookworm
08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64925 and previous config saved to /var/cache/conftool/dbconfig/20240614-082803-marostegui.json
08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint2001.codfw.wmnet on all recursors
08:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint2001.codfw.wmnet on all recursors
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
08:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint2001.codfw.wmnet
08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
08:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
08:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
08:14 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:14 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
08:14 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64924 and previous config saved to /var/cache/conftool/dbconfig/20240614-081255-marostegui.json
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64923 and previous config saved to /var/cache/conftool/dbconfig/20240614-080938-marostegui.json
08:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64922 and previous config saved to /var/cache/conftool/dbconfig/20240614-080915-marostegui.json
08:03 marostegui: dbmaint codfw s8 deploy schema change T367261
07:56 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64921 and previous config saved to /var/cache/conftool/dbconfig/20240614-075408-marostegui.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64920 and previous config saved to /var/cache/conftool/dbconfig/20240614-073902-marostegui.json
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1003.eqiad.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64919 and previous config saved to /var/cache/conftool/dbconfig/20240614-072354-marostegui.json
07:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64918 and previous config saved to /var/cache/conftool/dbconfig/20240614-072034-marostegui.json
07:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
07:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64917 and previous config saved to /var/cache/conftool/dbconfig/20240614-072012-marostegui.json
07:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:17 marostegui: dbmaint eqiad s1 deploy schema change T367261
07:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1003.eqiad.wmnet
07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2003.codfw.wmnet
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
07:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64916 and previous config saved to /var/cache/conftool/dbconfig/20240614-070505-marostegui.json
06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
06:53 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2003.codfw.wmnet
06:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64915 and previous config saved to /var/cache/conftool/dbconfig/20240614-064958-marostegui.json
06:41 marostegui: dbmaint codfw s1 deploy schema change T367261
06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64914 and previous config saved to /var/cache/conftool/dbconfig/20240614-063451-marostegui.json
06:34 moritzm: rebalance ganeti/C in eqiad following reboots
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64913 and previous config saved to /var/cache/conftool/dbconfig/20240614-063138-marostegui.json
06:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64912 and previous config saved to /var/cache/conftool/dbconfig/20240614-063116-marostegui.json
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64911 and previous config saved to /var/cache/conftool/dbconfig/20240614-061609-marostegui.json
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64910 and previous config saved to /var/cache/conftool/dbconfig/20240614-060102-marostegui.json
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64909 and previous config saved to /var/cache/conftool/dbconfig/20240614-054555-marostegui.json
05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64908 and previous config saved to /var/cache/conftool/dbconfig/20240614-054041-marostegui.json
05:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64907 and previous config saved to /var/cache/conftool/dbconfig/20240614-054019-marostegui.json
05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64906 and previous config saved to /var/cache/conftool/dbconfig/20240614-053023-ladsgroup.json
05:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64905 and previous config saved to /var/cache/conftool/dbconfig/20240614-053001-ladsgroup.json
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64904 and previous config saved to /var/cache/conftool/dbconfig/20240614-052512-marostegui.json
05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64903 and previous config saved to /var/cache/conftool/dbconfig/20240614-051454-ladsgroup.json
05:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64902 and previous config saved to /var/cache/conftool/dbconfig/20240614-051005-marostegui.json
04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64901 and previous config saved to /var/cache/conftool/dbconfig/20240614-045947-ladsgroup.json
04:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64900 and previous config saved to /var/cache/conftool/dbconfig/20240614-045458-marostegui.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64899 and previous config saved to /var/cache/conftool/dbconfig/20240614-045129-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64898 and previous config saved to /var/cache/conftool/dbconfig/20240614-044840-marostegui.json
04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64897 and previous config saved to /var/cache/conftool/dbconfig/20240614-044440-ladsgroup.json
03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin
03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqsin
01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64896 and previous config saved to /var/cache/conftool/dbconfig/20240614-010717-ladsgroup.json
01:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
01:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance

2024-06-13

23:56 zabe@deploy1002: Finished scap: T361041, Update interwiki cache (duration: 11m 07s)
23:48 foks: removing 7 files for legal compliance
23:45 zabe@deploy1002: Started scap: T361041, Update interwiki cache
23:23 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=sysop_plwiki --cluster=all 2>&1 | tee /tmp/sysop_plwiki.UpdateSearchIndexConfig.log # T361041
23:20 zabe@deploy1002: Finished scap: T361041 (duration: 11m 36s)
23:17 foks: removing 9 files for legal compliance
23:08 zabe@deploy1002: Started scap: T361041
23:06 zabe@deploy1002: Sync cancelled.
23:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
23:01 zabe@deploy1002: zabe: T361041 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:59 zabe@deploy1002: Started scap: T361041
22:49 zabe: create plwiki sysop wiki # T361041
22:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
22:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
21:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
21:32 jsn@deploy1002: Finished scap: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 14m 18s)
21:23 jsn@deploy1002: jsn, kgraessle: Continuing with sync
21:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64894 and previous config saved to /var/cache/conftool/dbconfig/20240613-212230-marostegui.json
21:20 jsn@deploy1002: jsn, kgraessle: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:17 jsn@deploy1002: Started scap: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969)
21:15 jsn@deploy1002: Finished scap: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723) (duration: 14m 11s)
21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64893 and previous config saved to /var/cache/conftool/dbconfig/20240613-210723-marostegui.json
21:07 jsn@deploy1002: dbrant, jsn: Continuing with sync
21:04 jsn@deploy1002: dbrant, jsn: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:04 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqdfw (T367439)
21:03 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqord (T367439)
21:01 jsn@deploy1002: Started scap: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723)
20:55 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
20:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64892 and previous config saved to /var/cache/conftool/dbconfig/20240613-205215-marostegui.json
20:50 cdobbins@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.eqsin.wmnet
20:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64891 and previous config saved to /var/cache/conftool/dbconfig/20240613-203708-marostegui.json
20:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:14 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:13 foks: removing 1 file for legal compliance
20:00 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1003.eqiad.wmnet
19:59 foks: removing 2 files for legal compliance
19:58 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1003.eqiad.wmnet
19:58 kamila@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1003.eqiad.wmnet
19:53 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
19:51 foks: removing 2 files for legal compliance
19:51 cdobbins@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
19:41 foks: removing 2 files for legal compliance
19:28 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
19:27 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
19:27 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
19:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
19:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
19:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64890 and previous config saved to /var/cache/conftool/dbconfig/20240613-184924-ladsgroup.json
18:36 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64889 and previous config saved to /var/cache/conftool/dbconfig/20240613-183417-ladsgroup.json
18:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.9 refs T361403
18:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:28 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64888 and previous config saved to /var/cache/conftool/dbconfig/20240613-181911-ladsgroup.json
18:17 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
18:16 brennen: 1.43.0-wmf.9 train (T361403): no current blockers, rolling to group2
18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64887 and previous config saved to /var/cache/conftool/dbconfig/20240613-180404-ladsgroup.json
17:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
17:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
17:33 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
17:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64886 and previous config saved to /var/cache/conftool/dbconfig/20240613-170602-marostegui.json
16:57 brennen@deploy1002: Finished scap: Backport for Convert local function to arrow function to fix context (T367366) (duration: 16m 51s)
16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
16:43 brennen@deploy1002: jforrester, brennen: Backport for Convert local function to arrow function to fix context (T367366) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
16:40 brennen@deploy1002: Started scap: Backport for Convert local function to arrow function to fix context (T367366)
16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64884 and previous config saved to /var/cache/conftool/dbconfig/20240613-163547-marostegui.json
16:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
16:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
16:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
16:24 mutante: gitlab-replica.wikimedia.org - short downtime - renaming to gitlab-replica-a
16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64883 and previous config saved to /var/cache/conftool/dbconfig/20240613-162321-arnaudb.json
16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64882 and previous config saved to /var/cache/conftool/dbconfig/20240613-162040-marostegui.json
16:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
16:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
16:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64881 and previous config saved to /var/cache/conftool/dbconfig/20240613-161641-marostegui.json
16:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
16:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64880 and previous config saved to /var/cache/conftool/dbconfig/20240613-161617-marostegui.json
16:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
16:11 cdanis: gnt-node failover -f ganeti2028.codfw.wmnet
16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
16:09 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:08 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
16:08 cdanis: forcibly rebooted ganeti2028, drbd hung
16:08 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64878 and previous config saved to /var/cache/conftool/dbconfig/20240613-160816-arnaudb.json
16:07 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others (duration: 00m 22s)
16:06 ebernhardson@deploy1002: Started deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others
16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64877 and previous config saved to /var/cache/conftool/dbconfig/20240613-160453-marostegui.json
16:04 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
16:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64876 and previous config saved to /var/cache/conftool/dbconfig/20240613-160431-marostegui.json
16:04 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64875 and previous config saved to /var/cache/conftool/dbconfig/20240613-160110-marostegui.json
15:54 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64874 and previous config saved to /var/cache/conftool/dbconfig/20240613-155310-arnaudb.json
15:52 elukey: drop mediawiki-services-restbase docker images from the Docker Registry - T367427
15:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
15:50 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-fe2002.codfw.wmnet with OS bookworm
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64873 and previous config saved to /var/cache/conftool/dbconfig/20240613-154924-marostegui.json
15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64872 and previous config saved to /var/cache/conftool/dbconfig/20240613-154603-marostegui.json
15:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
15:42 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
15:41 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqsin
15:38 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin
15:38 ChrisDobbins901_: cdobbins@cumin1002 sudo -i cookbook sre.cdn.roll-reboot --alias 'cp-upload_eqsin' --batchsize 1 --reason T366555 --task-id T366555 --grace-sleep 5400
15:38 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64871 and previous config saved to /var/cache/conftool/dbconfig/20240613-153805-arnaudb.json
15:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
15:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
15:36 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2002.codfw.wmnet with OS bookworm
15:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
15:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
15:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64870 and previous config saved to /var/cache/conftool/dbconfig/20240613-153417-marostegui.json
15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64869 and previous config saved to /var/cache/conftool/dbconfig/20240613-153056-marostegui.json
15:28 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date # Ctrl+C – had slowed down, unnecessary work by this point; was at --start '["55914913"]'
15:28 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64868 and previous config saved to /var/cache/conftool/dbconfig/20240613-152748-marostegui.json
15:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
15:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
15:27 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:26 elukey: drop mediawiki-services-parsoid docker images from the Docker Registry - T367427
15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
15:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
15:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64867 and previous config saved to /var/cache/conftool/dbconfig/20240613-152420-marostegui.json
15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64866 and previous config saved to /var/cache/conftool/dbconfig/20240613-152300-arnaudb.json
15:22 elukey: drop eventgate-ci docker images from the Docker Registry
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64865 and previous config saved to /var/cache/conftool/dbconfig/20240613-151910-marostegui.json
15:15 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64864 and previous config saved to /var/cache/conftool/dbconfig/20240613-150913-marostegui.json
15:08 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:05 volans: upgrading spicerack on cumin1002 to v8.6.0
15:04 topranks: rebooting lsw1-f6-codfw to upgrade JunOS on switch T365983
15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
15:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64863 and previous config saved to /var/cache/conftool/dbconfig/20240613-150332-ladsgroup.json
15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
15:03 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
15:01 cdanis@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:01 cdanis@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:00 cdanis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:59 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:59 cdanis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:59 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:59 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:57 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:57 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64862 and previous config saved to /var/cache/conftool/dbconfig/20240613-145406-marostegui.json
14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 depool ahead of T365983', diff saved to https://phabricator.wikimedia.org/P64861 and previous config saved to /var/cache/conftool/dbconfig/20240613-145035-arnaudb.json
14:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
14:49 moritzm: rebalance ganeti/B in eqiad following reboots
14:49 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
14:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64860 and previous config saved to /var/cache/conftool/dbconfig/20240613-144825-ladsgroup.json
14:47 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:46 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
14:45 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
14:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
14:44 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
14:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
14:41 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762 (duration: 00m 05s)
14:41 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762
14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64859 and previous config saved to /var/cache/conftool/dbconfig/20240613-143859-marostegui.json
14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64858 and previous config saved to /var/cache/conftool/dbconfig/20240613-143554-marostegui.json
14:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
14:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64857 and previous config saved to /var/cache/conftool/dbconfig/20240613-143531-marostegui.json
14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64856 and previous config saved to /var/cache/conftool/dbconfig/20240613-143318-ladsgroup.json
14:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762 (duration: 00m 07s)
14:32 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762
14:27 bblack: authdns-update for https://gerrit.wikimedia.org/r/1042490 (remaps some Facebook ranges to codfw+eqiad)
14:24 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
14:21 cgoubert@deploy1002: Finished scap: Change mwapi listener to mw-api-int - T333120 (duration: 06m 47s)
14:21 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64855 and previous config saved to /var/cache/conftool/dbconfig/20240613-142024-marostegui.json
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64854 and previous config saved to /var/cache/conftool/dbconfig/20240613-141810-ladsgroup.json
14:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
14:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
14:15 cgoubert@deploy1002: Started scap: Change mwapi listener to mw-api-int - T333120
14:05 Lucas_WMDE: UTC afternoon backport+config window done
14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Load EntitySchema on Test Wikidata clients (T363153) (duration: 14m 14s)
14:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64853 and previous config saved to /var/cache/conftool/dbconfig/20240613-140517-marostegui.json
14:03 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
14:00 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
14:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
13:59 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
13:59 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
13:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync
13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64852 and previous config saved to /var/cache/conftool/dbconfig/20240613-135523-ladsgroup.json
13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: sync
13:55 claime: roll-restarting shellbox-constraints
13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Load EntitySchema on Test Wikidata clients (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Load EntitySchema on Test Wikidata clients (T363153)
13:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64851 and previous config saved to /var/cache/conftool/dbconfig/20240613-135010-marostegui.json
13:48 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
13:47 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64850 and previous config saved to /var/cache/conftool/dbconfig/20240613-134701-marostegui.json
13:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
13:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
13:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
13:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64849 and previous config saved to /var/cache/conftool/dbconfig/20240613-134639-marostegui.json
13:45 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247) (duration: 13m 24s)
13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P64848 and previous config saved to /var/cache/conftool/dbconfig/20240613-134456-ladsgroup.json
13:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
13:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64847 and previous config saved to /var/cache/conftool/dbconfig/20240613-134017-ladsgroup.json
13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Continuing with sync
13:34 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:33 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:33 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247)
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64846 and previous config saved to /var/cache/conftool/dbconfig/20240613-133132-marostegui.json
13:30 volans: upgrading spicerack on cumin2002 to v8.6.0
13:26 moritzm: installing pillow security updates
13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64845 and previous config saved to /var/cache/conftool/dbconfig/20240613-132512-ladsgroup.json
13:18 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
13:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64844 and previous config saved to /var/cache/conftool/dbconfig/20240613-131746-ladsgroup.json
13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64843 and previous config saved to /var/cache/conftool/dbconfig/20240613-131625-marostegui.json
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64842 and previous config saved to /var/cache/conftool/dbconfig/20240613-131006-ladsgroup.json
13:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
13:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
13:06 moritzm: installing pillow security updates
13:03 jmm@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64841 and previous config saved to /var/cache/conftool/dbconfig/20240613-130117-marostegui.json
12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64840 and previous config saved to /var/cache/conftool/dbconfig/20240613-125700-marostegui.json
12:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
12:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
12:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64839 and previous config saved to /var/cache/conftool/dbconfig/20240613-125648-marostegui.json
12:52 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
12:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
12:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
12:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64838 and previous config saved to /var/cache/conftool/dbconfig/20240613-124141-marostegui.json
12:39 elukey: reset BIOS/BMC to factory default on sretest1001 - T365372
12:30 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
12:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64837 and previous config saved to /var/cache/conftool/dbconfig/20240613-122634-marostegui.json
12:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
12:26 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
12:21 ladsgroup@deploy1002: Finished scap: Backport for Temporarily bump circuit breaking threshold to 350 (duration: 12m 13s)
12:20 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:17 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:12 ladsgroup@deploy1002: ladsgroup: Continuing with sync
12:12 ladsgroup@deploy1002: ladsgroup: Backport for Temporarily bump circuit breaking threshold to 350 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64836 and previous config saved to /var/cache/conftool/dbconfig/20240613-121127-marostegui.json
12:09 ladsgroup@deploy1002: Started scap: Backport for Temporarily bump circuit breaking threshold to 350
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64835 and previous config saved to /var/cache/conftool/dbconfig/20240613-120711-marostegui.json
12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
12:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64834 and previous config saved to /var/cache/conftool/dbconfig/20240613-120644-marostegui.json
11:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
11:57 fabfur: enabling puppet && repool cp4037 (T360454)
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64832 and previous config saved to /var/cache/conftool/dbconfig/20240613-115137-marostegui.json
11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64831 and previous config saved to /var/cache/conftool/dbconfig/20240613-113630-marostegui.json
11:35 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:29 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:28 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
11:22 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64830 and previous config saved to /var/cache/conftool/dbconfig/20240613-112122-marostegui.json
11:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
11:19 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64829 and previous config saved to /var/cache/conftool/dbconfig/20240613-111706-marostegui.json
11:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64828 and previous config saved to /var/cache/conftool/dbconfig/20240613-111655-ladsgroup.json
11:16 moritzm: installing pillow security updates
11:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
11:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64827 and previous config saved to /var/cache/conftool/dbconfig/20240613-111642-marostegui.json
11:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64826 and previous config saved to /var/cache/conftool/dbconfig/20240613-111633-ladsgroup.json
11:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
11:09 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
11:08 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:08 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64825 and previous config saved to /var/cache/conftool/dbconfig/20240613-110135-marostegui.json
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64824 and previous config saved to /var/cache/conftool/dbconfig/20240613-110126-ladsgroup.json
10:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
10:55 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
10:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
10:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:46 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64823 and previous config saved to /var/cache/conftool/dbconfig/20240613-104628-marostegui.json
10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64822 and previous config saved to /var/cache/conftool/dbconfig/20240613-104619-ladsgroup.json
10:43 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
10:42 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
10:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2010.codfw.wmnet
10:41 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
10:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:34 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2010.codfw.wmnet
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2009.codfw.wmnet
10:33 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64821 and previous config saved to /var/cache/conftool/dbconfig/20240613-103120-marostegui.json
10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64820 and previous config saved to /var/cache/conftool/dbconfig/20240613-103111-ladsgroup.json
10:31 cmooney@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
10:30 cmooney@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2009.codfw.wmnet
10:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2008.codfw.wmnet
10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64819 and previous config saved to /var/cache/conftool/dbconfig/20240613-102659-marostegui.json
10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2287-2290].codfw.wmnet
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:23 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
10:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
10:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2008.codfw.wmnet
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2007.codfw.wmnet
10:21 hashar: Gerrit upgrade completed
10:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64818 and previous config saved to /var/cache/conftool/dbconfig/20240613-102016-marostegui.json
10:20 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2007.codfw.wmnet
10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2006.codfw.wmnet
10:10 fabfur: cp4037 depooled && puppet disable to profile benthos configuration (T360454)
10:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2006.codfw.wmnet
10:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135 (duration: 00m 06s)
10:08 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135
10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2287-2290].codfw.wmnet
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2281,2283-2286].codfw.wmnet
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64816 and previous config saved to /var/cache/conftool/dbconfig/20240613-100509-marostegui.json
10:04 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
10:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 (duration: 00m 08s)
10:04 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1
10:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2003.codfw.wmnet
10:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:01 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
09:59 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1010.eqiad.wmnet
09:53 kamila@cumin1002: START - Cookbook sre.dns.netbox
09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1010.eqiad.wmnet
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1009.eqiad.wmnet
09:50 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.eqiad.wmnet
09:50 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.eqiad.wmnet
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64815 and previous config saved to /var/cache/conftool/dbconfig/20240613-095002-marostegui.json
09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2281,2283-2286].codfw.wmnet
09:46 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2003.codfw.wmnet
09:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1009.eqiad.wmnet
09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1008.eqiad.wmnet
09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1008.eqiad.wmnet
09:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1007.eqiad.wmnet
09:39 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
09:38 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64814 and previous config saved to /var/cache/conftool/dbconfig/20240613-093455-marostegui.json
09:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1007.eqiad.wmnet
09:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1006.eqiad.wmnet
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64813 and previous config saved to /var/cache/conftool/dbconfig/20240613-093158-marostegui.json
09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
09:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64812 and previous config saved to /var/cache/conftool/dbconfig/20240613-093136-marostegui.json
09:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1006.eqiad.wmnet
09:22 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64811 and previous config saved to /var/cache/conftool/dbconfig/20240613-091629-marostegui.json
09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64810 and previous config saved to /var/cache/conftool/dbconfig/20240613-091200-arnaudb.json
09:07 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
09:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64809 and previous config saved to /var/cache/conftool/dbconfig/20240613-090122-marostegui.json
08:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64808 and previous config saved to /var/cache/conftool/dbconfig/20240613-085654-arnaudb.json
08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64807 and previous config saved to /var/cache/conftool/dbconfig/20240613-084615-marostegui.json
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64806 and previous config saved to /var/cache/conftool/dbconfig/20240613-084310-marostegui.json
08:43 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
08:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64805 and previous config saved to /var/cache/conftool/dbconfig/20240613-084248-marostegui.json
08:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64804 and previous config saved to /var/cache/conftool/dbconfig/20240613-084149-arnaudb.json
08:37 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:36 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:30 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:29 kart_: Updated MinT to 2024-06-12-111204-production (T363563)
08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64803 and previous config saved to /var/cache/conftool/dbconfig/20240613-082741-marostegui.json
08:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64802 and previous config saved to /var/cache/conftool/dbconfig/20240613-082643-arnaudb.json
08:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
08:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
08:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64801 and previous config saved to /var/cache/conftool/dbconfig/20240613-081234-marostegui.json
08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64800 and previous config saved to /var/cache/conftool/dbconfig/20240613-081138-arnaudb.json
08:11 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
08:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2125.codfw.wmnet with reason: index issue
08:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2125.codfw.wmnet with reason: index issue
08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'index error depool db2125', diff saved to https://phabricator.wikimedia.org/P64799 and previous config saved to /var/cache/conftool/dbconfig/20240613-080624-arnaudb.json
08:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:59 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64798 and previous config saved to /var/cache/conftool/dbconfig/20240613-075727-marostegui.json
07:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64797 and previous config saved to /var/cache/conftool/dbconfig/20240613-075500-root.json
07:54 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64796 and previous config saved to /var/cache/conftool/dbconfig/20240613-075420-marostegui.json
07:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64795 and previous config saved to /var/cache/conftool/dbconfig/20240613-075358-marostegui.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64794 and previous config saved to /var/cache/conftool/dbconfig/20240613-073955-root.json
07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64793 and previous config saved to /var/cache/conftool/dbconfig/20240613-073851-marostegui.json
07:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64792 and previous config saved to /var/cache/conftool/dbconfig/20240613-072450-root.json
07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64791 and previous config saved to /var/cache/conftool/dbconfig/20240613-072344-marostegui.json
07:21 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64790 and previous config saved to /var/cache/conftool/dbconfig/20240613-070944-root.json
07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64789 and previous config saved to /var/cache/conftool/dbconfig/20240613-070837-marostegui.json
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64788 and previous config saved to /var/cache/conftool/dbconfig/20240613-070531-marostegui.json
07:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
07:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64787 and previous config saved to /var/cache/conftool/dbconfig/20240613-070509-marostegui.json
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64786 and previous config saved to /var/cache/conftool/dbconfig/20240613-065439-root.json
06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64785 and previous config saved to /var/cache/conftool/dbconfig/20240613-065002-marostegui.json
06:42 moritzm: rebalance ganeti clusters in eqiad following reboots
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64784 and previous config saved to /var/cache/conftool/dbconfig/20240613-063934-root.json
06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64783 and previous config saved to /var/cache/conftool/dbconfig/20240613-063455-marostegui.json
06:27 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64782 and previous config saved to /var/cache/conftool/dbconfig/20240613-061948-marostegui.json
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64781 and previous config saved to /var/cache/conftool/dbconfig/20240613-061636-marostegui.json
06:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64780 and previous config saved to /var/cache/conftool/dbconfig/20240613-061613-marostegui.json
06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64779 and previous config saved to /var/cache/conftool/dbconfig/20240613-060927-ladsgroup.json
06:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
06:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64778 and previous config saved to /var/cache/conftool/dbconfig/20240613-060905-ladsgroup.json
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64777 and previous config saved to /var/cache/conftool/dbconfig/20240613-060107-marostegui.json
05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64776 and previous config saved to /var/cache/conftool/dbconfig/20240613-055747-marostegui.json
05:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
05:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64775 and previous config saved to /var/cache/conftool/dbconfig/20240613-055725-marostegui.json
05:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64774 and previous config saved to /var/cache/conftool/dbconfig/20240613-055358-ladsgroup.json
05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64773 and previous config saved to /var/cache/conftool/dbconfig/20240613-054600-marostegui.json
05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64772 and previous config saved to /var/cache/conftool/dbconfig/20240613-054218-marostegui.json
05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64771 and previous config saved to /var/cache/conftool/dbconfig/20240613-053851-ladsgroup.json
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64770 and previous config saved to /var/cache/conftool/dbconfig/20240613-053052-marostegui.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64769 and previous config saved to /var/cache/conftool/dbconfig/20240613-052746-marostegui.json
05:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
05:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64768 and previous config saved to /var/cache/conftool/dbconfig/20240613-052723-marostegui.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64767 and previous config saved to /var/cache/conftool/dbconfig/20240613-052711-marostegui.json
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64766 and previous config saved to /var/cache/conftool/dbconfig/20240613-052344-ladsgroup.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64765 and previous config saved to /var/cache/conftool/dbconfig/20240613-051216-marostegui.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64764 and previous config saved to /var/cache/conftool/dbconfig/20240613-051204-marostegui.json
04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64763 and previous config saved to /var/cache/conftool/dbconfig/20240613-045709-marostegui.json
04:55 marostegui: dbmaint eqiad s5 deploy schema change on db1230 T364299
04:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
04:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1230 T367146', diff saved to https://phabricator.wikimedia.org/P64762 and previous config saved to /var/cache/conftool/dbconfig/20240613-045254-root.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary and set section read-write T367146', diff saved to https://phabricator.wikimedia.org/P64761 and previous config saved to /var/cache/conftool/dbconfig/20240613-045141-root.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T367146', diff saved to https://phabricator.wikimedia.org/P64760 and previous config saved to /var/cache/conftool/dbconfig/20240613-045121-root.json
04:51 marostegui: Starting s5 eqiad failover from db1230 to db1183 - T367146
04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64759 and previous config saved to /var/cache/conftool/dbconfig/20240613-044201-marostegui.json
04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64758 and previous config saved to /var/cache/conftool/dbconfig/20240613-043848-marostegui.json
04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
04:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T367146', diff saved to https://phabricator.wikimedia.org/P64757 and previous config saved to /var/cache/conftool/dbconfig/20240613-043239-root.json
04:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
00:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64756 and previous config saved to /var/cache/conftool/dbconfig/20240613-004247-marostegui.json
00:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
00:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64755 and previous config saved to /var/cache/conftool/dbconfig/20240613-003507-ladsgroup.json
00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
00:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64754 and previous config saved to /var/cache/conftool/dbconfig/20240613-003444-ladsgroup.json
00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64753 and previous config saved to /var/cache/conftool/dbconfig/20240613-001937-ladsgroup.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64752 and previous config saved to /var/cache/conftool/dbconfig/20240613-000430-ladsgroup.json

2024-06-12

23:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64751 and previous config saved to /var/cache/conftool/dbconfig/20240612-234923-ladsgroup.json
22:17 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
22:13 krinkle@deploy1002: Finished scap: Backport for Move etcd.php from wmf-config/ to src/ (T308932) (duration: 13m 42s)
22:10 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
22:08 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
22:06 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
22:04 krinkle@deploy1002: krinkle: Continuing with sync
22:04 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
22:03 krinkle@deploy1002: krinkle: Backport for Move etcd.php from wmf-config/ to src/ (T308932) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:59 krinkle@deploy1002: Started scap: Backport for Move etcd.php from wmf-config/ to src/ (T308932)
21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
21:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
21:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
21:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
21:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
21:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
21:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
21:33 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
21:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
21:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: sync
21:31 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: sync
21:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
21:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
21:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
21:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
21:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
21:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
21:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
21:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
21:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
21:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
21:22 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
21:21 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
21:20 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
21:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
21:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
21:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
21:17 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
21:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
21:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
21:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
21:05 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
21:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
20:53 cjming: end of UTC late backport window
20:52 cjming@deploy1002: Finished scap: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) (duration: 12m 52s)
20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
20:44 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
20:42 cjming@deploy1002: cjming, jdlrobson: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:39 cjming@deploy1002: Started scap: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101)
20:29 cjming@deploy1002: Finished scap: Backport for Disable quick surveys using deprecated configuration (T367128) (duration: 11m 59s)
20:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64750 and previous config saved to /var/cache/conftool/dbconfig/20240612-202233-marostegui.json
20:21 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
20:19 cjming@deploy1002: jdlrobson, cjming: Backport for Disable quick surveys using deprecated configuration (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:17 cjming@deploy1002: Started scap: Backport for Disable quick surveys using deprecated configuration (T367128)
20:10 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw
20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64749 and previous config saved to /var/cache/conftool/dbconfig/20240612-200726-marostegui.json
20:00 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
19:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
19:58 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.9 refs T361403
19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64748 and previous config saved to /var/cache/conftool/dbconfig/20240612-195219-marostegui.json
19:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155 (duration: 00m 07s)
19:49 hashar@deploy1002: Started deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155
19:48 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
19:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
19:48 brennen: 1.43.0-wmf.9 train (T361403): blockers (hopefully) resolved, rolling to group1
19:46 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
19:45 brennen@deploy1002: Finished scap: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) (duration: 13m 06s)
19:45 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
19:43 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
19:43 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
19:41 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
19:40 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
19:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
19:39 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
19:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
19:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
19:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
19:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64747 and previous config saved to /var/cache/conftool/dbconfig/20240612-193712-marostegui.json
19:36 brennen@deploy1002: brennen: Continuing with sync
19:36 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
19:36 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
19:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
19:35 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
19:35 brennen@deploy1002: brennen: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
19:34 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
19:34 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
19:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
19:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
19:32 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
19:32 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
19:32 brennen@deploy1002: Started scap: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153)
19:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
19:31 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
19:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
19:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
19:30 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
19:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
19:29 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
19:29 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
19:28 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
19:27 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
19:26 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
19:25 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64746 and previous config saved to /var/cache/conftool/dbconfig/20240612-192327-marostegui.json
19:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
19:23 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
19:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64745 and previous config saved to /var/cache/conftool/dbconfig/20240612-192303-marostegui.json
19:22 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
19:22 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:22 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:19 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
19:19 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
19:18 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
19:17 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
19:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
19:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
19:11 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
19:10 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
19:09 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:08 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64744 and previous config saved to /var/cache/conftool/dbconfig/20240612-190755-marostegui.json
19:06 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:06 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:03 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:02 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:02 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
19:02 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
18:59 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:59 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:59 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
18:58 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
18:58 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:57 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:55 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64742 and previous config saved to /var/cache/conftool/dbconfig/20240612-185248-marostegui.json
18:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:49 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
18:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
18:42 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
18:41 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:40 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
18:40 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:39 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
18:39 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64741 and previous config saved to /var/cache/conftool/dbconfig/20240612-183741-marostegui.json
18:24 ejegg: fundraising civicrm upgraded from 955166d1 to 76857844
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64740 and previous config saved to /var/cache/conftool/dbconfig/20240612-182343-marostegui.json
18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64739 and previous config saved to /var/cache/conftool/dbconfig/20240612-182321-marostegui.json
18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64738 and previous config saved to /var/cache/conftool/dbconfig/20240612-180814-marostegui.json
18:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
18:01 brennen: 1.43.0-wmf.9 train (T361403): currently blocked on T367334, holding at group0 until resolved.
17:59 mutante: gitlab-replica-old - downtime, renaming to gitlab-replica-b
17:58 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
17:58 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
17:58 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:57 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
17:57 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
17:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64737 and previous config saved to /var/cache/conftool/dbconfig/20240612-175306-marostegui.json
17:52 brett: authdns-update run on dns1004 (T364891)
17:51 brett: Repool ulsfo as A:cp-text nvme upgrades are complete (T364891)
17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:39 brett: Remove downtime of cache_text/cp text servers in ulsfo - T364891
17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64736 and previous config saved to /var/cache/conftool/dbconfig/20240612-173759-marostegui.json
17:30 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=ulsfo
17:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:25 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:24 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:24 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64735 and previous config saved to /var/cache/conftool/dbconfig/20240612-172406-marostegui.json
17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
17:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
17:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64734 and previous config saved to /var/cache/conftool/dbconfig/20240612-172344-marostegui.json
17:13 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
17:13 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:10 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
17:09 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
17:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:sessionstore
17:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64733 and previous config saved to /var/cache/conftool/dbconfig/20240612-170837-marostegui.json
16:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64732 and previous config saved to /var/cache/conftool/dbconfig/20240612-165329-marostegui.json
16:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
16:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64730 and previous config saved to /var/cache/conftool/dbconfig/20240612-162426-marostegui.json
16:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
16:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64729 and previous config saved to /var/cache/conftool/dbconfig/20240612-162403-marostegui.json
16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64728 and previous config saved to /var/cache/conftool/dbconfig/20240612-162134-ladsgroup.json
16:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64727 and previous config saved to /var/cache/conftool/dbconfig/20240612-162110-ladsgroup.json
16:20 brett: cumin 'A:cp-text and A:ulsfo' 'systemctl poweroff' - T364891
16:19 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
16:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 8 hosts with reason: T364891
16:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 8 hosts with reason: T364891
16:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
16:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:17 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
16:13 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
16:11 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 03m 19s)
16:10 jhathaway@deploy1002: Started scap: (no justification provided)
16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64726 and previous config saved to /var/cache/conftool/dbconfig/20240612-160856-marostegui.json
16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64725 and previous config saved to /var/cache/conftool/dbconfig/20240612-160603-ladsgroup.json
16:05 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:sessionstore
16:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:55 otto@deploy1002: Finished scap: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) (duration: 12m 19s)
15:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64724 and previous config saved to /var/cache/conftool/dbconfig/20240612-155349-marostegui.json
15:53 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64723 and previous config saved to /var/cache/conftool/dbconfig/20240612-155056-ladsgroup.json
15:47 otto@deploy1002: otto: Continuing with sync
15:46 otto@deploy1002: otto: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:43 otto@deploy1002: Started scap: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817)
15:42 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64722 and previous config saved to /var/cache/conftool/dbconfig/20240612-153842-marostegui.json
15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64721 and previous config saved to /var/cache/conftool/dbconfig/20240612-153549-ladsgroup.json
15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
15:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
15:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
15:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
15:28 denisse@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=logstash,name=eqiad
15:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
15:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
15:25 volans: uploaded spicerack_8.6.0 to apt.wikimedia.org bullseye-wikimedia
15:25 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
15:24 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64720 and previous config saved to /var/cache/conftool/dbconfig/20240612-152403-marostegui.json
15:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64719 and previous config saved to /var/cache/conftool/dbconfig/20240612-152351-marostegui.json
15:23 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
15:12 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
15:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64718 and previous config saved to /var/cache/conftool/dbconfig/20240612-150844-marostegui.json
15:02 cdanis: T364907 sudo -i reprepro --keepunreferencedfiles includedeb bullseye-wikimedia ~/otelcol-contrib_0.102.0_linux_amd64.deb
15:02 brett: authdns-update run on dns1004
15:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
15:00 brett: Depooling ulsfo in preparation for A:cp-text downtime/poweroff for nvme upgrades (T364891)
15:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis" (duration: 12m 36s)
14:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64717 and previous config saved to /var/cache/conftool/dbconfig/20240612-145337-marostegui.json
14:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-eqiad
14:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis"
14:46 oblivian@deploy1002: Finished scap: Backport for Use the statsd-exporter service where available (T365265) (duration: 12m 05s)
14:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
14:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64716 and previous config saved to /var/cache/conftool/dbconfig/20240612-143830-marostegui.json
14:38 oblivian@deploy1002: oblivian: Continuing with sync
14:37 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
14:37 oblivian@deploy1002: oblivian: Backport for Use the statsd-exporter service where available (T365265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:36 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
14:35 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:35 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
14:34 moritzm: failover ganeti master in eqiad to ganeti1028
14:34 oblivian@deploy1002: Started scap: Backport for Use the statsd-exporter service where available (T365265)
14:34 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
14:31 moritzm: installing gst-plugins-base1.0 security updates
14:31 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:31 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:29 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
14:29 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
14:28 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:27 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:27 claime: trafficserver: move 95% of traffic to mw-on-k8s
14:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 32s)
14:27 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
14:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64715 and previous config saved to /var/cache/conftool/dbconfig/20240612-142412-marostegui.json
14:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64714 and previous config saved to /var/cache/conftool/dbconfig/20240612-142335-marostegui.json
14:22 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
14:21 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
14:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
14:20 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
14:19 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
14:19 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
14:19 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
14:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1020.eqiad.wmnet
14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153)
14:10 moritzm: installing libarchive security updates
14:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64713 and previous config saved to /var/cache/conftool/dbconfig/20240612-140827-marostegui.json
14:07 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1020.eqiad.wmnet
14:02 vgutierrez: repool text@esams with IPIP encapsulation enabled - T366466
14:02 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
14:00 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
13:55 dcausse@deploy1002: Finished deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh) (duration: 00m 13s)
13:54 dcausse@deploy1002: Started deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh)
13:53 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 - T366466
13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64712 and previous config saved to /var/cache/conftool/dbconfig/20240612-135319-marostegui.json
13:49 fabfur: depooled cp4037 to test benthos/haproxy configuration (T365718)
13:48 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
13:48 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
13:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
13:46 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-eqiad
13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
13:45 claime: Starting kafka-main reboots in eqiad
13:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
13:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64710 and previous config saved to /var/cache/conftool/dbconfig/20240612-134414-marostegui.json
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
13:39 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64709 and previous config saved to /var/cache/conftool/dbconfig/20240612-133812-marostegui.json
13:38 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:37 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:36 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
13:36 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:35 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:34 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1010.eqiad.wmnet
13:34 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1010.eqiad.wmnet
13:34 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
13:34 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
13:30 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
13:30 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64708 and previous config saved to /var/cache/conftool/dbconfig/20240612-132907-marostegui.json
13:28 sukhe: add ntp-[abc].anycast.wmnet: 10.3.0.[5-7]/32: T366360
13:28 sukhe@cumin1002: START - Cookbook sre.dns.netbox
13:26 vgutierrez: depool text@esams before enabling IPIP encapsulation - T366466
13:26 dcausse@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023 (duration: 00m 14s)
13:25 dcausse@deploy1002: Started deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023
13:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64707 and previous config saved to /var/cache/conftool/dbconfig/20240612-132351-marostegui.json
13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
13:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 15s)
13:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
13:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
13:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64706 and previous config saved to /var/cache/conftool/dbconfig/20240612-131400-marostegui.json
13:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
13:13 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Only register EntitySchema namespace when feature is enabled (T363153)
13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64705 and previous config saved to /var/cache/conftool/dbconfig/20240612-130232-root.json
13:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64704 and previous config saved to /var/cache/conftool/dbconfig/20240612-125853-marostegui.json
12:58 ladsgroup@deploy1002: Finished scap: Backport for override circuit breaking threshold for ES hosts (duration: 16m 34s)
12:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
12:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
12:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64703 and previous config saved to /var/cache/conftool/dbconfig/20240612-124727-root.json
12:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64702 and previous config saved to /var/cache/conftool/dbconfig/20240612-124456-marostegui.json
12:44 ladsgroup@deploy1002: ladsgroup: Backport for override circuit breaking threshold for ES hosts synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:42 ladsgroup@deploy1002: Started scap: Backport for override circuit breaking threshold for ES hosts
12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
12:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64701 and previous config saved to /var/cache/conftool/dbconfig/20240612-123222-root.json
12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64700 and previous config saved to /var/cache/conftool/dbconfig/20240612-122948-marostegui.json
12:29 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
12:29 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
12:28 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
12:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
12:24 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
12:18 Emperor: restart swift-proxy on ms-fe1013 T360913
12:17 Emperor: restart swift-proxy on ms-fe2011 ms-fe2014 T360913
12:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64699 and previous config saved to /var/cache/conftool/dbconfig/20240612-121716-root.json
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64698 and previous config saved to /var/cache/conftool/dbconfig/20240612-121441-marostegui.json
12:14 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
12:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
12:13 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
12:13 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
12:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
12:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
12:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
12:10 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
12:10 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
12:05 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
12:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64697 and previous config saved to /var/cache/conftool/dbconfig/20240612-120211-root.json
12:00 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64696 and previous config saved to /var/cache/conftool/dbconfig/20240612-115934-marostegui.json
11:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
11:59 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
11:58 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
11:57 claime: Manual restart of dump_cloud_ip_ranges.service on A:puppetserver and A:puppetmaster
11:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
11:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
11:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
11:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
11:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
11:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
11:53 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
11:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
11:52 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64695 and previous config saved to /var/cache/conftool/dbconfig/20240612-115143-marostegui.json
11:51 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
11:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64693 and previous config saved to /var/cache/conftool/dbconfig/20240612-115103-marostegui.json
11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
11:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
11:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64692 and previous config saved to /var/cache/conftool/dbconfig/20240612-114705-root.json
11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
11:45 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
11:45 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
11:45 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
11:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1191', diff saved to https://phabricator.wikimedia.org/P64691 and previous config saved to /var/cache/conftool/dbconfig/20240612-114410-root.json
11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
11:38 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
11:37 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
11:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
11:35 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64690 and previous config saved to /var/cache/conftool/dbconfig/20240612-113556-marostegui.json
11:35 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
11:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
11:31 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:30 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64689 and previous config saved to /var/cache/conftool/dbconfig/20240612-112048-marostegui.json
11:14 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:14 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:10 moritzm: rebalance ganeti cluster in eqsin following reboots
11:08 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
11:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure (duration: 12m 10s)
11:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64688 and previous config saved to /var/cache/conftool/dbconfig/20240612-110541-marostegui.json
11:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety" "Zabe" --reason "per request T367217"
11:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1003.eqiad.wmnet
11:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
11:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department" "Wikimedia Foundation/Legal" "Zabe" --reason "per request T367216"
11:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
10:58 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:57 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/Conversation hours and Events" "Wikimedia Foundation/Legal/Global Advocacy/Conversation hours and Events" "Zabe" --reason "per request T367219"
10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64687 and previous config saved to /var/cache/conftool/dbconfig/20240612-105615-marostegui.json
10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
10:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure
10:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
10:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64686 and previous config saved to /var/cache/conftool/dbconfig/20240612-105554-marostegui.json
10:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
10:54 kamila@cumin1002: START - Cookbook sre.dns.netbox
10:53 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/About" "Wikimedia Foundation/Legal/Global Advocacy/About" "Zabe" --reason "per request T367219"
10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
10:52 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
10:48 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1003.eqiad.wmnet
10:46 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1003.eqiad.wmnet
10:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy" "Wikimedia Foundation/Legal/Global Advocacy" "Zabe" --reason "per request T367219"
10:41 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1019.eqiad.wmnet
10:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64685 and previous config saved to /var/cache/conftool/dbconfig/20240612-104047-marostegui.json
10:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
10:27 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1019.eqiad.wmnet
10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64684 and previous config saved to /var/cache/conftool/dbconfig/20240612-102540-marostegui.json
10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
10:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
10:24 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
10:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
10:23 godog: remove MediaWiki.jawiki.GrowthExperiments.NewcomerTask.update_.* from graphite hosts - T362633
10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
10:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
10:19 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Zabe" --reason "per request T365837"
10:17 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
10:14 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
10:14 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
10:10 claime: Depooling mw2281.codfw.wmnet,mw22[83-90].codfw.wmnet for decommission - T367275
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64683 and previous config saved to /var/cache/conftool/dbconfig/20240612-101032-marostegui.json
10:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
10:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
10:07 zabe: zabe@mwmaint1002:~$ foreachwikiindblist 'all - s4' refreshImageMetadata.php --mime image/webp # T364680
09:48 fabfur: disabling puppet on cp4037 to test benthos configuration (T360454)
09:47 fabfur: disabling puppet on cp4037 to test benthos configuration
09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64680 and previous config saved to /var/cache/conftool/dbconfig/20240612-094738-marostegui.json
09:47 _joe_: running dump_cloud_ip_ranges on puppetmaster1001 to test fixed script
09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
09:33 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64679 and previous config saved to /var/cache/conftool/dbconfig/20240612-093231-marostegui.json
09:32 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64678 and previous config saved to /var/cache/conftool/dbconfig/20240612-091724-marostegui.json
09:11 moritzm: failover ganeti cluster for eqsin to ganeti5004
09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64677 and previous config saved to /var/cache/conftool/dbconfig/20240612-090959-marostegui.json
09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64676 and previous config saved to /var/cache/conftool/dbconfig/20240612-090937-marostegui.json
09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64675 and previous config saved to /var/cache/conftool/dbconfig/20240612-090834-ladsgroup.json
09:06 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
09:04 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date
09:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64674 and previous config saved to /var/cache/conftool/dbconfig/20240612-090435-ladsgroup.json
09:04 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date # Ctrl+C, had become very slow, trying restart
08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64673 and previous config saved to /var/cache/conftool/dbconfig/20240612-085430-marostegui.json
08:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64672 and previous config saved to /var/cache/conftool/dbconfig/20240612-085329-ladsgroup.json
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
08:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64671 and previous config saved to /var/cache/conftool/dbconfig/20240612-084929-ladsgroup.json
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
08:42 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
08:42 zabe: zabe@mwmaint1002:~$ mwscript refreshImageMetadata.php commonswiki --mime image/webp # T364680
08:39 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mike Pham out of all services on: 2200 hosts
08:39 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64670 and previous config saved to /var/cache/conftool/dbconfig/20240612-083923-marostegui.json
08:38 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging Mike Pham out of all services on: 2200 hosts
08:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64669 and previous config saved to /var/cache/conftool/dbconfig/20240612-083824-ladsgroup.json
08:36 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12703' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12703 --new-data-type external-id --summary 'T367174' # succeeded
08:35 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12583' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12583 --new-data-type external-id --summary 'T367174' # succeeded
08:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64668 and previous config saved to /var/cache/conftool/dbconfig/20240612-083424-ladsgroup.json
08:28 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
08:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P64667 and previous config saved to /var/cache/conftool/dbconfig/20240612-082702-marostegui.json
08:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw
08:26 fabfur: start rebooting all cp-upload_codfw hosts for T366555 (spaced 1.5 hrs)
08:25 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
08:25 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
08:25 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64666 and previous config saved to /var/cache/conftool/dbconfig/20240612-082415-marostegui.json
08:24 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64665 and previous config saved to /var/cache/conftool/dbconfig/20240612-082318-ladsgroup.json
08:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
08:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64664 and previous config saved to /var/cache/conftool/dbconfig/20240612-081918-ladsgroup.json
08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
08:17 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64663 and previous config saved to /var/cache/conftool/dbconfig/20240612-081643-root.json
08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64662 and previous config saved to /var/cache/conftool/dbconfig/20240612-081551-marostegui.json
08:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
08:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
08:15 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:15 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:12 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:12 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
08:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64661 and previous config saved to /var/cache/conftool/dbconfig/20240612-081158-ladsgroup.json
08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
08:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
08:09 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
07:36 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
07:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
07:20 marostegui: dbmaint optimize pagelinks on old s6 codfw master db2214 T364069
07:16 kartik@deploy1002: Finished scap: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) (duration: 13m 11s)
07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
07:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T367262', diff saved to https://phabricator.wikimedia.org/P64660 and previous config saved to /var/cache/conftool/dbconfig/20240612-071340-root.json
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T367262', diff saved to https://phabricator.wikimedia.org/P64659 and previous config saved to /var/cache/conftool/dbconfig/20240612-071158-root.json
07:06 kartik@deploy1002: kartik: Continuing with sync
07:05 kartik@deploy1002: kartik: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 marostegui: Starting s6 codfw failover from db2214 to db2129 - T367262
07:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64658 and previous config saved to /var/cache/conftool/dbconfig/20240612-070302-marostegui.json
07:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
07:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:02 kartik@deploy1002: Started scap: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356)
07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64657 and previous config saved to /var/cache/conftool/dbconfig/20240612-070240-marostegui.json
07:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
06:55 moritzm: remove ganeti1019 from eqiad cluster T367071
06:54 moritzm: rebalance ganeti clusters in codfw following reboots
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64656 and previous config saved to /var/cache/conftool/dbconfig/20240612-064733-marostegui.json
06:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
06:43 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T367262', diff saved to https://phabricator.wikimedia.org/P64655 and previous config saved to /var/cache/conftool/dbconfig/20240612-064200-root.json
06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
06:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:38 hashar@deploy1002: Finished deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550 (duration: 00m 07s)
06:38 hashar@deploy1002: Started deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64654 and previous config saved to /var/cache/conftool/dbconfig/20240612-063225-marostegui.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64653 and previous config saved to /var/cache/conftool/dbconfig/20240612-061718-marostegui.json
05:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:17 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
05:17 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
05:17 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
05:16 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
05:16 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
05:16 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64652 and previous config saved to /var/cache/conftool/dbconfig/20240612-005420-marostegui.json
00:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
00:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
00:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64651 and previous config saved to /var/cache/conftool/dbconfig/20240612-005347-marostegui.json
00:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw
00:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64650 and previous config saved to /var/cache/conftool/dbconfig/20240612-003840-marostegui.json
00:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64649 and previous config saved to /var/cache/conftool/dbconfig/20240612-002332-marostegui.json
00:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64648 and previous config saved to /var/cache/conftool/dbconfig/20240612-000825-marostegui.json

2024-06-11

23:45 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
23:45 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
22:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
22:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-codfw
21:56 ladsgroup@deploy1002: Finished scap: Backport for Fix Linker::makeExternalLink build failures (T367127) (duration: 12m 33s)
21:51 ejegg: fundraising civicrm upgraded from 7252b1b9 to f7855d25
21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Continuing with sync
21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Backport for Fix Linker::makeExternalLink build failures (T367127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:44 ladsgroup@deploy1002: Started scap: Backport for Fix Linker::makeExternalLink build failures (T367127)
21:42 ladsgroup@deploy1002: Finished scap: Backport for Reduce the threshold for section wide circuit breaking to 300 (duration: 12m 08s)
21:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
21:32 ladsgroup@deploy1002: ladsgroup: Backport for Reduce the threshold for section wide circuit breaking to 300 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:30 ladsgroup@deploy1002: Started scap: Backport for Reduce the threshold for section wide circuit breaking to 300
21:27 ladsgroup@deploy1002: Finished scap: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411) (duration: 11m 53s)
21:18 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
21:18 ladsgroup@deploy1002: pppery, ladsgroup: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 ladsgroup@deploy1002: Started scap: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411)
21:15 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old pagelinks columns of s2 (T352010) (duration: 12m 02s)
21:06 ladsgroup@deploy1002: ladsgroup: Continuing with sync
21:05 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old pagelinks columns of s2 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old pagelinks columns of s2 (T352010)
21:01 ladsgroup@deploy1002: Finished scap: Backport for Avoid wrapping floated tables using computed styles (T366314) (duration: 14m 28s)
20:52 ejegg: re-enabled fundraising scheduled jobs
20:52 ladsgroup@deploy1002: jdlrobson, ladsgroup: Continuing with sync
20:49 ladsgroup@deploy1002: jdlrobson, ladsgroup: Backport for Avoid wrapping floated tables using computed styles (T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:46 ladsgroup@deploy1002: Started scap: Backport for Avoid wrapping floated tables using computed styles (T366314)
20:46 ladsgroup@deploy1002: Finished scap: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314) (duration: 14m 18s)
20:36 ladsgroup@deploy1002: ladsgroup, jdlrobson: Continuing with sync
20:34 ladsgroup@deploy1002: ladsgroup, jdlrobson: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:31 ladsgroup@deploy1002: Started scap: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314)
20:30 ladsgroup@deploy1002: Finished scap: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189) (duration: 12m 52s)
20:21 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
20:20 ladsgroup@deploy1002: pppery, ladsgroup: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:17 ladsgroup@deploy1002: Started scap: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189)
20:16 ladsgroup@deploy1002: Finished scap: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) (duration: 12m 54s)
20:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-codfw
20:07 ladsgroup@deploy1002: ladsgroup, pppery: Continuing with sync
20:06 ladsgroup@deploy1002: ladsgroup, pppery: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 ladsgroup@deploy1002: Started scap: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994)
19:38 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
19:38 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
19:33 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64646 and previous config saved to /var/cache/conftool/dbconfig/20240611-192403-ladsgroup.json
19:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64645 and previous config saved to /var/cache/conftool/dbconfig/20240611-190855-ladsgroup.json
18:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64644 and previous config saved to /var/cache/conftool/dbconfig/20240611-185348-ladsgroup.json
18:46 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:44 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:41 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64643 and previous config saved to /var/cache/conftool/dbconfig/20240611-183841-ladsgroup.json
18:37 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:22 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.9 refs T361403
18:19 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:19 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64642 and previous config saved to /var/cache/conftool/dbconfig/20240611-181526-marostegui.json
18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
18:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64641 and previous config saved to /var/cache/conftool/dbconfig/20240611-181448-marostegui.json
18:10 brennen: 1.43.0-wmf.9 train (T361403): no blockers, rolling to group0
18:08 ejegg: stopped fundraising scheduled jobs
17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64640 and previous config saved to /var/cache/conftool/dbconfig/20240611-175941-marostegui.json
17:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:56 taavi@deploy1002: Finished scap: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) (duration: 12m 00s)
17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:47 taavi@deploy1002: taavi: Continuing with sync
17:47 taavi@deploy1002: taavi: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64639 and previous config saved to /var/cache/conftool/dbconfig/20240611-174434-marostegui.json
17:44 taavi@deploy1002: Started scap: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544)
17:37 rzl@deploy1002: Finished scap: (no justification provided) (duration: 11m 40s)
17:33 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'enable-puppet T366649'
17:33 rzl@deploy1002: rzl: Continuing with sync
17:30 rzl@deploy1002: rzl: (no justification provided) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64638 and previous config saved to /var/cache/conftool/dbconfig/20240611-172928-marostegui.json
17:26 rzl@deploy1002: Started scap: (no justification provided)
17:14 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'disable-puppet T366649'
17:11 ejegg: fundraising civicrm upgraded from ebfbad86 to 7252b1b9
17:09 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:09 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:09 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
17:08 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:08 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
17:04 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:04 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
17:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
16:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:56 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
16:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
16:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
16:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:36 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:35 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:35 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:33 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
16:31 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1013.eqiad.wmnet|wikikube-worker1014.eqiad.wmnet|wikikube-worker1017.eqiad.wmnet|wikikube-worker1018.eqiad.wmnet),cluster=kubernetes,service=kubesvc
16:31 claime: pool and uncordon wikikube-worker1013.eqiad.wmnet,wikikube-worker1014.eqiad.wmnet,wikikube-worker1017.eqiad.wmnet,wikikube-worker1018.eqiad.wmnet - T351074
16:31 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
16:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
16:28 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:27 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.eqiad.wmnet
16:26 kamila@cumin1002: START - Cookbook sre.dns.netbox
16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64637 and previous config saved to /var/cache/conftool/dbconfig/20240611-162154-arnaudb.json
16:21 claime: homer 'cr*eqiad*' commit 'T351074'
16:16 elukey: manual run of docker-report-k8s on build2001 (some failed results)
16:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1017.eqiad.wmnet with OS bullseye
16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1018.eqiad.wmnet with OS bullseye
16:07 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64636 and previous config saved to /var/cache/conftool/dbconfig/20240611-160649-arnaudb.json
16:06 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1014.eqiad.wmnet with OS bullseye
16:05 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
16:05 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
16:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
16:04 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
16:03 claime: roll restarting eventgate-main eqiad
16:00 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64635 and previous config saved to /var/cache/conftool/dbconfig/20240611-155143-arnaudb.json
15:51 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
15:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply
15:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
14:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1013.eqiad.wmnet with OS bullseye
14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1013.eqiad.wmnet on all recursors
14:52 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1013.eqiad.wmnet on all recursors
14:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014
14:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
14:51 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1403 to wikikube-worker1014.eqiad.wmnet
14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014.eqiad.wmnet
14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1402 to wikikube-worker1013
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1013
14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 depool T365982', diff saved to https://phabricator.wikimedia.org/P64631 and previous config saved to /var/cache/conftool/dbconfig/20240611-144624-arnaudb.json
14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1013
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
14:45 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
14:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1002.eqiad.wmnet
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
14:42 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
14:38 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
14:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1402 to wikikube-worker1013
14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
14:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
14:29 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1002.eqiad.wmnet
14:29 claime: depooling mw1402 mw1403 mw1406 mw1411 for reimage to k8s - T351074
14:29 Lucas_WMDE: UTC afternoon backport+config window done
14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148) (duration: 19m 08s)
14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
14:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
14:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
14:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Continuing with sync
14:18 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1002.eqiad.wmnet
14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
14:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148)
14:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
14:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable CampaignEvents on swahili wikipedia (T366502) (duration: 14m 40s)
14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
14:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
14:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1017.eqiad.wmnet
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
13:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Continuing with sync
13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Backport for Enable CampaignEvents on swahili wikipedia (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Enable CampaignEvents on swahili wikipedia (T366502)
13:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502) (duration: 44m 51s)
13:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1007.eqiad.wmnet
13:50 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:50 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
13:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
13:48 btullis@cumin1002: START - Cookbook sre.dns.netbox
13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
13:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
13:45 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
13:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
13:45 vgutierrez: rolling switch from tcp-mss-clamper to ferm based MSS clamping on A:ncredir - T365689
13:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
13:42 jiji@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-eqiad
13:40 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1007.eqiad.wmnet
13:40 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1006.eqiad.wmnet
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
13:36 vgutierrez: repool ncredir6001 - T365689
13:36 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
13:33 moritzm: failover ganeti cluster for esams01 to ganeti3005
13:32 moritzm: failover ganeti cluster for esams02 to ganeti3006
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64630 and previous config saved to /var/cache/conftool/dbconfig/20240611-132043-ladsgroup.json
13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Continuing with sync
13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
13:15 vgutierrez: depool ncredir6001 - T365689
13:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
13:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
13:09 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:07 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw
13:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
13:06 vgutierrez: disable puppet on A:ncredir before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035724 - T365689
13:06 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502)
13:06 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:06 btullis@cumin1002: START - Cookbook sre.dns.netbox
13:06 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64629 and previous config saved to /var/cache/conftool/dbconfig/20240611-130535-ladsgroup.json
13:04 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
13:03 vgutierrez: repool text@eqiad with IPIP encapsulation enabled - T366466
13:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:01 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
12:59 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1006.eqiad.wmnet
12:53 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1016.eqiad.wmnet
12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64628 and previous config saved to /var/cache/conftool/dbconfig/20240611-125028-ladsgroup.json
12:50 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 - T366466
12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64627 and previous config saved to /var/cache/conftool/dbconfig/20240611-123521-ladsgroup.json
12:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64626 and previous config saved to /var/cache/conftool/dbconfig/20240611-123046-ladsgroup.json
12:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
12:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
12:26 fabfur: cancelled previous command (text@eqiad is going to be depooled at the same time)
12:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
12:23 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
12:19 vgutierrez: depool text@eqiad before enabling IPIP encapsulation - T366466
12:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
12:14 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
12:13 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
12:13 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
12:11 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
12:10 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
12:09 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64625 and previous config saved to /var/cache/conftool/dbconfig/20240611-120710-ladsgroup.json
12:07 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
12:06 claime: Finished kafka-main reboots in codfw
12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-codfw
12:05 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
12:05 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
12:04 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1005.eqiad.wmnet
12:04 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:04 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
12:04 moritzm: rebalance ganeti cluster in ulsfo following reboots
12:04 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
12:03 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
12:02 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
11:57 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
11:55 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
11:55 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
11:55 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
11:54 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64624 and previous config saved to /var/cache/conftool/dbconfig/20240611-115203-ladsgroup.json
11:51 jayme: removed similar-users deployments from all k8s clusters - T345274
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64621 and previous config saved to /var/cache/conftool/dbconfig/20240611-113656-ladsgroup.json
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64620 and previous config saved to /var/cache/conftool/dbconfig/20240611-113452-marostegui.json
11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64619 and previous config saved to /var/cache/conftool/dbconfig/20240611-113430-marostegui.json
11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64618 and previous config saved to /var/cache/conftool/dbconfig/20240611-113121-root.json
11:29 moritzm: failover ganeti master in ulsfo to ganeti4008
11:27 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
11:26 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:24 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
11:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64617 and previous config saved to /var/cache/conftool/dbconfig/20240611-112149-ladsgroup.json
11:21 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64616 and previous config saved to /var/cache/conftool/dbconfig/20240611-111922-marostegui.json
11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64615 and previous config saved to /var/cache/conftool/dbconfig/20240611-111616-root.json
11:15 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:13 jayme: removing similar-users service - T345274
11:12 btullis@cumin1002: START - Cookbook sre.dns.netbox
11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
11:07 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
11:06 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-codfw
11:05 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:05 claime: Starting kafka-main reboots in codfw
11:04 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1004.eqiad.wmnet
11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64614 and previous config saved to /var/cache/conftool/dbconfig/20240611-110414-marostegui.json
11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
10:57 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
10:57 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64613 and previous config saved to /var/cache/conftool/dbconfig/20240611-104908-marostegui.json
10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364069
10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364299
10:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
10:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
10:45 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
10:45 claime: move 90% of traffic to mw-on-k8s - T362323
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123 T367145', diff saved to https://phabricator.wikimedia.org/P64612 and previous config saved to /var/cache/conftool/dbconfig/20240611-104336-root.json
10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T367145', diff saved to https://phabricator.wikimedia.org/P64611 and previous config saved to /var/cache/conftool/dbconfig/20240611-104232-root.json
10:42 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
10:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
10:42 marostegui: Starting s5 codfw failover from db2123 to db2213 - T367145
10:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
10:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
10:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
10:32 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T367145', diff saved to https://phabricator.wikimedia.org/P64610 and previous config saved to /var/cache/conftool/dbconfig/20240611-102900-root.json
10:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2213 with weight 0 T367145', diff saved to https://phabricator.wikimedia.org/P64609 and previous config saved to /var/cache/conftool/dbconfig/20240611-102820-root.json
10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
10:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
10:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64608 and previous config saved to /var/cache/conftool/dbconfig/20240611-102444-ladsgroup.json
10:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64607 and previous config saved to /var/cache/conftool/dbconfig/20240611-102125-ladsgroup.json
10:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
10:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
10:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
10:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
10:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
10:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
10:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
10:16 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
10:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1014.eqiad.wmnet
10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
10:14 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
10:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64606 and previous config saved to /var/cache/conftool/dbconfig/20240611-101400-arnaudb.json
10:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
10:10 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
10:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
10:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
10:08 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
10:08 jayme@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
10:07 brouberol@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:06 brouberol@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
10:04 brouberol@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:04 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1014.eqiad.wmnet
10:03 brouberol@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:02 brouberol@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:01 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
10:01 brouberol@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:01 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:00 brouberol@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:00 sukhe: [end] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru: T346722
09:59 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:59 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:59 sukhe: [start] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru
09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64605 and previous config saved to /var/cache/conftool/dbconfig/20240611-095853-arnaudb.json
09:58 brouberol@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:58 brouberol@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:57 brouberol@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:57 brouberol@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
09:45 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
09:44 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
09:44 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
09:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64604 and previous config saved to /var/cache/conftool/dbconfig/20240611-094347-arnaudb.json
09:43 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
09:42 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
09:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
09:42 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
09:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
09:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet
09:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
09:35 moritzm: rebalance ganeti clusters in codfw following reboots
09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64603 and previous config saved to /var/cache/conftool/dbconfig/20240611-092839-arnaudb.json
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64602 and previous config saved to /var/cache/conftool/dbconfig/20240611-092504-arnaudb.json
09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
09:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
09:16 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
09:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
09:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
08:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:53 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
08:47 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:46 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
08:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
08:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
08:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
08:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
08:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date
08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
08:33 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet
08:32 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
08:31 marostegui: Install 10.11 on db1153 (non used x2 replica) T365805
08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
08:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
08:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
08:31 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
08:30 marostegui: Install 10.11 on db1153 (non used x2 replica)
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64600 and previous config saved to /var/cache/conftool/dbconfig/20240611-081314-root.json
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
08:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64599 and previous config saved to /var/cache/conftool/dbconfig/20240611-075809-root.json
07:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
07:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
07:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
07:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64598 and previous config saved to /var/cache/conftool/dbconfig/20240611-074304-root.json
07:40 kart_: Updated MinT to 2024-06-11-052620-production (T364122, T346226, T357548)
07:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64597 and previous config saved to /var/cache/conftool/dbconfig/20240611-074009-root.json
07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:36 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
07:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64596 and previous config saved to /var/cache/conftool/dbconfig/20240611-072758-root.json
07:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64595 and previous config saved to /var/cache/conftool/dbconfig/20240611-072504-root.json
07:18 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64594 and previous config saved to /var/cache/conftool/dbconfig/20240611-071253-root.json
07:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64593 and previous config saved to /var/cache/conftool/dbconfig/20240611-070958-root.json
07:05 arnaudb@deploy1002: Finished scap: Backport for Revert "dbconfig: temporary disable writes on es6" (duration: 11m 36s)
07:02 moritzm: failover ganeti master in codfw to ganeti2020
06:57 arnaudb@deploy1002: arnaudb: Continuing with sync
06:56 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64592 and previous config saved to /var/cache/conftool/dbconfig/20240611-065453-root.json
06:54 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es6"
06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64591 and previous config saved to /var/cache/conftool/dbconfig/20240611-064041-arnaudb.json
06:40 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe (duration: 15m 33s)
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64590 and previous config saved to /var/cache/conftool/dbconfig/20240611-063947-root.json
06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64589 and previous config saved to /var/cache/conftool/dbconfig/20240611-063903-arnaudb.json
06:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1037 to es6 primary T367055', diff saved to https://phabricator.wikimedia.org/P64588 and previous config saved to /var/cache/conftool/dbconfig/20240611-063109-arnaudb.json
06:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:30 arnaudb: Starting es6 eqiad failover from es1038 to es1037 - T367055
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64587 and previous config saved to /var/cache/conftool/dbconfig/20240611-062441-root.json
06:24 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe
06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1037 with weight 0 T367055', diff saved to https://phabricator.wikimedia.org/P64586 and previous config saved to /var/cache/conftool/dbconfig/20240611-062353-arnaudb.json
06:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
06:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
06:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64585 and previous config saved to /var/cache/conftool/dbconfig/20240611-061413-root.json
06:12 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
06:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64584 and previous config saved to /var/cache/conftool/dbconfig/20240611-060935-root.json
06:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:07 arnaudb@deploy1002: Finished scap: Backport for dbconfig: temporary disable writes on es6 (T367055) (duration: 15m 42s)
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64583 and previous config saved to /var/cache/conftool/dbconfig/20240611-055907-root.json
05:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
05:58 arnaudb@deploy1002: arnaudb: Continuing with sync
05:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
05:58 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1233', diff saved to https://phabricator.wikimedia.org/P64582 and previous config saved to /var/cache/conftool/dbconfig/20240611-055816-arnaudb.json
05:56 arnaudb@deploy1002: arnaudb: Backport for dbconfig: temporary disable writes on es6 (T367055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:51 arnaudb@deploy1002: Started scap: Backport for dbconfig: temporary disable writes on es6 (T367055)
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64581 and previous config saved to /var/cache/conftool/dbconfig/20240611-054401-root.json
05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64580 and previous config saved to /var/cache/conftool/dbconfig/20240611-052856-root.json
05:24 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364069
05:22 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364299
05:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
05:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T367140', diff saved to https://phabricator.wikimedia.org/P64579 and previous config saved to /var/cache/conftool/dbconfig/20240611-052101-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write T367140', diff saved to https://phabricator.wikimedia.org/P64578 and previous config saved to /var/cache/conftool/dbconfig/20240611-052000-root.json
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T367140', diff saved to https://phabricator.wikimedia.org/P64577 and previous config saved to /var/cache/conftool/dbconfig/20240611-051941-root.json
05:19 marostegui: Starting s3 eqiad failover from db1223 to db1157 - T367140
05:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64576 and previous config saved to /var/cache/conftool/dbconfig/20240611-051351-root.json
05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1157 with weight 0 T367140', diff saved to https://phabricator.wikimedia.org/P64575 and previous config saved to /var/cache/conftool/dbconfig/20240611-050351-root.json
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64574 and previous config saved to /var/cache/conftool/dbconfig/20240611-045845-root.json
04:57 marostegui: dbmaint eqiad s2 deploy schema change on db1222 T364299
04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
04:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1222 T366687', diff saved to https://phabricator.wikimedia.org/P64573 and previous config saved to /var/cache/conftool/dbconfig/20240611-045447-root.json
04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1162 to s2 primary and set section read-write T366687', diff saved to https://phabricator.wikimedia.org/P64572 and previous config saved to /var/cache/conftool/dbconfig/20240611-045359-root.json
04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T366687', diff saved to https://phabricator.wikimedia.org/P64571 and previous config saved to /var/cache/conftool/dbconfig/20240611-045341-root.json
04:53 marostegui: Starting s2 eqiad failover from db1222 to db1162 - T366687
04:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64570 and previous config saved to /var/cache/conftool/dbconfig/20240611-044616-marostegui.json
04:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
04:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64569 and previous config saved to /var/cache/conftool/dbconfig/20240611-044339-root.json
04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1162 with weight 0 T366687', diff saved to https://phabricator.wikimedia.org/P64568 and previous config saved to /var/cache/conftool/dbconfig/20240611-043333-marostegui.json
04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
04:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64567 and previous config saved to /var/cache/conftool/dbconfig/20240611-041938-ladsgroup.json
04:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64566 and previous config saved to /var/cache/conftool/dbconfig/20240611-040432-ladsgroup.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.6 (duration: 01m 05s)
04:00 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.9 refs T361403 (duration: 57m 19s)
03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64565 and previous config saved to /var/cache/conftool/dbconfig/20240611-034925-ladsgroup.json
03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64564 and previous config saved to /var/cache/conftool/dbconfig/20240611-033418-ladsgroup.json
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.9 refs T361403
00:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad

2024-06-10

23:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
23:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
22:30 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:30 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:28 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:27 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:25 reedy@deploy1002: Synchronized wmf-config/: sync interwiki lists (duration: 10m 07s)
22:14 reedy@deploy1002: Synchronized langlist-labs: Add fr and bn (duration: 14m 29s)
21:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64563 and previous config saved to /var/cache/conftool/dbconfig/20240610-215622-marostegui.json
21:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64562 and previous config saved to /var/cache/conftool/dbconfig/20240610-214115-marostegui.json
21:27 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64561 and previous config saved to /var/cache/conftool/dbconfig/20240610-212608-marostegui.json
21:19 ejegg: fundraising python tools upgraded from 8c98b674 to c51f6e62
21:19 ejegg: Standalone SmashPig upgraded from edf573bb to 1d1b770c
21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64560 and previous config saved to /var/cache/conftool/dbconfig/20240610-211101-marostegui.json
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64559 and previous config saved to /var/cache/conftool/dbconfig/20240610-204622-ladsgroup.json
20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
20:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64558 and previous config saved to /var/cache/conftool/dbconfig/20240610-204600-ladsgroup.json
20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64557 and previous config saved to /var/cache/conftool/dbconfig/20240610-203053-ladsgroup.json
20:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
20:30 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64556 and previous config saved to /var/cache/conftool/dbconfig/20240610-201546-ladsgroup.json
20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64555 and previous config saved to /var/cache/conftool/dbconfig/20240610-200039-ladsgroup.json
19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64554 and previous config saved to /var/cache/conftool/dbconfig/20240610-195826-marostegui.json
19:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
19:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64553 and previous config saved to /var/cache/conftool/dbconfig/20240610-195804-marostegui.json
19:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64552 and previous config saved to /var/cache/conftool/dbconfig/20240610-194256-marostegui.json
19:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64551 and previous config saved to /var/cache/conftool/dbconfig/20240610-192749-marostegui.json
19:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
19:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64550 and previous config saved to /var/cache/conftool/dbconfig/20240610-191242-marostegui.json
19:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
19:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
18:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
17:50 amastilovic@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:50 amastilovic@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
17:47 amastilovic@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:46 amastilovic@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64547 and previous config saved to /var/cache/conftool/dbconfig/20240610-174349-marostegui.json
17:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
17:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64546 and previous config saved to /var/cache/conftool/dbconfig/20240610-174327-marostegui.json
17:37 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:36 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:30 dancy@deploy1002: Installation of scap version "4.87.0" completed for 285 hosts
17:29 amastilovic@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:29 amastilovic@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64545 and previous config saved to /var/cache/conftool/dbconfig/20240610-172820-marostegui.json
17:25 dancy@deploy1002: Installing scap version "4.87.0" for 285 hosts
17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64544 and previous config saved to /var/cache/conftool/dbconfig/20240610-171313-marostegui.json
17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
16:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64543 and previous config saved to /var/cache/conftool/dbconfig/20240610-165806-marostegui.json
16:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:21 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:20 marostegui: Drop flaggedpage_pending from s1 T365568
16:05 cdanis: 💙[email protected] ~ 🕛☕ sudo cumin -b 8 '*.codfw.wmnet and C:geoip::data::puppet%fetch_ipinfo_dbs=true' 'sha512sum /usr/share/GeoIPInfo/GeoLite2-ASN.mmdb || run-puppet-agent'
16:01 cdanis: 💙[email protected] ~ 🕛☕ sudo systemctl restart sync-puppet-volatile
16:00 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:47 marostegui: Drop flaggedpage_pending from s3 T365568
15:46 marostegui: Drop flaggedpage_pending from s5 T365568
15:43 marostegui: Drop flaggedpage_pending from s2 T365568
15:42 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:41 godog: bounce benthos@mw_accesslog_metrics.service on centrallog hosts
15:41 marostegui: Drop flaggedpage_pending from s7 T365568
15:40 marostegui: Drop flaggedpage_pending from s6 T365568
15:34 ladsgroup@deploy1002: Synchronized portals: (no justification provided) (duration: 11m 20s)
15:31 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
15:31 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
15:29 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
15:22 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: (no justification provided) (duration: 10m 28s)
15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
15:05 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4046.ulsfo.wmnet
15:04 ladsgroup@deploy1002: Finished scap: Backport for errorpages: Add dark mode support (duration: 17m 15s)
15:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
15:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:01 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4046.ulsfo.wmnet
15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
14:59 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
14:58 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
14:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
14:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
14:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
14:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
14:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
14:55 ladsgroup@deploy1002: ladsgroup and ebrahim: Continuing with sync
14:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:54 ladsgroup@deploy1002: ladsgroup and ebrahim: Backport for errorpages: Add dark mode support synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
14:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:53 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
14:52 moritzm: powercycling ganeti1019, stuck on reboot
14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
14:52 ChrisDobbins901_: sudo -i cookbook sre.hosts.reboot-single -r 'Kernel upgrade' 'P{cp4046.*}'
14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
14:51 cdobbins@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4046.ulsfo.wmnet
14:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:51 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
14:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:50 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
14:50 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
14:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
14:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
14:48 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
14:47 urandom: aqs1010: restarting cassandra to apply upgrade to Java 11 — T350567
14:47 ladsgroup@deploy1002: Started scap: Backport for errorpages: Add dark mode support
14:46 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4046.ulsfo.wmnet
14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:45 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
14:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64539 and previous config saved to /var/cache/conftool/dbconfig/20240610-144501-marostegui.json
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64538 and previous config saved to /var/cache/conftool/dbconfig/20240610-144439-marostegui.json
14:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
14:43 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:43 swfrench@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
14:42 swfrench@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:41 swfrench@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
14:41 swfrench@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
14:39 swfrench@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:38 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
14:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
14:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
14:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
14:31 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2023.codfw.wmnet
14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64537 and previous config saved to /var/cache/conftool/dbconfig/20240610-142931-marostegui.json
14:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config: apply
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config-next: apply
14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config-next: apply
14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64536 and previous config saved to /var/cache/conftool/dbconfig/20240610-141422-marostegui.json
14:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64535 and previous config saved to /var/cache/conftool/dbconfig/20240610-135914-marostegui.json
13:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
13:57 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1107 for T348977 - bking@cumin2002
13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107 for T348977 - bking@cumin2002
13:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
13:49 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
13:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
13:47 taavi@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
13:46 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
13:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
13:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
13:36 elukey: move recommendation-api on wikikube to prometheus metrics (offboarded from statsd) - T205870
13:36 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
13:35 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
13:30 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364069
13:29 taavi: taavi@mw1447 ~ $ sudo /usr/local/sbin/restart-php-fpm-all php7.4-fpm 9223372 # leftover from me restarting LVS during deployment
13:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
13:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
13:27 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
13:26 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64534 and previous config saved to /var/cache/conftool/dbconfig/20240610-132619-ladsgroup.json
13:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
13:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
13:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
13:20 ladsgroup@deploy1002: Finished scap: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438) (duration: 15m 05s)
13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4047.ulsfo.wmnet
13:18 taavi@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=99) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:18 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
13:11 taavi: restarting eqiad low-traffic LVS for https://gerrit.wikimedia.org/r/c/operations/puppet/+/941459
13:11 ladsgroup@deploy1002: ladsgroup and gergesshamon: Continuing with sync
13:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
13:10 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4047.ulsfo.wmnet
13:09 fabfur: rebooting cp4047 (T366555)
13:09 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:08 ladsgroup@deploy1002: ladsgroup and gergesshamon: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
13:05 ladsgroup@deploy1002: Started scap: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438)
13:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:04 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
13:01 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:58 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:58 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:55 fabfur: repooling text@drmrs (IPIP encapsulation enabled) (T366466)
12:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
12:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
12:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:48 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
12:46 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
12:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
12:44 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
12:43 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
12:41 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
12:40 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
12:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64532 and previous config saved to /var/cache/conftool/dbconfig/20240610-122847-arnaudb.json
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
12:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
12:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
12:15 oblivian@deploy1002: Finished scap: Deploying change to base mediawiki image (take 2) (duration: 22m 39s)
12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64531 and previous config saved to /var/cache/conftool/dbconfig/20240610-121341-arnaudb.json
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
11:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64530 and previous config saved to /var/cache/conftool/dbconfig/20240610-115834-arnaudb.json
11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
11:53 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image (take 2)
11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64528 and previous config saved to /var/cache/conftool/dbconfig/20240610-114957-marostegui.json
11:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
11:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64527 and previous config saved to /var/cache/conftool/dbconfig/20240610-114934-marostegui.json
11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
11:44 oblivian@deploy1002: sync-world aborted: Deploying change to base mediawiki image (duration: 10m 21s)
11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64526 and previous config saved to /var/cache/conftool/dbconfig/20240610-114329-arnaudb.json
11:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
11:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
11:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
11:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
11:34 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64525 and previous config saved to /var/cache/conftool/dbconfig/20240610-113426-marostegui.json
11:34 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments (duration: 10m 22s)
11:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
11:29 fabfur: restarting pybal on lvs6003,lvs6001 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
11:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64524 and previous config saved to /var/cache/conftool/dbconfig/20240610-112821-arnaudb.json
11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
11:26 fabfur: enabling && running puppet on A:lvs-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
11:23 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments
11:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64523 and previous config saved to /var/cache/conftool/dbconfig/20240610-111917-marostegui.json
11:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64522 and previous config saved to /var/cache/conftool/dbconfig/20240610-111315-arnaudb.json
10:47 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 1%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64519 and previous config saved to /var/cache/conftool/dbconfig/20240610-104303-arnaudb.json
10:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
10:41 fabfur: depooling text@drmrs to apply IPIP encapsulation patches (T366466)
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
10:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
10:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
10:25 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2204 T367019', diff saved to https://phabricator.wikimedia.org/P64518 and previous config saved to /var/cache/conftool/dbconfig/20240610-102511-arnaudb.json
10:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
10:21 claime: repooled all active/active mediawiki services from codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=api-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=codfw
10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-ro,name=codfw
10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
10:08 claime: depooled all active/active mediawiki services from codfw
10:08 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw
10:07 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw
10:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
10:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
10:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:02 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-int-ro,name=codfw
10:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=codfw
10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-web-ro,name=codfw
10:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
09:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
09:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 26 hosts with reason: Issue from T367019
09:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 26 hosts with reason: Issue from T367019
09:54 arnaudb@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 5:00:00 on 870 hosts with reason: Issue from T367019
09:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 870 hosts with reason: Issue from T367019
09:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
09:53 jayme@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
09:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
09:37 godog: roll upgrade prometheus-statsd-exporter to baremetal - T302373
09:34 taavi@deploy1002: Finished scap: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 17s)
09:25 taavi@deploy1002: taavi: Continuing with sync
09:25 taavi@deploy1002: taavi: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:24 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:24 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:22 taavi@deploy1002: Started scap: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook"
09:22 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
09:22 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64517 and previous config saved to /var/cache/conftool/dbconfig/20240610-091631-marostegui.json
09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64516 and previous config saved to /var/cache/conftool/dbconfig/20240610-091606-marostegui.json
09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2207 to s2 primary T367019', diff saved to https://phabricator.wikimedia.org/P64515 and previous config saved to /var/cache/conftool/dbconfig/20240610-091506-arnaudb.json
09:14 arnaudb: Starting s2 codfw failover from db2204 to db2207 - T367019
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
09:01 godog: upload prometheus-statsd-exporter 0.26.1-1 to apt - T302373
09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64514 and previous config saved to /var/cache/conftool/dbconfig/20240610-090058-marostegui.json
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64513 and previous config saved to /var/cache/conftool/dbconfig/20240610-085721-arnaudb.json
08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
08:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64512 and previous config saved to /var/cache/conftool/dbconfig/20240610-085548-arnaudb.json
08:54 godog: upgrade prometheus-statsd-exporter on webperf - T302373
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
08:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
08:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet
08:47 cmooney@cumin1002: START - Cookbook sre.dns.netbox
08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64511 and previous config saved to /var/cache/conftool/dbconfig/20240610-084550-marostegui.json
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64510 and previous config saved to /var/cache/conftool/dbconfig/20240610-084042-arnaudb.json
08:39 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4048.ulsfo.wmnet
08:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping1004.eqiad.wmnet with OS bookworm
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping1004.eqiad.wmnet with reason: host reimage
08:14 kostajh: UTC morning deploys done
08:13 kharlan@deploy1002: Finished scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884) (duration: 14m 07s)
08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64507 and previous config saved to /var/cache/conftool/dbconfig/20240610-081030-arnaudb.json
08:04 kharlan@deploy1002: kharlan: Continuing with sync
08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
08:02 kharlan@deploy1002: kharlan: Backport for IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:59 kharlan@deploy1002: Started scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884)
07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
07:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
07:57 kharlan@deploy1002: kharlan: Backport for IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64506 and previous config saved to /var/cache/conftool/dbconfig/20240610-075524-arnaudb.json
07:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
07:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
07:53 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
07:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64505 and previous config saved to /var/cache/conftool/dbconfig/20240610-075056-root.json
07:50 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
07:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2207.codfw.wmnet
07:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
07:43 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2207.codfw.wmnet
07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2207 maintenance', diff saved to https://phabricator.wikimedia.org/P64504 and previous config saved to /var/cache/conftool/dbconfig/20240610-074157-arnaudb.json
07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
07:41 kharlan@deploy1002: Started scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884)
07:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Revert db2207 with weight 500 T367019', diff saved to https://phabricator.wikimedia.org/P64503 and previous config saved to /var/cache/conftool/dbconfig/20240610-073838-arnaudb.json
07:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
07:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1010.eqiad.wmnet
07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
07:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64502 and previous config saved to /var/cache/conftool/dbconfig/20240610-073549-root.json
07:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
07:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
07:33 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
07:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
07:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
07:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
07:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
07:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
07:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64501 and previous config saved to /var/cache/conftool/dbconfig/20240610-072043-root.json
07:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
07:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
07:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64500 and previous config saved to /var/cache/conftool/dbconfig/20240610-070537-root.json
07:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64499 and previous config saved to /var/cache/conftool/dbconfig/20240610-070249-marostegui.json
07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64498 and previous config saved to /var/cache/conftool/dbconfig/20240610-070224-marostegui.json
07:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
06:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64497 and previous config saved to /var/cache/conftool/dbconfig/20240610-065640-ladsgroup.json
06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64496 and previous config saved to /var/cache/conftool/dbconfig/20240610-065031-root.json
06:47 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364299
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64495 and previous config saved to /var/cache/conftool/dbconfig/20240610-064716-marostegui.json
06:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64494 and previous config saved to /var/cache/conftool/dbconfig/20240610-064132-ladsgroup.json
06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64493 and previous config saved to /var/cache/conftool/dbconfig/20240610-063912-arnaudb.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2140 T367017', diff saved to https://phabricator.wikimedia.org/P64492 and previous config saved to /var/cache/conftool/dbconfig/20240610-063904-root.json
06:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2179 to s4 primary T367017', diff saved to https://phabricator.wikimedia.org/P64491 and previous config saved to /var/cache/conftool/dbconfig/20240610-063830-root.json
06:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
06:38 marostegui: Starting s4 codfw failover from db2140 to db2179 - T367017
06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64490 and previous config saved to /var/cache/conftool/dbconfig/20240610-063524-root.json
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64489 and previous config saved to /var/cache/conftool/dbconfig/20240610-063208-marostegui.json
06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64488 and previous config saved to /var/cache/conftool/dbconfig/20240610-062624-ladsgroup.json
06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64487 and previous config saved to /var/cache/conftool/dbconfig/20240610-062017-root.json
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2179 from API/vslow/dump T367017', diff saved to https://phabricator.wikimedia.org/P64486 and previous config saved to /var/cache/conftool/dbconfig/20240610-061939-root.json
06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2179 with weight 0 T367017', diff saved to https://phabricator.wikimedia.org/P64485 and previous config saved to /var/cache/conftool/dbconfig/20240610-061849-root.json
06:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64484 and previous config saved to /var/cache/conftool/dbconfig/20240610-061658-marostegui.json
06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64483 and previous config saved to /var/cache/conftool/dbconfig/20240610-061116-ladsgroup.json
05:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64482 and previous config saved to /var/cache/conftool/dbconfig/20240610-052941-ladsgroup.json
05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64481 and previous config saved to /var/cache/conftool/dbconfig/20240610-051432-ladsgroup.json
05:13 marostegui: dbmaint codfw s7 deploy schema change on db2218 T364299
05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T366875', diff saved to https://phabricator.wikimedia.org/P64480 and previous config saved to /var/cache/conftool/dbconfig/20240610-050738-root.json
05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2121 to s7 primary T366875', diff saved to https://phabricator.wikimedia.org/P64479 and previous config saved to /var/cache/conftool/dbconfig/20240610-050637-marostegui.json
05:06 marostegui: Starting s7 codfw failover from db2218 to db2121 - T366875
04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64478 and previous config saved to /var/cache/conftool/dbconfig/20240610-045922-ladsgroup.json
04:52 kart_: Updated Apertium to 2024-06-07-143238-production (T356252)
04:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
04:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
04:44 marostegui: Rename flaggedpage_pending in s5 T365568
04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64477 and previous config saved to /var/cache/conftool/dbconfig/20240610-044414-ladsgroup.json
04:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
04:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
04:37 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2121 from API/vslow/dump T366875', diff saved to https://phabricator.wikimedia.org/P64476 and previous config saved to /var/cache/conftool/dbconfig/20240610-043741-root.json
04:37 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2121 with weight 0 T366875', diff saved to https://phabricator.wikimedia.org/P64475 and previous config saved to /var/cache/conftool/dbconfig/20240610-043649-root.json
04:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64474 and previous config saved to /var/cache/conftool/dbconfig/20240610-043615-marostegui.json
04:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-09

23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64473 and previous config saved to /var/cache/conftool/dbconfig/20240609-234110-ladsgroup.json
23:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64472 and previous config saved to /var/cache/conftool/dbconfig/20240609-234047-ladsgroup.json
23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64471 and previous config saved to /var/cache/conftool/dbconfig/20240609-232921-ladsgroup.json
23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64470 and previous config saved to /var/cache/conftool/dbconfig/20240609-232539-ladsgroup.json
23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64469 and previous config saved to /var/cache/conftool/dbconfig/20240609-231413-ladsgroup.json
23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64468 and previous config saved to /var/cache/conftool/dbconfig/20240609-231031-ladsgroup.json
22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64467 and previous config saved to /var/cache/conftool/dbconfig/20240609-225905-ladsgroup.json
22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64466 and previous config saved to /var/cache/conftool/dbconfig/20240609-225523-ladsgroup.json
22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64465 and previous config saved to /var/cache/conftool/dbconfig/20240609-224357-ladsgroup.json
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64464 and previous config saved to /var/cache/conftool/dbconfig/20240609-192428-ladsgroup.json
19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64463 and previous config saved to /var/cache/conftool/dbconfig/20240609-192404-ladsgroup.json
19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64462 and previous config saved to /var/cache/conftool/dbconfig/20240609-190856-ladsgroup.json
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64461 and previous config saved to /var/cache/conftool/dbconfig/20240609-185347-ladsgroup.json
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64460 and previous config saved to /var/cache/conftool/dbconfig/20240609-183839-ladsgroup.json
16:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64459 and previous config saved to /var/cache/conftool/dbconfig/20240609-160621-marostegui.json
15:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64458 and previous config saved to /var/cache/conftool/dbconfig/20240609-155113-marostegui.json
15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64457 and previous config saved to /var/cache/conftool/dbconfig/20240609-153605-marostegui.json
15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64456 and previous config saved to /var/cache/conftool/dbconfig/20240609-152057-marostegui.json
15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64455 and previous config saved to /var/cache/conftool/dbconfig/20240609-152020-ladsgroup.json
15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64454 and previous config saved to /var/cache/conftool/dbconfig/20240609-151956-ladsgroup.json
15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64453 and previous config saved to /var/cache/conftool/dbconfig/20240609-150448-ladsgroup.json
14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64452 and previous config saved to /var/cache/conftool/dbconfig/20240609-144940-ladsgroup.json
14:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64451 and previous config saved to /var/cache/conftool/dbconfig/20240609-143432-ladsgroup.json
14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64450 and previous config saved to /var/cache/conftool/dbconfig/20240609-143128-ladsgroup.json
14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64449 and previous config saved to /var/cache/conftool/dbconfig/20240609-143105-ladsgroup.json
14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64448 and previous config saved to /var/cache/conftool/dbconfig/20240609-143032-marostegui.json
14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64447 and previous config saved to /var/cache/conftool/dbconfig/20240609-141557-ladsgroup.json
14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64446 and previous config saved to /var/cache/conftool/dbconfig/20240609-141524-marostegui.json
14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64445 and previous config saved to /var/cache/conftool/dbconfig/20240609-140049-ladsgroup.json
14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64444 and previous config saved to /var/cache/conftool/dbconfig/20240609-140016-marostegui.json
13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64443 and previous config saved to /var/cache/conftool/dbconfig/20240609-134541-ladsgroup.json
13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64442 and previous config saved to /var/cache/conftool/dbconfig/20240609-134508-marostegui.json
12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64441 and previous config saved to /var/cache/conftool/dbconfig/20240609-120817-marostegui.json
12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64440 and previous config saved to /var/cache/conftool/dbconfig/20240609-120753-marostegui.json
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64439 and previous config saved to /var/cache/conftool/dbconfig/20240609-120400-marostegui.json
12:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
12:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64438 and previous config saved to /var/cache/conftool/dbconfig/20240609-115245-marostegui.json
11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64437 and previous config saved to /var/cache/conftool/dbconfig/20240609-113737-marostegui.json
11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64436 and previous config saved to /var/cache/conftool/dbconfig/20240609-112229-marostegui.json
11:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64435 and previous config saved to /var/cache/conftool/dbconfig/20240609-111945-ladsgroup.json
11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64434 and previous config saved to /var/cache/conftool/dbconfig/20240609-110437-ladsgroup.json
10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64433 and previous config saved to /var/cache/conftool/dbconfig/20240609-104929-ladsgroup.json
10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64432 and previous config saved to /var/cache/conftool/dbconfig/20240609-103421-ladsgroup.json
09:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64431 and previous config saved to /var/cache/conftool/dbconfig/20240609-095854-marostegui.json
09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64430 and previous config saved to /var/cache/conftool/dbconfig/20240609-094346-marostegui.json
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64429 and previous config saved to /var/cache/conftool/dbconfig/20240609-092837-marostegui.json
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64428 and previous config saved to /var/cache/conftool/dbconfig/20240609-091329-marostegui.json
08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64427 and previous config saved to /var/cache/conftool/dbconfig/20240609-080149-marostegui.json
08:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
08:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64426 and previous config saved to /var/cache/conftool/dbconfig/20240609-080125-marostegui.json
07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64425 and previous config saved to /var/cache/conftool/dbconfig/20240609-075533-marostegui.json
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64424 and previous config saved to /var/cache/conftool/dbconfig/20240609-074617-marostegui.json
07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64423 and previous config saved to /var/cache/conftool/dbconfig/20240609-073109-marostegui.json
07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64422 and previous config saved to /var/cache/conftool/dbconfig/20240609-071601-marostegui.json
06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64421 and previous config saved to /var/cache/conftool/dbconfig/20240609-064733-ladsgroup.json
06:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
06:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64420 and previous config saved to /var/cache/conftool/dbconfig/20240609-064709-ladsgroup.json
06:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64419 and previous config saved to /var/cache/conftool/dbconfig/20240609-063607-ladsgroup.json
06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64418 and previous config saved to /var/cache/conftool/dbconfig/20240609-063543-ladsgroup.json
06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64417 and previous config saved to /var/cache/conftool/dbconfig/20240609-063201-ladsgroup.json
06:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64416 and previous config saved to /var/cache/conftool/dbconfig/20240609-062033-ladsgroup.json
06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64415 and previous config saved to /var/cache/conftool/dbconfig/20240609-061653-ladsgroup.json
06:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64414 and previous config saved to /var/cache/conftool/dbconfig/20240609-060525-ladsgroup.json
06:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64413 and previous config saved to /var/cache/conftool/dbconfig/20240609-060146-ladsgroup.json
05:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64412 and previous config saved to /var/cache/conftool/dbconfig/20240609-055017-ladsgroup.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64411 and previous config saved to /var/cache/conftool/dbconfig/20240609-054833-marostegui.json
05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64410 and previous config saved to /var/cache/conftool/dbconfig/20240609-054809-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64409 and previous config saved to /var/cache/conftool/dbconfig/20240609-053301-marostegui.json
05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64408 and previous config saved to /var/cache/conftool/dbconfig/20240609-052358-ladsgroup.json
05:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64407 and previous config saved to /var/cache/conftool/dbconfig/20240609-052334-ladsgroup.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64406 and previous config saved to /var/cache/conftool/dbconfig/20240609-051753-marostegui.json
05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64405 and previous config saved to /var/cache/conftool/dbconfig/20240609-050826-ladsgroup.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64404 and previous config saved to /var/cache/conftool/dbconfig/20240609-050245-marostegui.json
04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64403 and previous config saved to /var/cache/conftool/dbconfig/20240609-045319-ladsgroup.json
04:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64402 and previous config saved to /var/cache/conftool/dbconfig/20240609-043811-ladsgroup.json
02:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64401 and previous config saved to /var/cache/conftool/dbconfig/20240609-025921-marostegui.json
02:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
02:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
02:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64400 and previous config saved to /var/cache/conftool/dbconfig/20240609-025856-marostegui.json
02:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64399 and previous config saved to /var/cache/conftool/dbconfig/20240609-024349-marostegui.json
02:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64398 and previous config saved to /var/cache/conftool/dbconfig/20240609-022840-marostegui.json
02:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64397 and previous config saved to /var/cache/conftool/dbconfig/20240609-021333-marostegui.json
02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64396 and previous config saved to /var/cache/conftool/dbconfig/20240609-020120-ladsgroup.json
02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
02:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
01:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64395 and previous config saved to /var/cache/conftool/dbconfig/20240609-012432-marostegui.json
01:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64394 and previous config saved to /var/cache/conftool/dbconfig/20240609-010922-marostegui.json
00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64393 and previous config saved to /var/cache/conftool/dbconfig/20240609-005414-marostegui.json
00:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64392 and previous config saved to /var/cache/conftool/dbconfig/20240609-003906-marostegui.json
00:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64391 and previous config saved to /var/cache/conftool/dbconfig/20240609-000718-marostegui.json
00:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
00:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64390 and previous config saved to /var/cache/conftool/dbconfig/20240609-000640-marostegui.json

2024-06-08

23:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64389 and previous config saved to /var/cache/conftool/dbconfig/20240608-235132-marostegui.json
23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64388 and previous config saved to /var/cache/conftool/dbconfig/20240608-233623-marostegui.json
23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64387 and previous config saved to /var/cache/conftool/dbconfig/20240608-232115-marostegui.json
22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64386 and previous config saved to /var/cache/conftool/dbconfig/20240608-222832-ladsgroup.json
22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64385 and previous config saved to /var/cache/conftool/dbconfig/20240608-222808-ladsgroup.json
22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64384 and previous config saved to /var/cache/conftool/dbconfig/20240608-221259-ladsgroup.json
21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64383 and previous config saved to /var/cache/conftool/dbconfig/20240608-215751-ladsgroup.json
21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64382 and previous config saved to /var/cache/conftool/dbconfig/20240608-214243-ladsgroup.json
21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64381 and previous config saved to /var/cache/conftool/dbconfig/20240608-212701-marostegui.json
21:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
21:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64380 and previous config saved to /var/cache/conftool/dbconfig/20240608-212637-marostegui.json
21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64379 and previous config saved to /var/cache/conftool/dbconfig/20240608-211527-marostegui.json
21:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
21:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64378 and previous config saved to /var/cache/conftool/dbconfig/20240608-211503-marostegui.json
21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64377 and previous config saved to /var/cache/conftool/dbconfig/20240608-211128-marostegui.json
20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64376 and previous config saved to /var/cache/conftool/dbconfig/20240608-205955-marostegui.json
20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64375 and previous config saved to /var/cache/conftool/dbconfig/20240608-205618-marostegui.json
20:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64374 and previous config saved to /var/cache/conftool/dbconfig/20240608-204447-marostegui.json
20:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64373 and previous config saved to /var/cache/conftool/dbconfig/20240608-204106-marostegui.json
20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64372 and previous config saved to /var/cache/conftool/dbconfig/20240608-202939-marostegui.json
20:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64371 and previous config saved to /var/cache/conftool/dbconfig/20240608-202016-ladsgroup.json
20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
20:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64370 and previous config saved to /var/cache/conftool/dbconfig/20240608-201948-ladsgroup.json
20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64369 and previous config saved to /var/cache/conftool/dbconfig/20240608-200440-ladsgroup.json
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64368 and previous config saved to /var/cache/conftool/dbconfig/20240608-194932-ladsgroup.json
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64367 and previous config saved to /var/cache/conftool/dbconfig/20240608-193424-ladsgroup.json
18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64366 and previous config saved to /var/cache/conftool/dbconfig/20240608-182811-ladsgroup.json
18:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
18:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64365 and previous config saved to /var/cache/conftool/dbconfig/20240608-182747-ladsgroup.json
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64364 and previous config saved to /var/cache/conftool/dbconfig/20240608-181559-marostegui.json
18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64363 and previous config saved to /var/cache/conftool/dbconfig/20240608-181536-marostegui.json
18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64362 and previous config saved to /var/cache/conftool/dbconfig/20240608-181238-ladsgroup.json
18:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64361 and previous config saved to /var/cache/conftool/dbconfig/20240608-180027-marostegui.json
17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64360 and previous config saved to /var/cache/conftool/dbconfig/20240608-175730-ladsgroup.json
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64359 and previous config saved to /var/cache/conftool/dbconfig/20240608-174519-marostegui.json
17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64358 and previous config saved to /var/cache/conftool/dbconfig/20240608-174222-ladsgroup.json
17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64357 and previous config saved to /var/cache/conftool/dbconfig/20240608-173011-marostegui.json
17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64356 and previous config saved to /var/cache/conftool/dbconfig/20240608-171628-marostegui.json
17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
17:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64355 and previous config saved to /var/cache/conftool/dbconfig/20240608-152142-marostegui.json
15:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
15:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
14:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
14:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64354 and previous config saved to /var/cache/conftool/dbconfig/20240608-144229-marostegui.json
14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64353 and previous config saved to /var/cache/conftool/dbconfig/20240608-142721-marostegui.json
14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64352 and previous config saved to /var/cache/conftool/dbconfig/20240608-141514-ladsgroup.json
14:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64351 and previous config saved to /var/cache/conftool/dbconfig/20240608-141450-ladsgroup.json
14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64350 and previous config saved to /var/cache/conftool/dbconfig/20240608-141212-marostegui.json
13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64349 and previous config saved to /var/cache/conftool/dbconfig/20240608-135942-ladsgroup.json
13:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64348 and previous config saved to /var/cache/conftool/dbconfig/20240608-135704-marostegui.json
13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64347 and previous config saved to /var/cache/conftool/dbconfig/20240608-134434-ladsgroup.json
13:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64346 and previous config saved to /var/cache/conftool/dbconfig/20240608-134110-ladsgroup.json
13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64345 and previous config saved to /var/cache/conftool/dbconfig/20240608-132926-ladsgroup.json
13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64344 and previous config saved to /var/cache/conftool/dbconfig/20240608-132602-ladsgroup.json
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64343 and previous config saved to /var/cache/conftool/dbconfig/20240608-131054-ladsgroup.json
12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64342 and previous config saved to /var/cache/conftool/dbconfig/20240608-125546-ladsgroup.json
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64341 and previous config saved to /var/cache/conftool/dbconfig/20240608-113928-ladsgroup.json
11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64340 and previous config saved to /var/cache/conftool/dbconfig/20240608-113905-ladsgroup.json
11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64339 and previous config saved to /var/cache/conftool/dbconfig/20240608-112357-ladsgroup.json
11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64338 and previous config saved to /var/cache/conftool/dbconfig/20240608-110849-ladsgroup.json
10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64337 and previous config saved to /var/cache/conftool/dbconfig/20240608-105341-ladsgroup.json
10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64336 and previous config saved to /var/cache/conftool/dbconfig/20240608-105032-marostegui.json
10:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
10:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64335 and previous config saved to /var/cache/conftool/dbconfig/20240608-105008-marostegui.json
10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64334 and previous config saved to /var/cache/conftool/dbconfig/20240608-103501-marostegui.json
10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64333 and previous config saved to /var/cache/conftool/dbconfig/20240608-101953-marostegui.json
10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64332 and previous config saved to /var/cache/conftool/dbconfig/20240608-100443-marostegui.json
06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64331 and previous config saved to /var/cache/conftool/dbconfig/20240608-064353-marostegui.json
06:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
06:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64330 and previous config saved to /var/cache/conftool/dbconfig/20240608-064328-marostegui.json
06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64329 and previous config saved to /var/cache/conftool/dbconfig/20240608-062820-marostegui.json
06:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64328 and previous config saved to /var/cache/conftool/dbconfig/20240608-061313-marostegui.json
05:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64327 and previous config saved to /var/cache/conftool/dbconfig/20240608-055804-marostegui.json
05:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64326 and previous config saved to /var/cache/conftool/dbconfig/20240608-054609-ladsgroup.json
05:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
05:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64325 and previous config saved to /var/cache/conftool/dbconfig/20240608-054545-ladsgroup.json
05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64324 and previous config saved to /var/cache/conftool/dbconfig/20240608-053037-ladsgroup.json
05:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64323 and previous config saved to /var/cache/conftool/dbconfig/20240608-052817-ladsgroup.json
05:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
05:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
05:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64322 and previous config saved to /var/cache/conftool/dbconfig/20240608-052753-ladsgroup.json
05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64321 and previous config saved to /var/cache/conftool/dbconfig/20240608-051529-ladsgroup.json
05:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64320 and previous config saved to /var/cache/conftool/dbconfig/20240608-051244-ladsgroup.json
05:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64319 and previous config saved to /var/cache/conftool/dbconfig/20240608-050021-ladsgroup.json
04:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64318 and previous config saved to /var/cache/conftool/dbconfig/20240608-045736-ladsgroup.json
04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64317 and previous config saved to /var/cache/conftool/dbconfig/20240608-044228-ladsgroup.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64316 and previous config saved to /var/cache/conftool/dbconfig/20240608-024534-ladsgroup.json
02:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64315 and previous config saved to /var/cache/conftool/dbconfig/20240608-024511-ladsgroup.json
02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64314 and previous config saved to /var/cache/conftool/dbconfig/20240608-024455-marostegui.json
02:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
02:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64313 and previous config saved to /var/cache/conftool/dbconfig/20240608-024431-marostegui.json
02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64312 and previous config saved to /var/cache/conftool/dbconfig/20240608-023735-ladsgroup.json
02:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
02:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64311 and previous config saved to /var/cache/conftool/dbconfig/20240608-023711-ladsgroup.json
02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64310 and previous config saved to /var/cache/conftool/dbconfig/20240608-023003-ladsgroup.json
02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64309 and previous config saved to /var/cache/conftool/dbconfig/20240608-022923-marostegui.json
02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64308 and previous config saved to /var/cache/conftool/dbconfig/20240608-022203-ladsgroup.json
02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64307 and previous config saved to /var/cache/conftool/dbconfig/20240608-021455-ladsgroup.json
02:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64306 and previous config saved to /var/cache/conftool/dbconfig/20240608-021415-marostegui.json
02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64305 and previous config saved to /var/cache/conftool/dbconfig/20240608-020655-ladsgroup.json
01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64304 and previous config saved to /var/cache/conftool/dbconfig/20240608-015947-ladsgroup.json
01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64303 and previous config saved to /var/cache/conftool/dbconfig/20240608-015906-marostegui.json
01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64302 and previous config saved to /var/cache/conftool/dbconfig/20240608-015147-ladsgroup.json

2024-06-07

22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64301 and previous config saved to /var/cache/conftool/dbconfig/20240607-224306-marostegui.json
22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64300 and previous config saved to /var/cache/conftool/dbconfig/20240607-224242-marostegui.json
22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64299 and previous config saved to /var/cache/conftool/dbconfig/20240607-223300-marostegui.json
22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64298 and previous config saved to /var/cache/conftool/dbconfig/20240607-222734-marostegui.json
22:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64297 and previous config saved to /var/cache/conftool/dbconfig/20240607-221752-marostegui.json
22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64296 and previous config saved to /var/cache/conftool/dbconfig/20240607-221224-marostegui.json
22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64295 and previous config saved to /var/cache/conftool/dbconfig/20240607-220244-marostegui.json
21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64294 and previous config saved to /var/cache/conftool/dbconfig/20240607-215716-marostegui.json
21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64293 and previous config saved to /var/cache/conftool/dbconfig/20240607-214736-marostegui.json
21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64292 and previous config saved to /var/cache/conftool/dbconfig/20240607-211842-ladsgroup.json
21:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
21:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64291 and previous config saved to /var/cache/conftool/dbconfig/20240607-211818-ladsgroup.json
21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64290 and previous config saved to /var/cache/conftool/dbconfig/20240607-210310-ladsgroup.json
20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64289 and previous config saved to /var/cache/conftool/dbconfig/20240607-204801-ladsgroup.json
20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64288 and previous config saved to /var/cache/conftool/dbconfig/20240607-203253-ladsgroup.json
19:42 dduvall@deploy1002: Finished scap: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845) (duration: 16m 10s)
19:33 dduvall@deploy1002: dduvall: Continuing with sync
19:28 dduvall@deploy1002: dduvall: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:26 dduvall@deploy1002: Started scap: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845)
19:25 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
19:25 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
19:07 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-eqiad-services/jaeger: apply
19:06 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-eqiad-services/jaeger: apply
18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64287 and previous config saved to /var/cache/conftool/dbconfig/20240607-184232-marostegui.json
18:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
18:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64286 and previous config saved to /var/cache/conftool/dbconfig/20240607-184208-marostegui.json
18:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64285 and previous config saved to /var/cache/conftool/dbconfig/20240607-182700-marostegui.json
18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64284 and previous config saved to /var/cache/conftool/dbconfig/20240607-181151-marostegui.json
18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64283 and previous config saved to /var/cache/conftool/dbconfig/20240607-181021-ladsgroup.json
18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64282 and previous config saved to /var/cache/conftool/dbconfig/20240607-180958-ladsgroup.json
17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64281 and previous config saved to /var/cache/conftool/dbconfig/20240607-175643-marostegui.json
17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64280 and previous config saved to /var/cache/conftool/dbconfig/20240607-175450-ladsgroup.json
17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64279 and previous config saved to /var/cache/conftool/dbconfig/20240607-173942-ladsgroup.json
17:31 topranks: resetting line card 1/0 on cr2-codfw to enable new 100G link to ssw1-d8-codfw T364095
17:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:24 topranks: re-route traffic from cr2-eqord away from circuit to cr2-codfw to allow for line card reset T364095
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64278 and previous config saved to /var/cache/conftool/dbconfig/20240607-172432-ladsgroup.json
17:23 topranks: disable IP transit to Lumen AS3356 from cr2-eqiad to allow line card reset T364095
17:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64277 and previous config saved to /var/cache/conftool/dbconfig/20240607-170634-ladsgroup.json
17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64276 and previous config saved to /var/cache/conftool/dbconfig/20240607-170555-ladsgroup.json
16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64275 and previous config saved to /var/cache/conftool/dbconfig/20240607-165616-marostegui.json
16:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
16:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64274 and previous config saved to /var/cache/conftool/dbconfig/20240607-165533-marostegui.json
16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64273 and previous config saved to /var/cache/conftool/dbconfig/20240607-165047-ladsgroup.json
16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64272 and previous config saved to /var/cache/conftool/dbconfig/20240607-164025-marostegui.json
16:38 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4048.ulsfo.wmnet
16:36 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64271 and previous config saved to /var/cache/conftool/dbconfig/20240607-163539-ladsgroup.json
16:32 topranks: enabling new transport circuit from cr1-drmrs to cr2-eqiad T343385
16:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64270 and previous config saved to /var/cache/conftool/dbconfig/20240607-162516-marostegui.json
16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64269 and previous config saved to /var/cache/conftool/dbconfig/20240607-162031-ladsgroup.json
16:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64268 and previous config saved to /var/cache/conftool/dbconfig/20240607-161007-marostegui.json
16:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transport eqiad drmrs - cmooney@cumin1002"
16:06 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
16:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transport eqiad drmrs - cmooney@cumin1002"
16:05 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
16:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
15:59 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
15:52 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
15:45 sukhe@cumin1002: START - Cookbook sre.dns.netbox
15:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:35 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:24 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:23 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:14 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
15:10 topranks: disabling netbox service on primary netbox server netbox1001 to restore db from backup
15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64267 and previous config saved to /var/cache/conftool/dbconfig/20240607-145937-ladsgroup.json
14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64266 and previous config saved to /var/cache/conftool/dbconfig/20240607-145913-ladsgroup.json
14:55 topranks: enabling port et-1/0/2 for 100G mode on cr2-codfw T364095
14:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
14:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
14:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64265 and previous config saved to /var/cache/conftool/dbconfig/20240607-144404-ladsgroup.json
14:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
14:39 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:39 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:38 jhathaway@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:38 jhathaway@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:37 jhathaway@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:37 jhathaway@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64264 and previous config saved to /var/cache/conftool/dbconfig/20240607-142856-ladsgroup.json
14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64263 and previous config saved to /var/cache/conftool/dbconfig/20240607-141349-ladsgroup.json
14:02 Emperor: restart swift-proxy on ms-fe1009 ms-fe1011 ms-fe1012 ms-fe1014 T360913
13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64262 and previous config saved to /var/cache/conftool/dbconfig/20240607-132342-marostegui.json
13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64261 and previous config saved to /var/cache/conftool/dbconfig/20240607-132319-marostegui.json
13:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64260 and previous config saved to /var/cache/conftool/dbconfig/20240607-130811-marostegui.json
13:05 moritzm: uploaded wmf-laptop 1.0.0 to component/wmf-laptop for bookworm-wikimedia
13:04 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:04 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:01 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
13:01 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64259 and previous config saved to /var/cache/conftool/dbconfig/20240607-125303-marostegui.json
12:49 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64258 and previous config saved to /var/cache/conftool/dbconfig/20240607-124641-ladsgroup.json
12:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
12:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64257 and previous config saved to /var/cache/conftool/dbconfig/20240607-124616-ladsgroup.json
12:44 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
12:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
12:41 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
12:40 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
12:38 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:38 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64256 and previous config saved to /var/cache/conftool/dbconfig/20240607-123754-marostegui.json
12:33 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:31 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64255 and previous config saved to /var/cache/conftool/dbconfig/20240607-123108-ladsgroup.json
12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64254 and previous config saved to /var/cache/conftool/dbconfig/20240607-122413-marostegui.json
12:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
12:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64253 and previous config saved to /var/cache/conftool/dbconfig/20240607-122349-marostegui.json
12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64252 and previous config saved to /var/cache/conftool/dbconfig/20240607-121559-ladsgroup.json
12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64251 and previous config saved to /var/cache/conftool/dbconfig/20240607-120841-marostegui.json
12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
12:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
12:07 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
12:07 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
12:07 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
12:01 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64250 and previous config saved to /var/cache/conftool/dbconfig/20240607-120051-ladsgroup.json
11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64249 and previous config saved to /var/cache/conftool/dbconfig/20240607-115333-marostegui.json
11:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
11:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1013.eqiad.wmnet
11:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64248 and previous config saved to /var/cache/conftool/dbconfig/20240607-113824-marostegui.json
11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
11:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
11:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
11:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
11:05 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
11:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
11:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
11:00 jelto@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64246 and previous config saved to /var/cache/conftool/dbconfig/20240607-110025-ladsgroup.json
11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64245 and previous config saved to /var/cache/conftool/dbconfig/20240607-110000-ladsgroup.json
10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
10:50 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64244 and previous config saved to /var/cache/conftool/dbconfig/20240607-104452-ladsgroup.json
10:33 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64243 and previous config saved to /var/cache/conftool/dbconfig/20240607-102944-ladsgroup.json
10:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64242 and previous config saved to /var/cache/conftool/dbconfig/20240607-101436-ladsgroup.json
10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
09:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
09:54 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
09:54 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
09:54 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:53 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:52 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:52 moritzm: powercycle pki1002
09:43 jynus: upgrading and restarting db1239 T360751
09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
09:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
09:30 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:25 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:24 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
09:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64241 and previous config saved to /var/cache/conftool/dbconfig/20240607-091849-ladsgroup.json
09:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
09:11 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
09:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
09:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:03 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
08:51 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2099.codfw.wmnet
08:51 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:51 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:50 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:49 taavi: import opentofu 1.7.2 to apt.wikimedia.org T365696
08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
08:48 jynus: reboot dbprov1001,1002,2001,2002
08:46 jynus@cumin1002: START - Cookbook sre.dns.netbox
08:41 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2099.codfw.wmnet
08:40 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2098.codfw.wmnet
08:40 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:39 jynus@cumin1002: START - Cookbook sre.dns.netbox
08:39 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2097.codfw.wmnet
08:39 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:39 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:37 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
08:35 jynus@cumin1002: START - Cookbook sre.dns.netbox
08:19 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet
08:19 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
08:15 jynus: deleted from zarcillo db2097, db2098, db2099 T362802 T366877 T362883
08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
08:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64239 and previous config saved to /var/cache/conftool/dbconfig/20240607-075742-marostegui.json
07:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
07:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
07:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
07:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
07:44 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bookworm
07:19 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
07:19 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
07:12 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
07:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bookworm
06:51 moritzm: reimaging bast1003 to bookworm
06:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
06:34 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
06:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
05:15 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
04:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64238 and previous config saved to /var/cache/conftool/dbconfig/20240607-043343-ladsgroup.json
04:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
04:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64237 and previous config saved to /var/cache/conftool/dbconfig/20240607-043320-ladsgroup.json
04:23 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64236 and previous config saved to /var/cache/conftool/dbconfig/20240607-041812-ladsgroup.json
04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64235 and previous config saved to /var/cache/conftool/dbconfig/20240607-040302-ladsgroup.json
04:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
04:01 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64234 and previous config saved to /var/cache/conftool/dbconfig/20240607-034755-ladsgroup.json
03:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64233 and previous config saved to /var/cache/conftool/dbconfig/20240607-033141-ladsgroup.json
03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64232 and previous config saved to /var/cache/conftool/dbconfig/20240607-033118-ladsgroup.json
03:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64231 and previous config saved to /var/cache/conftool/dbconfig/20240607-032809-marostegui.json
03:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
03:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64230 and previous config saved to /var/cache/conftool/dbconfig/20240607-032746-marostegui.json
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64229 and previous config saved to /var/cache/conftool/dbconfig/20240607-031610-ladsgroup.json
03:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64228 and previous config saved to /var/cache/conftool/dbconfig/20240607-031238-marostegui.json
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64227 and previous config saved to /var/cache/conftool/dbconfig/20240607-030102-ladsgroup.json
02:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64226 and previous config saved to /var/cache/conftool/dbconfig/20240607-025729-marostegui.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64225 and previous config saved to /var/cache/conftool/dbconfig/20240607-024554-ladsgroup.json
02:44 ejegg: fundraising civicrm upgraded from 757f8528 to ebfbad86
02:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64224 and previous config saved to /var/cache/conftool/dbconfig/20240607-024221-marostegui.json
02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64223 and previous config saved to /var/cache/conftool/dbconfig/20240607-021501-ladsgroup.json
02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64222 and previous config saved to /var/cache/conftool/dbconfig/20240607-021418-ladsgroup.json
01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64221 and previous config saved to /var/cache/conftool/dbconfig/20240607-015910-ladsgroup.json
01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64220 and previous config saved to /var/cache/conftool/dbconfig/20240607-014403-ladsgroup.json
afk: fundraising civicrm upgraded from 286bd2b8 to 757f8528
01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64219 and previous config saved to /var/cache/conftool/dbconfig/20240607-012855-ladsgroup.json
01:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
01:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64218 and previous config saved to /var/cache/conftool/dbconfig/20240607-011438-ladsgroup.json
00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64217 and previous config saved to /var/cache/conftool/dbconfig/20240607-005930-ladsgroup.json
00:55 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
00:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64216 and previous config saved to /var/cache/conftool/dbconfig/20240607-004423-ladsgroup.json
00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64215 and previous config saved to /var/cache/conftool/dbconfig/20240607-002915-ladsgroup.json
00:23 bd808@deploy1002: Finished scap: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 24s)
00:15 bd808@deploy1002: bd808 and trainbranchbot: Continuing with sync
00:14 bd808@deploy1002: bd808 and trainbranchbot: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:12 bd808@deploy1002: Started scap: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook"

2024-06-06

23:32 bd808@deploy1002: Finished scap: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553) (duration: 11m 24s)
23:23 bd808@deploy1002: taavi and bd808: Continuing with sync
23:23 bd808@deploy1002: taavi and bd808: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:20 bd808@deploy1002: Started scap: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553)
23:16 bd808@deploy1002: Finished scap: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) (duration: 12m 01s)
23:07 bd808@deploy1002: bd808: Continuing with sync
23:06 bd808@deploy1002: bd808: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:04 bd808@deploy1002: Started scap: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587)
21:46 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
21:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
21:10 jdrewniak@deploy1002: Finished scap: Backport for Disable font size options on specified pages for all wikis (T366625) (duration: 12m 50s)
21:01 jdrewniak@deploy1002: jdrewniak and toyofuku: Continuing with sync
21:00 jdrewniak@deploy1002: jdrewniak and toyofuku: Backport for Disable font size options on specified pages for all wikis (T366625) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:57 jdrewniak@deploy1002: Started scap: Backport for Disable font size options on specified pages for all wikis (T366625)
20:54 urbanecm@deploy1002: Finished scap: Backport for testwiki: Enable CommunityConfiguration (T360954) (duration: 12m 09s)
20:50 urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=testwiki # T360954
20:46 urbanecm@deploy1002: urbanecm: Continuing with sync
20:44 urbanecm@deploy1002: urbanecm: Backport for testwiki: Enable CommunityConfiguration (T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:42 urbanecm@deploy1002: Started scap: Backport for testwiki: Enable CommunityConfiguration (T360954)
20:41 urbanecm@deploy1002: Finished scap: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954) (duration: 19m 42s)
20:33 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Continuing with sync
20:32 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:31 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:29 ejegg: fundraising civicrm upgraded from 71ed6bed to 286bd2b8
20:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:24 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki
20:22 urbanecm@deploy1002: Started scap: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954)
20:21 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:20 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:20 urbanecm@deploy1002: Finished scap: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) (duration: 14m 10s)
20:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:18 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:13 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:11 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Continuing with sync
20:08 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki
20:06 urbanecm@deploy1002: Started scap: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969)
20:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
19:31 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707. (duration: 00m 26s)
19:30 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707.
18:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.8 refs T361402
18:17 thcipriani@deploy1002: Finished deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided) (duration: 00m 43s)
18:17 thcipriani@deploy1002: Started deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided)
17:57 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:57 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
17:56 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
17:48 topranks: re-enabling pybal on lvs1017 after cable move T366361
17:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64211 and previous config saved to /var/cache/conftool/dbconfig/20240606-173121-marostegui.json
17:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
17:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
17:26 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
17:26 topranks: disabling pybal on lvs1017 to move traffic to lvs1020 in advance of cable move T366361
17:26 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
17:23 topranks: re-enabling pybal on lvs1018 after cable move T366361
17:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:15 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
17:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64210 and previous config saved to /var/cache/conftool/dbconfig/20240606-171359-ladsgroup.json
17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64209 and previous config saved to /var/cache/conftool/dbconfig/20240606-171336-ladsgroup.json
17:11 topranks: disabling pybal on lvs1018 to move traffic to lvs1020 in advance of cable move T366361
17:11 topranks: re-enabling pybal on lvs1019 after cable move T366361
16:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64208 and previous config saved to /var/cache/conftool/dbconfig/20240606-165828-ladsgroup.json
16:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
16:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
16:50 topranks: disabling pybal on lvs1019 to move traffic to lvs1020 in advance of cable move T366361
16:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64207 and previous config saved to /var/cache/conftool/dbconfig/20240606-164320-ladsgroup.json
16:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64206 and previous config saved to /var/cache/conftool/dbconfig/20240606-162812-ladsgroup.json
16:28 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: (no justification provided) (duration: 00m 05s)
16:28 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: (no justification provided)
16:25 dancy@deploy1002: Installation of scap version "4.86.1" completed for 285 hosts
16:25 dancy@deploy1002: Installing scap version "4.86.1" for 285 hosts
16:24 dancy@deploy1002: Installing scap version "4.86.1" for 286 hosts
16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64205 and previous config saved to /var/cache/conftool/dbconfig/20240606-161338-ladsgroup.json
16:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64204 and previous config saved to /var/cache/conftool/dbconfig/20240606-161312-ladsgroup.json
16:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
16:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64203 and previous config saved to /var/cache/conftool/dbconfig/20240606-160028-ladsgroup.json
16:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
16:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64202 and previous config saved to /var/cache/conftool/dbconfig/20240606-160004-ladsgroup.json
15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64201 and previous config saved to /var/cache/conftool/dbconfig/20240606-155804-ladsgroup.json
15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64199 and previous config saved to /var/cache/conftool/dbconfig/20240606-154457-ladsgroup.json
15:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
15:42 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64198 and previous config saved to /var/cache/conftool/dbconfig/20240606-154255-ladsgroup.json
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64197 and previous config saved to /var/cache/conftool/dbconfig/20240606-154028-ladsgroup.json
15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
15:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64196 and previous config saved to /var/cache/conftool/dbconfig/20240606-154004-ladsgroup.json
15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
15:38 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
15:37 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64195 and previous config saved to /var/cache/conftool/dbconfig/20240606-153730-arnaudb.json
15:29 topranks: rebooting ssw1-f1-eqiad to install new JunOS release T366361
15:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64194 and previous config saved to /var/cache/conftool/dbconfig/20240606-152949-ladsgroup.json
15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64193 and previous config saved to /var/cache/conftool/dbconfig/20240606-152747-ladsgroup.json
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64192 and previous config saved to /var/cache/conftool/dbconfig/20240606-152456-ladsgroup.json
15:23 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
15:23 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:23 jforrester@deploy1002: Finished scap: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) (duration: 13m 58s)
15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64191 and previous config saved to /var/cache/conftool/dbconfig/20240606-152222-arnaudb.json
15:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:18 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64190 and previous config saved to /var/cache/conftool/dbconfig/20240606-151440-ladsgroup.json
15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:12 jforrester@deploy1002: jforrester: Continuing with sync
15:11 jforrester@deploy1002: jforrester: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64189 and previous config saved to /var/cache/conftool/dbconfig/20240606-150948-ladsgroup.json
15:09 jforrester@deploy1002: Started scap: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809)
15:08 jforrester@deploy1002: Finished scap: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) (duration: 12m 05s)
15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64188 and previous config saved to /var/cache/conftool/dbconfig/20240606-150714-arnaudb.json
15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
14:59 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
14:59 jforrester@deploy1002: jforrester: Continuing with sync
14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
14:58 jforrester@deploy1002: jforrester: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:58 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:56 topranks: disable ssw1-f1-eqiad leaf-facing ports in advance of upgrade T366361
14:56 jforrester@deploy1002: Started scap: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782)
14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64187 and previous config saved to /var/cache/conftool/dbconfig/20240606-145440-ladsgroup.json
14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64186 and previous config saved to /var/cache/conftool/dbconfig/20240606-145205-arnaudb.json
14:51 elukey: kill sessionstore pod running on mw1390.eqiad.wmnet (no dedicated='kask' taint)
14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64185 and previous config saved to /var/cache/conftool/dbconfig/20240606-144943-arnaudb.json
14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
14:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
14:43 sukhe: sudo cumin -b1 -s60 'A:cp and A:eqsin' 'run-puppet-agent --enable "merging CR 1038881"'
14:25 TheresNoTime: close UTC afternoon backport window
14:18 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 10s)
14:18 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
14:17 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 09s)
14:17 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
14:17 samtar@deploy1002: Finished scap: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966) (duration: 12m 58s)
14:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
14:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
14:14 topranks: disabling BGP on cr2-eqiad towards ssw1-f1-eqiad prior to upgrade of ssw later T366361
14:14 ChrisDobbins901_: sudo cumin 'A:cp and A:eqsin' 'disable-puppet "merging CR 1038881"'
14:08 samtar@deploy1002: samtar and anzx and nmw03: Continuing with sync
14:07 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
14:06 samtar@deploy1002: samtar and anzx and nmw03: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:06 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
14:05 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
14:04 samtar@deploy1002: Started scap: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966)
14:02 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
14:00 kartik@deploy1002: Finished scap: Backport for CX: Fix translation container max width for large screens (T366374) (duration: 13m 11s)
13:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
13:56 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4050.ulsfo.wmnet
13:52 kartik@deploy1002: kartik: Continuing with sync
13:50 kartik@deploy1002: kartik: Backport for CX: Fix translation container max width for large screens (T366374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
13:47 kartik@deploy1002: Started scap: Backport for CX: Fix translation container max width for large screens (T366374)
13:46 samtar@deploy1002: Finished scap: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549) (duration: 16m 05s)
13:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
13:44 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
13:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
13:37 samtar@deploy1002: samtar and gergesshamon: Continuing with sync
13:32 samtar@deploy1002: samtar and gergesshamon: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 samtar@deploy1002: Started scap: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549)
13:28 samtar@deploy1002: Finished scap: Backport for Activate campaignEvents extension on Igbo wiki. (T363199) (duration: 14m 07s)
13:19 samtar@deploy1002: mhorsey and samtar: Continuing with sync
13:16 samtar@deploy1002: mhorsey and samtar: Backport for Activate campaignEvents extension on Igbo wiki. (T363199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:15 samtar@deploy1002: Started scap: Backport for Activate campaignEvents extension on Igbo wiki. (T363199)
13:11 taavi: taavi@deploy1002 ~ $ sudo kill 32174 # kill forgotten scap sync-world process
13:08 klausman@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
12:57 vgutierrez: repool text@codfw with IPIP encapsulation enabled - T366466
12:56 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
12:56 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2011 - T366466
12:44 topranks: disabling PyBal on lvs1019 to allow for cable move T366361
12:40 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
12:39 topranks: rebooting ssw1-e1-eqiad to upgrade JunOS
12:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4051.ulsfo.wmnet
12:33 topranks: disabling BGP to ssw1-e1-eqiad from cr1-eqiad in advance of upgrade T366361
12:33 vgutierrez: depool text@codfw before enabling IPIP encapsulation - T366466
12:29 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet
12:28 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4051.ulsfo.wmnet
12:25 topranks: disabling PyBal on lvs1018 to allow for cable move T366361
12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
12:25 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
12:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1017.eqiad.wmnet
12:24 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1017.eqiad.wmnet
12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
12:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
11:56 topranks: disabling PyBal on lvs1017 to allow for cable move T366361
11:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
11:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
11:27 effie: kicking off k8s eqiad restarts - T366555
11:25 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
11:09 klausman@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
11:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
10:47 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
10:45 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
10:45 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
10:43 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
10:41 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
10:41 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
10:40 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
10:40 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
10:38 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
10:37 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
10:35 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
10:27 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
10:26 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
10:11 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64181 and previous config saved to /var/cache/conftool/dbconfig/20240606-100747-arnaudb.json
09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64180 and previous config saved to /var/cache/conftool/dbconfig/20240606-095240-arnaudb.json
09:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64179 and previous config saved to /var/cache/conftool/dbconfig/20240606-095053-marostegui.json
09:47 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2004.codfw.wmnet
09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64178 and previous config saved to /var/cache/conftool/dbconfig/20240606-093734-arnaudb.json
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64177 and previous config saved to /var/cache/conftool/dbconfig/20240606-093545-marostegui.json
09:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2003.codfw.wmnet
09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64176 and previous config saved to /var/cache/conftool/dbconfig/20240606-092228-arnaudb.json
09:22 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:20 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64175 and previous config saved to /var/cache/conftool/dbconfig/20240606-092037-marostegui.json
09:20 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
09:17 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
09:17 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:15 stevemunene@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:13 stevemunene@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:12 stevemunene@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:11 stevemunene@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:08 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1004.eqiad.wmnet
09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64174 and previous config saved to /var/cache/conftool/dbconfig/20240606-090722-arnaudb.json
09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64173 and previous config saved to /var/cache/conftool/dbconfig/20240606-090529-marostegui.json
09:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet
09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
08:57 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
08:56 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
08:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64172 and previous config saved to /var/cache/conftool/dbconfig/20240606-085216-arnaudb.json
08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
08:50 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1031.eqiad.wmnet
08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1003.eqiad.wmnet
08:44 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
08:44 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
08:40 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet
08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
08:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64171 and previous config saved to /var/cache/conftool/dbconfig/20240606-083710-arnaudb.json
08:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
08:35 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:35 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
08:19 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64167 and previous config saved to /var/cache/conftool/dbconfig/20240606-081753-marostegui.json
08:14 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:14 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64166 and previous config saved to /var/cache/conftool/dbconfig/20240606-081412-ladsgroup.json
08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64165 and previous config saved to /var/cache/conftool/dbconfig/20240606-080245-marostegui.json
08:02 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
08:01 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1001.eqiad.wmnet
08:00 urbanecm@deploy1002: Started scap: Backport for Add throttle exception for an upcoming workshop (T366748)
07:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64164 and previous config saved to /var/cache/conftool/dbconfig/20240606-075904-ladsgroup.json
07:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64163 and previous config saved to /var/cache/conftool/dbconfig/20240606-074737-marostegui.json
07:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64162 and previous config saved to /var/cache/conftool/dbconfig/20240606-074356-ladsgroup.json
07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64161 and previous config saved to /var/cache/conftool/dbconfig/20240606-073229-marostegui.json
07:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
07:06 hashar: Restarting Gerrit
07:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64160 and previous config saved to /var/cache/conftool/dbconfig/20240606-070558-ladsgroup.json
07:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
07:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
06:56 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1034.eqiad.wmnet
06:49 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1034.eqiad.wmnet
05:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
05:21 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
05:20 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
05:04 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
05:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64159 and previous config saved to /var/cache/conftool/dbconfig/20240606-041714-marostegui.json
04:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
04:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
04:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64158 and previous config saved to /var/cache/conftool/dbconfig/20240606-041650-marostegui.json
04:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64157 and previous config saved to /var/cache/conftool/dbconfig/20240606-040142-marostegui.json
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64156 and previous config saved to /var/cache/conftool/dbconfig/20240606-034732-ladsgroup.json
03:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64155 and previous config saved to /var/cache/conftool/dbconfig/20240606-034709-ladsgroup.json
03:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64154 and previous config saved to /var/cache/conftool/dbconfig/20240606-034635-marostegui.json
03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64153 and previous config saved to /var/cache/conftool/dbconfig/20240606-033201-ladsgroup.json
03:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64152 and previous config saved to /var/cache/conftool/dbconfig/20240606-033125-marostegui.json
03:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64151 and previous config saved to /var/cache/conftool/dbconfig/20240606-032907-ladsgroup.json
03:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
03:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64150 and previous config saved to /var/cache/conftool/dbconfig/20240606-032844-ladsgroup.json
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64149 and previous config saved to /var/cache/conftool/dbconfig/20240606-031653-ladsgroup.json
03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64148 and previous config saved to /var/cache/conftool/dbconfig/20240606-031336-ladsgroup.json
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64147 and previous config saved to /var/cache/conftool/dbconfig/20240606-030145-ladsgroup.json
02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64146 and previous config saved to /var/cache/conftool/dbconfig/20240606-025828-ladsgroup.json
02:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64145 and previous config saved to /var/cache/conftool/dbconfig/20240606-024321-ladsgroup.json
01:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64144 and previous config saved to /var/cache/conftool/dbconfig/20240606-012208-marostegui.json
01:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
01:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64143 and previous config saved to /var/cache/conftool/dbconfig/20240606-012144-marostegui.json
01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64142 and previous config saved to /var/cache/conftool/dbconfig/20240606-010636-marostegui.json
00:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64141 and previous config saved to /var/cache/conftool/dbconfig/20240606-005128-marostegui.json
00:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64140 and previous config saved to /var/cache/conftool/dbconfig/20240606-003620-marostegui.json
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64139 and previous config saved to /var/cache/conftool/dbconfig/20240606-003232-marostegui.json
00:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
00:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64138 and previous config saved to /var/cache/conftool/dbconfig/20240606-003208-marostegui.json
00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64137 and previous config saved to /var/cache/conftool/dbconfig/20240606-001700-marostegui.json
00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64136 and previous config saved to /var/cache/conftool/dbconfig/20240606-000151-marostegui.json

2024-06-05

23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64135 and previous config saved to /var/cache/conftool/dbconfig/20240605-234643-marostegui.json
23:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64134 and previous config saved to /var/cache/conftool/dbconfig/20240605-232926-ladsgroup.json
23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
22:54 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
22:50 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
22:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
22:03 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
21:43 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
21:42 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
21:42 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
21:36 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
21:18 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
21:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in2001.wikimedia.org
21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in2001.wikimedia.org with OS bookworm
20:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
20:42 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64133 and previous config saved to /var/cache/conftool/dbconfig/20240605-202949-marostegui.json
20:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
20:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
20:26 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in2001.wikimedia.org with OS bookworm
20:26 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:25 urbanecm@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892) (duration: 22m 04s)
20:25 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in2001.wikimedia.org on all recursors
20:25 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in2001.wikimedia.org on all recursors
20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:24 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
20:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:21 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
20:21 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in2001.wikimedia.org
20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in1001.wikimedia.org
20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in1001.wikimedia.org with OS bookworm
20:16 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Continuing with sync
20:12 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:06 ejegg: payments-wiki upgraded from c255fda8 to 82a5e588
20:06 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892) synced to the testservers (https://wikitech.wikimedia.org/wiki/M
20:03 urbanecm@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892)
20:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
19:57 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in1001.wikimedia.org with OS bookworm
19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:44 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in1001.wikimedia.org on all recursors
19:43 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in1001.wikimedia.org on all recursors
19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:38 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
19:36 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
19:36 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in1001.wikimedia.org
19:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
19:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
18:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
18:53 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.8 refs T361402
18:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64132 and previous config saved to /var/cache/conftool/dbconfig/20240605-184250-ladsgroup.json
18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64131 and previous config saved to /var/cache/conftool/dbconfig/20240605-182742-ladsgroup.json
18:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64130 and previous config saved to /var/cache/conftool/dbconfig/20240605-181234-ladsgroup.json
18:12 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
18:11 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
18:07 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
18:06 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64129 and previous config saved to /var/cache/conftool/dbconfig/20240605-175725-ladsgroup.json
17:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64128 and previous config saved to /var/cache/conftool/dbconfig/20240605-175503-ladsgroup.json
17:50 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
17:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
17:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
17:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64127 and previous config saved to /var/cache/conftool/dbconfig/20240605-174724-marostegui.json
17:42 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to pagelinks old columns in enwiki (T352010) (duration: 12m 19s)
17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64126 and previous config saved to /var/cache/conftool/dbconfig/20240605-173954-ladsgroup.json
17:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
17:32 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to pagelinks old columns in enwiki (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json
17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
17:29 ladsgroup@deploy1002: Started scap: Backport for Stop writing to pagelinks old columns in enwiki (T352010)
17:27 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
17:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json
17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json
17:13 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
17:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:10 jhathaway: Phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox
17:10 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet
17:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json
17:06 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
17:06 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet
17:06 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
17:05 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
17:05 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
17:04 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json
16:56 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
16:56 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
16:56 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
16:54 mutante: downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785
16:53 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
16:53 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
16:52 ladsgroup@deploy1002: Finished scap: Backport for Bump XML dump schema to version 0.11 (T365155) (duration: 18m 23s)
16:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64120 and previous config saved to /var/cache/conftool/dbconfig/20240605-164635-ladsgroup.json
16:46 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
16:45 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:43 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Continuing with sync
16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
16:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:36 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Backport for Bump XML dump schema to version 0.11 (T365155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:34 ladsgroup@deploy1002: Started scap: Backport for Bump XML dump schema to version 0.11 (T365155)
16:32 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64119 and previous config saved to /var/cache/conftool/dbconfig/20240605-163129-ladsgroup.json
16:20 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:18 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1032.eqiad.wmnet
16:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64118 and previous config saved to /var/cache/conftool/dbconfig/20240605-161622-ladsgroup.json
16:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:15 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:14 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:12 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1032.eqiad.wmnet
16:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:10 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:08 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:01 aokoth@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:01 aokoth@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:01 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64117 and previous config saved to /var/cache/conftool/dbconfig/20240605-160116-ladsgroup.json
15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64116 and previous config saved to /var/cache/conftool/dbconfig/20240605-155955-ladsgroup.json
15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:59 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
15:58 aokoth@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:58 aokoth@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:57 aokoth@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:56 aokoth@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
15:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1081.eqiad.wmnet
15:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64115 and previous config saved to /var/cache/conftool/dbconfig/20240605-155023-ladsgroup.json
15:46 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
15:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
15:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
15:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
15:37 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
15:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
15:34 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
15:32 moritzm: rebalancing drmrs Ganeti clusters
15:30 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
15:29 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
15:28 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
15:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2078.codfw.wmnet
15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ping1004.eqiad.wmnet
15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ping1004.eqiad.wmnet with OS bookworm
15:24 sukhe@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
15:21 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
15:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
15:19 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2078.codfw.wmnet
15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
15:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
15:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
15:10 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
15:10 kamila@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-ctrl1001']
15:09 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
15:09 jnuche@deploy1002: Installation of scap version "4.86.0" completed for 285 hosts
15:08 jnuche@deploy1002: Installing scap version "4.86.0" for 285 hosts
15:07 jnuche@deploy1002: Installing scap version "4.86.0" for 286 hosts
15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64114 and previous config saved to /var/cache/conftool/dbconfig/20240605-150605-marostegui.json
15:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
15:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64113 and previous config saved to /var/cache/conftool/dbconfig/20240605-150542-marostegui.json
15:05 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
15:04 vgutierrez: repool text@eqsin with IPIP encapsulation enabled - T366466
15:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
15:01 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:59 cwhite@deploy1002: Finished scap: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657) (duration: 12m 32s)
14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64112 and previous config saved to /var/cache/conftool/dbconfig/20240605-145757-ladsgroup.json
14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64111 and previous config saved to /var/cache/conftool/dbconfig/20240605-145735-ladsgroup.json
14:55 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
14:55 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
14:55 vgutierrez: rolling restart of pybal on lvs5006 and lvs5004 - T366466
14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64110 and previous config saved to /var/cache/conftool/dbconfig/20240605-145034-marostegui.json
14:50 cwhite@deploy1002: matmarex and cwhite: Continuing with sync
14:49 cwhite@deploy1002: matmarex and cwhite: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
14:46 cwhite@deploy1002: Started scap: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657)
14:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
14:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64109 and previous config saved to /var/cache/conftool/dbconfig/20240605-144227-ladsgroup.json
14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64108 and previous config saved to /var/cache/conftool/dbconfig/20240605-143526-marostegui.json
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
14:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
14:28 vgutierrez: depool text@eqsin before enabling IPIP encapsulation - T366466
14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64107 and previous config saved to /var/cache/conftool/dbconfig/20240605-142718-ladsgroup.json
14:23 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
14:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
14:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64106 and previous config saved to /var/cache/conftool/dbconfig/20240605-142018-marostegui.json
14:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
14:15 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
14:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64105 and previous config saved to /var/cache/conftool/dbconfig/20240605-141210-ladsgroup.json
14:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
14:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
14:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
14:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
14:02 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:02 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:00 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
14:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1004.eqiad.wmnet on all recursors
14:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1004.eqiad.wmnet on all recursors
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
13:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
13:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
13:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
13:51 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
13:48 inflatador: bking@an-db1001 install python3-psycopg2 pkg T363001
13:48 daniel@deploy1002: Finished scap: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013) (duration: 17m 50s)
13:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:48 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1004.eqiad.wmnet
13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
13:47 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
13:46 elukey: factory reset for sretest1001 to test the new provision cookbook - T365372
13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
13:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
13:45 inflatador: bking@an-db1001 install acl pkg T363001
13:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
13:40 daniel@deploy1002: daniel: Continuing with sync
13:39 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
13:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
13:37 filippo@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1005.eqiad.wmnet
13:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
13:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
13:35 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
13:34 daniel@deploy1002: daniel: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:30 daniel@deploy1002: Started scap: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013)
13:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
13:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
13:27 elukey: systemctl reset-failed [email protected] redis-instance-tcp_6380.service on netbox[12]002 + apt-get purge of redis-server and prometheus-redis-exporter packages to clean up stale configs (no local redis is used)
13:27 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
13:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
13:26 dreamyjazz@deploy1002: Finished scap: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493) (duration: 11m 39s)
13:25 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
13:21 fabfur: enable magru DC after applying IPIP encapsulation patches (T366466)
13:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
13:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
13:17 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
13:17 dreamyjazz@deploy1002: dreamyjazz: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64104 and previous config saved to /var/cache/conftool/dbconfig/20240605-131647-marostegui.json
13:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64103 and previous config saved to /var/cache/conftool/dbconfig/20240605-131623-marostegui.json
13:14 dreamyjazz@deploy1002: Started scap: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493)
13:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
13:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
13:13 dreamyjazz@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686) (duration: 19m 13s)
13:10 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker
13:06 fabfur: restarting pybal on lvs7001/lvs7003 to apply IPIP conf (T366466)
13:04 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
13:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
13:02 dreamyjazz@deploy1002: dreamyjazz: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64102 and previous config saved to /var/cache/conftool/dbconfig/20240605-130115-marostegui.json
12:56 elukey@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker
12:55 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
12:53 dreamyjazz@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686)
12:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2004.codfw.wmnet
12:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping2004.codfw.wmnet with OS bookworm
12:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
12:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
12:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
12:49 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1246 T363119', diff saved to https://phabricator.wikimedia.org/P64101 and previous config saved to /var/cache/conftool/dbconfig/20240605-124918-arnaudb.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64100 and previous config saved to /var/cache/conftool/dbconfig/20240605-124607-marostegui.json
12:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
12:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
12:43 moritzm: failover ganeti masters in drmrs
12:40 cgoubert@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-codfw
12:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
12:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
12:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
12:35 fabfur: disabling puppet on A:cp-text to test IPIP encapsulation on magru (T366466)
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
12:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64099 and previous config saved to /var/cache/conftool/dbconfig/20240605-123059-marostegui.json
12:29 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
12:26 fabfur: disabling magru DC to apply IPIP encapsulation patches (T366466)
12:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
12:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
12:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
12:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
12:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
12:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
12:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
12:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
12:15 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping2004.codfw.wmnet with OS bookworm
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
12:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
12:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2004.codfw.wmnet on all recursors
12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2004.codfw.wmnet on all recursors
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
12:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
12:08 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
12:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2004.codfw.wmnet
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
12:04 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
12:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
12:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
12:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
11:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
11:52 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
11:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
11:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
11:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
11:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
11:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
11:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
11:39 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1008.eqiad.wmnet|wikikube-worker1009.eqiad.wmnet|wikikube-worker1010.eqiad.wmnet|wikikube-worker1011.eqiad.wmnet|wikikube-worker1012.eqiad.wmnet),cluster=kubernetes,service=kubesvc
11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
11:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
11:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
11:37 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
11:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
11:31 hnowlan: running homer to configure bgp on 5 new k8s workers
11:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
11:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
11:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
11:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
11:17 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
11:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
11:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
11:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
11:03 claime: restarted send_tile_invalidations.service on maps1009
11:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64098 and previous config saved to /var/cache/conftool/dbconfig/20240605-110303-ladsgroup.json
10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
10:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64097 and previous config saved to /var/cache/conftool/dbconfig/20240605-105400-root.json
10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
10:53 hnowlan@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
10:52 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
10:52 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
10:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64096 and previous config saved to /var/cache/conftool/dbconfig/20240605-104757-ladsgroup.json
10:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
10:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
10:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
10:39 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64094 and previous config saved to /var/cache/conftool/dbconfig/20240605-103854-root.json
10:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
10:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1012.eqiad.wmnet with OS bullseye
10:35 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
10:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1010.eqiad.wmnet with OS bullseye
10:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64093 and previous config saved to /var/cache/conftool/dbconfig/20240605-103251-ladsgroup.json
10:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
10:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1008.eqiad.wmnet with OS bullseye
10:30 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64091 and previous config saved to /var/cache/conftool/dbconfig/20240605-102348-root.json
10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64090 and previous config saved to /var/cache/conftool/dbconfig/20240605-102252-ladsgroup.json
10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
10:22 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
10:22 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
10:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
10:21 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
10:18 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
10:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64088 and previous config saved to /var/cache/conftool/dbconfig/20240605-101744-ladsgroup.json
10:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bookworm
10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64087 and previous config saved to /var/cache/conftool/dbconfig/20240605-101521-ladsgroup.json
10:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
10:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
10:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
10:13 dcaro@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1031.eqiad.wmnet
10:13 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
10:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1152 back to x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64086 and previous config saved to /var/cache/conftool/dbconfig/20240605-101019-root.json
10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64085 and previous config saved to /var/cache/conftool/dbconfig/20240605-100842-root.json
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64084 and previous config saved to /var/cache/conftool/dbconfig/20240605-100810-root.json
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64083 and previous config saved to /var/cache/conftool/dbconfig/20240605-100117-root.json
10:00 fabfur: disabling puppet on cp4037 to test Benthos performance (T358109)
10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1012.eqiad.wmnet with OS bullseye
10:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
10:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
09:59 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc
09:58 claime: pooling and uncordoning wikikube-worker1001 - T351074
09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1456 to wikikube-worker1012
09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1012
09:56 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1010.eqiad.wmnet with OS bullseye
09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1008.eqiad.wmnet with OS bullseye
09:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
09:55 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
09:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1012
09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
09:53 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64082 and previous config saved to /var/cache/conftool/dbconfig/20240605-095336-root.json
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64081 and previous config saved to /var/cache/conftool/dbconfig/20240605-095303-root.json
09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1428 to wikikube-worker1011
09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1011
09:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
09:51 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
09:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
09:50 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1011
09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
09:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
09:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64080 and previous config saved to /var/cache/conftool/dbconfig/20240605-094611-root.json
09:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1428 to wikikube-worker1011
09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1456 to wikikube-worker1012
09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1410 to wikikube-worker1010
09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1010
09:44 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:44 claime: homer 'cr*eqiad*' commit 'T351074'
09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1001.eqiad.wmnet with OS bullseye
09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1401 to wikikube-worker1009
09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1009
09:42 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1010
09:42 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:41 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1009
09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
09:41 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1400 to wikikube-worker1008
09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1008
09:39 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
09:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
09:38 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1008
09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
09:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64079 and previous config saved to /var/cache/conftool/dbconfig/20240605-093830-root.json
09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64078 and previous config saved to /var/cache/conftool/dbconfig/20240605-093757-root.json
09:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bookworm
09:35 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1151 to temp x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64077 and previous config saved to /var/cache/conftool/dbconfig/20240605-093507-root.json
09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
09:33 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010
09:33 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
09:31 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1410 to wikikube-worker1010.eqiad.wmnet
09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010.eqiad.wmnet
09:31 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1030.eqiad.wmnet
09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1401 to wikikube-worker1009
09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64076 and previous config saved to /var/cache/conftool/dbconfig/20240605-093105-root.json
09:30 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
09:30 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008
09:29 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1400 to wikikube-worker1008.eqiad.wmnet
09:29 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008.eqiad.wmnet
09:26 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1030.eqiad.wmnet
09:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
09:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
09:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
09:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bookworm
09:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64075 and previous config saved to /var/cache/conftool/dbconfig/20240605-092324-root.json
09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64074 and previous config saved to /var/cache/conftool/dbconfig/20240605-092251-root.json
09:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
09:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
09:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
09:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64073 and previous config saved to /var/cache/conftool/dbconfig/20240605-091559-root.json
09:15 brouberol@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
09:11 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
09:11 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64072 and previous config saved to /var/cache/conftool/dbconfig/20240605-090745-root.json
09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
09:06 brouberol@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
09:02 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1001.eqiad.wmnet with OS bullseye
09:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
09:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1001.eqiad.wmnet on all recursors
09:01 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1001.eqiad.wmnet on all recursors
09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4052.ulsfo.wmnet
09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64071 and previous config saved to /var/cache/conftool/dbconfig/20240605-090053-root.json
09:00 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4044.ulsfo.wmnet
08:58 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
08:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
08:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
08:54 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
08:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
08:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/ipoid: apply
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
08:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64070 and previous config saved to /var/cache/conftool/dbconfig/20240605-085239-root.json
08:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4052.ulsfo.wmnet
08:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
08:50 fabfur@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp4044.ulsfo.wmnet
08:50 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
08:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64069 and previous config saved to /var/cache/conftool/dbconfig/20240605-084547-root.json
08:45 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1029.eqiad.wmnet
08:45 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
08:44 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
08:44 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bookworm
08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1227', diff saved to https://phabricator.wikimedia.org/P64068 and previous config saved to /var/cache/conftool/dbconfig/20240605-084211-root.json
08:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
08:37 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64067 and previous config saved to /var/cache/conftool/dbconfig/20240605-083733-root.json
08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1358 to wikikube-worker1001
08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1001
08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
08:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1186.eqiad.wmnet with OS bookworm
08:18 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64063 and previous config saved to /var/cache/conftool/dbconfig/20240605-081755-marostegui.json
08:14 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
08:08 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1027.eqiad.wmnet
08:07 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64062 and previous config saved to /var/cache/conftool/dbconfig/20240605-080247-marostegui.json
08:01 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1026.eqiad.wmnet
08:00 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1025.eqiad.wmnet
08:00 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
07:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
07:54 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1025.eqiad.wmnet
07:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64061 and previous config saved to /var/cache/conftool/dbconfig/20240605-074739-marostegui.json
07:45 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
07:38 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
07:38 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
07:37 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
07:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1186', diff saved to https://phabricator.wikimedia.org/P64060 and previous config saved to /var/cache/conftool/dbconfig/20240605-073024-root.json
07:28 marostegui: dbmaint codfw s2 deploy schema change on db2207 T364299
07:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
07:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T366038', diff saved to https://phabricator.wikimedia.org/P64059 and previous config saved to /var/cache/conftool/dbconfig/20240605-072509-root.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T366038', diff saved to https://phabricator.wikimedia.org/P64058 and previous config saved to /var/cache/conftool/dbconfig/20240605-072427-marostegui.json
07:24 marostegui: Starting s2 codfw failover from db2207 to db2204 - T366038
07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T366038', diff saved to https://phabricator.wikimedia.org/P64057 and previous config saved to /var/cache/conftool/dbconfig/20240605-070758-root.json
07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64056 and previous config saved to /var/cache/conftool/dbconfig/20240605-044418-marostegui.json
04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64055 and previous config saved to /var/cache/conftool/dbconfig/20240605-044355-marostegui.json
04:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64054 and previous config saved to /var/cache/conftool/dbconfig/20240605-042847-marostegui.json
04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64053 and previous config saved to /var/cache/conftool/dbconfig/20240605-041339-marostegui.json
04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64052 and previous config saved to /var/cache/conftool/dbconfig/20240605-041306-marostegui.json
04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64051 and previous config saved to /var/cache/conftool/dbconfig/20240605-041227-marostegui.json
03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64050 and previous config saved to /var/cache/conftool/dbconfig/20240605-035855-ladsgroup.json
03:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
03:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64049 and previous config saved to /var/cache/conftool/dbconfig/20240605-035832-ladsgroup.json
03:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64048 and previous config saved to /var/cache/conftool/dbconfig/20240605-035831-marostegui.json
03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64047 and previous config saved to /var/cache/conftool/dbconfig/20240605-035719-marostegui.json
03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64046 and previous config saved to /var/cache/conftool/dbconfig/20240605-034326-ladsgroup.json
03:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64045 and previous config saved to /var/cache/conftool/dbconfig/20240605-034212-marostegui.json
03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64044 and previous config saved to /var/cache/conftool/dbconfig/20240605-032817-ladsgroup.json
03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64043 and previous config saved to /var/cache/conftool/dbconfig/20240605-032704-marostegui.json
03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64042 and previous config saved to /var/cache/conftool/dbconfig/20240605-031310-ladsgroup.json
02:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64041 and previous config saved to /var/cache/conftool/dbconfig/20240605-023423-ladsgroup.json
02:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
02:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance

2024-06-04

23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64040 and previous config saved to /var/cache/conftool/dbconfig/20240604-234228-marostegui.json
23:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
23:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
23:15 tzatziki: removing one file for legal compliance
23:09 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
23:09 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
22:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:47 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:47 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:47 tzatziki: removing one file for legal compliance
22:46 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:46 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:36 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:36 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
22:36 mutante: CI - (integration.wikimedia.org) short downtime for maintenance
22:35 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:35 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
22:29 tzatziki: removing two files for legal compliance
22:16 tzatziki: removing three files for legal compliance
22:08 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
22:00 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
21:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:41 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:33 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:33 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:33 urbanecm@deploy1002: Finished scap: Backport for Disable font size options on specified pages for most wikis (T366334) (duration: 15m 10s)
21:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
21:24 urbanecm@deploy1002: toyofuku and urbanecm: Continuing with sync
21:21 urbanecm@deploy1002: toyofuku and urbanecm: Backport for Disable font size options on specified pages for most wikis (T366334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:18 urbanecm@deploy1002: Started scap: Backport for Disable font size options on specified pages for most wikis (T366334)
21:10 tgr@deploy1002: Finished scap: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki() (duration: 16m 33s)
21:07 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
21:06 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
21:01 tgr@deploy1002: tgr: Continuing with sync
20:58 tgr@deploy1002: tgr: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki() synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:56 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
20:53 tgr@deploy1002: Started scap: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki()
20:52 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
20:47 tgr@deploy1002: Finished scap: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) (duration: 13m 12s)
20:39 tgr@deploy1002: tgr and pmiazga: Continuing with sync
20:37 tgr@deploy1002: tgr and pmiazga: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:34 tgr@deploy1002: Started scap: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281)
20:28 ladsgroup@deploy1002: Finished scap: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434) (duration: 13m 24s)
20:27 jhathaway: vacuuming pcc db
20:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64039 and previous config saved to /var/cache/conftool/dbconfig/20240604-202554-marostegui.json
20:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
20:22 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
20:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
20:21 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
20:19 ladsgroup@deploy1002: pppery and ladsgroup: Continuing with sync
20:17 ladsgroup@deploy1002: pppery and ladsgroup: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:15 ladsgroup@deploy1002: Started scap: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434)
20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64038 and previous config saved to /var/cache/conftool/dbconfig/20240604-201047-marostegui.json
20:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:59 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
19:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64037 and previous config saved to /var/cache/conftool/dbconfig/20240604-195539-marostegui.json
19:49 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:49 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:47 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
19:44 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64036 and previous config saved to /var/cache/conftool/dbconfig/20240604-194031-marostegui.json
19:38 mutante: https://gerrit-replica.wikimedia.org - short downtime for maintenance
19:38 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
19:38 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
19:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:37 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
19:37 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:37 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
19:36 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
19:33 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
19:32 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
19:16 mutante: releases.wikimedia.org - short downtime for maintenance
19:14 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
19:13 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
19:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
19:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64035 and previous config saved to /var/cache/conftool/dbconfig/20240604-190931-marostegui.json
19:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64034 and previous config saved to /var/cache/conftool/dbconfig/20240604-190906-marostegui.json
19:06 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
19:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: 0.3.142 (duration: 12m 53s)
18:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64033 and previous config saved to /var/cache/conftool/dbconfig/20240604-185358-marostegui.json
18:48 ryankemper: [WDQS Deploy] Forgot to run the command to set git hash to tip of origin/master so deploy was a partial no-op. Re-rolling...
18:47 ryankemper@deploy1002: Started deploy [wdqs/wdqs@43b966f]: 0.3.142
18:46 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@143ca33]: 0.3.142 (duration: 02m 02s)
18:45 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.142` on canary `wdqs1016`; proceeding to rest of fleet
18:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@143ca33]: 0.3.142
18:41 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.142`. Pre-deploy tests passing on canary `wdqs1016`
18:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64032 and previous config saved to /var/cache/conftool/dbconfig/20240604-183850-marostegui.json
18:35 mutante: aphlict - (phab realtime notifications) - reboots
18:30 mutante: doc.wikimedia.org - very short downtime for maintenance
18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
18:26 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.8 refs T361402
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64031 and previous config saved to /var/cache/conftool/dbconfig/20240604-182342-marostegui.json
18:15 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7014*} and A:cp
17:54 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7014*} and A:cp
17:53 sukhe: sudo cumin 'A:cp-upload and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
17:51 sukhe: sudo cumin 'A:cp-text and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
17:49 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7002*} and A:cp
17:39 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7002*} and A:cp
17:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
17:22 sukhe: sudo cumin 'A:cp and A:magru' 'run-puppet-agent'
17:15 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:15 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
17:14 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
17:11 kamila@cumin1002: START - Cookbook sre.dns.netbox
16:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp700[12].magru.wmnet,service=(cdn|ats-be)
16:52 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:51 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:41 elukey: delete the other 2 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
16:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
16:31 elukey: delete 3 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
16:29 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1010.eqiad.wmnet
16:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64028 and previous config saved to /var/cache/conftool/dbconfig/20240604-162241-root.json
16:22 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7002.magru.wmnet
16:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7001.magru.wmnet
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64025 and previous config saved to /var/cache/conftool/dbconfig/20240604-161233-marostegui.json
16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64024 and previous config saved to /var/cache/conftool/dbconfig/20240604-161210-marostegui.json
16:11 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7002.magru.wmnet
16:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7001.magru.wmnet
16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
16:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1013.eqiad.wmnet
16:09 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
16:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
16:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64023 and previous config saved to /var/cache/conftool/dbconfig/20240604-160735-root.json
16:05 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
16:05 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
16:04 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
16:04 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab2002.codfw.wmnet
16:04 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
16:02 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1005.eqiad.wmnet
16:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
16:00 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
15:59 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
15:58 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab2002.codfw.wmnet
15:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1004.eqiad.wmnet
15:57 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1003.eqiad.wmnet
15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64022 and previous config saved to /var/cache/conftool/dbconfig/20240604-155701-marostegui.json
15:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64021 and previous config saved to /var/cache/conftool/dbconfig/20240604-155629-ladsgroup.json
15:55 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1013.eqiad.wmnet
15:53 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1003.eqiad.wmnet
15:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
15:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
15:52 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1004.eqiad.wmnet
15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64020 and previous config saved to /var/cache/conftool/dbconfig/20240604-155228-root.json
15:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1003.eqiad.wmnet
15:52 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
15:51 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1002.eqiad.wmnet
15:51 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s1
15:48 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2003.codfw.wmnet
15:47 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1002.eqiad.wmnet
15:47 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1003.eqiad.wmnet
15:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
15:47 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1001.eqiad.wmnet
15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
15:44 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host miscweb2003.codfw.wmnet
15:43 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
15:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:43 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM aux-k8s-etcd1001.eqiad.wmnet
15:42 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
15:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:42 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2001.codfw.wmnet
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64019 and previous config saved to /var/cache/conftool/dbconfig/20240604-154153-marostegui.json
15:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_magru
15:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2001.codfw.wmnet
15:37 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet
15:37 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(0|3|5).codfw.wmnet,cluster=kubernetes,service=kubesvc
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64018 and previous config saved to /var/cache/conftool/dbconfig/20240604-153722-root.json
15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2030,2033,2035].codfw.wmnet
15:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1002.eqiad.wmnet
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes[2030,2033,2035].codfw.wmnet
15:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
15:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
15:31 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1002.eqiad.wmnet
15:31 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet
15:29 tchin@deploy1002: Finished deploy [airflow-dags/analytics_test@a279784]: (no justification provided) (duration: 00m 10s)
15:29 tchin@deploy1002: Started deploy [airflow-dags/analytics_test@a279784]: (no justification provided)
15:29 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
15:27 tchin@deploy1002: Finished deploy [airflow-dags/analytics@a279784]: (no justification provided) (duration: 00m 27s)
15:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps (duration: 00m 27s)
15:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:27 tchin@deploy1002: Started deploy [airflow-dags/analytics@a279784]: (no justification provided)
15:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:27 dcausse@deploy1002: Started deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps
15:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64017 and previous config saved to /var/cache/conftool/dbconfig/20240604-152644-marostegui.json
15:25 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1001.eqiad.wmnet
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64015 and previous config saved to /var/cache/conftool/dbconfig/20240604-152216-root.json
15:22 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1001.eqiad.wmnet
15:21 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
15:21 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
15:19 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:19 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
15:18 elukey@cumin1002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM aux-k8s-ctrl1001.eqiad.wmnet
15:18 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
15:18 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:16 ejegg: fundraising civicrm upgraded from 44900b8c to 71ed6bed
15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
15:15 ejegg: payments-wiki upgraded from 0174d89c to c255fda8
15:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:12 dancy@deploy1002: Installation of scap version "4.85.0" completed for 294 hosts
15:11 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
15:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
15:11 dancy@deploy1002: Installing scap version "4.85.0" for 294 hosts
15:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:09 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64014 and previous config saved to /var/cache/conftool/dbconfig/20240604-150835-ladsgroup.json
15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:08 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64013 and previous config saved to /var/cache/conftool/dbconfig/20240604-150710-root.json
15:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3066*} and A:cp
15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605 (duration: 00m 32s)
15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605
15:03 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
15:03 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
15:03 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605 (duration: 00m 33s)
15:02 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605
15:02 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
15:02 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
14:55 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
14:55 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3066*} and A:cp
14:53 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:52 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64012 and previous config saved to /var/cache/conftool/dbconfig/20240604-145203-root.json
14:49 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
14:48 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp4045*} and A:cp
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
14:48 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
14:46 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(1|4).codfw.wmnet,cluster=kubernetes,service=kubesvc
14:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:38 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp4045*} and A:cp
14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7003.magru.wmnet
14:27 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7003.magru.wmnet
14:22 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-codfw
14:14 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
14:14 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:14 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:10 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:06 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:02 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7002.magru.wmnet
14:00 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
13:59 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7002.magru.wmnet
13:59 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
13:46 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
13:42 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
13:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
13:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
13:35 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64009 and previous config saved to /var/cache/conftool/dbconfig/20240604-133250-ladsgroup.json
13:29 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
13:29 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
13:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
13:24 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
13:23 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
13:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
13:20 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
13:17 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
13:17 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
13:14 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7001.magru.wmnet
13:12 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
13:11 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7001.magru.wmnet
13:10 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
13:08 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
13:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
13:05 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
13:05 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
13:03 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
13:02 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
13:00 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
12:57 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
12:56 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
12:53 brouberol@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
12:53 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
12:48 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
12:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
12:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64008 and previous config saved to /var/cache/conftool/dbconfig/20240604-124432-ladsgroup.json
12:43 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
12:32 brouberol@cumin2002: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64007 and previous config saved to /var/cache/conftool/dbconfig/20240604-122924-ladsgroup.json
12:29 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
12:28 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database dtpwiki (T365229)
12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64006 and previous config saved to /var/cache/conftool/dbconfig/20240604-122602-root.json
12:22 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
12:17 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-eqiad
12:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
12:15 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
12:14 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64005 and previous config saved to /var/cache/conftool/dbconfig/20240604-121415-ladsgroup.json
12:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
12:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
12:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64004 and previous config saved to /var/cache/conftool/dbconfig/20240604-121056-root.json
12:08 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-codfw
12:02 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database dtpwiki (T365229)
11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64003 and previous config saved to /var/cache/conftool/dbconfig/20240604-115907-ladsgroup.json
11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64002 and previous config saved to /var/cache/conftool/dbconfig/20240604-115549-root.json
11:54 hnowlan: depooling 3 api appservers and 2 appservers in advance of reimaging
11:50 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-eqiad
11:44 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-codfw
11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64001 and previous config saved to /var/cache/conftool/dbconfig/20240604-114157-marostegui.json
11:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64000 and previous config saved to /var/cache/conftool/dbconfig/20240604-114043-root.json
11:39 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
11:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63999 and previous config saved to /var/cache/conftool/dbconfig/20240604-112537-root.json
11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
11:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63998 and previous config saved to /var/cache/conftool/dbconfig/20240604-111031-root.json
11:06 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
11:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
10:59 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
10:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
10:57 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
10:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63996 and previous config saved to /var/cache/conftool/dbconfig/20240604-105525-root.json
10:54 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
10:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
10:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
10:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
10:50 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
10:49 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
10:48 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
10:45 marostegui: dbmaint codfw s1 deploy schema change on db2203 T364299
10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T366552', diff saved to https://phabricator.wikimedia.org/P63995 and previous config saved to /var/cache/conftool/dbconfig/20240604-104337-root.json
10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T366552', diff saved to https://phabricator.wikimedia.org/P63994 and previous config saved to /var/cache/conftool/dbconfig/20240604-104241-root.json
10:42 marostegui: Starting s1 codfw failover from db2203 to db2212 - T366552
10:42 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper
10:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS bookworm
10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
10:28 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided) (duration: 01m 12s)
10:27 hashar: Upgrading releases Jenkins instances # T366008
10:27 hashar@deploy1002: Started deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided)
10:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper
10:23 claime: Migrating votewiki to mw-on-k8s - T362323
10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
10:20 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
10:16 hashar: Upgrading CI Jenkins # T366008
10:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
10:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
10:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2002-dev.codfw.wmnet
10:09 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
10:08 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T364299
10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper_monitor
10:07 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
10:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
10:04 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb2002-dev.codfw.wmnet
10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T366552', diff saved to https://phabricator.wikimedia.org/P63993 and previous config saved to /var/cache/conftool/dbconfig/20240604-100024-root.json
10:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
09:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
09:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS bookworm
09:58 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
09:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
09:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
09:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
09:53 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
09:48 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
09:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
09:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3003.wikimedia.org
09:42 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
09:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
09:40 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet
09:40 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
09:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper_monitor
09:38 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
09:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
09:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3003.wikimedia.org
09:36 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
09:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
09:34 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2008-dev.codfw.wmnet
09:34 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet
09:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
09:33 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
09:30 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
09:29 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
09:29 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2007-dev.codfw.wmnet
09:27 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
09:27 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
09:27 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2001.codfw.wmnet
09:22 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
09:22 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
09:21 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
09:21 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
09:21 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2001.codfw.wmnet
09:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1001.eqiad.wmnet
09:15 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
09:15 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
09:14 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
09:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1001.eqiad.wmnet
09:08 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
09:08 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
09:01 moritzm: imported python3-xapian-haystack 2.1.1-1+deb12u1 to bookworm-wikimedia (already lined up for the next Bookworm point release to address https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1066136 and needed for the update of the Mailman servers) T331706
08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7001.wikimedia.org
08:54 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P63992 and previous config saved to /var/cache/conftool/dbconfig/20240604-085205-marostegui.json
08:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7001.wikimedia.org
08:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63991 and previous config saved to /var/cache/conftool/dbconfig/20240604-085141-marostegui.json
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1003.wikimedia.org
08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1003.wikimedia.org
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P63990 and previous config saved to /var/cache/conftool/dbconfig/20240604-084428-root.json
08:40 kostajh: UTC morning deploys done
08:38 kharlan@deploy1002: Finished scap: Backport for IPReputationHooks: Bump schema version (T354597) (duration: 15m 45s)
08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P63989 and previous config saved to /var/cache/conftool/dbconfig/20240604-083633-marostegui.json
08:19 kharlan@deploy1002: Finished scap: Backport for IPReputationHooks: Bump schema version (T354597) (duration: 14m 08s)
08:10 kharlan@deploy1002: kharlan: Continuing with sync
08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63986 and previous config saved to /var/cache/conftool/dbconfig/20240604-080846-marostegui.json
08:08 kharlan@deploy1002: kharlan: Backport for IPReputationHooks: Bump schema version (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63985 and previous config saved to /var/cache/conftool/dbconfig/20240604-080617-marostegui.json
08:06 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
08:05 kharlan@deploy1002: Started scap: Backport for IPReputationHooks: Bump schema version (T354597)
08:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
08:01 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
07:57 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
07:56 hashar: Restarting Gerrit for Java 17 upgrade # T364342
07:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17- T364342 (duration: 00m 03s)
07:56 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17- T364342
07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63984 and previous config saved to /var/cache/conftool/dbconfig/20240604-075338-marostegui.json
07:47 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17- T364342 (duration: 00m 05s)
07:46 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17- T364342
07:42 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2002.codfw.wmnet with OS bookworm
07:42 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bookworm
07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63983 and previous config saved to /var/cache/conftool/dbconfig/20240604-073830-marostegui.json
07:27 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T356166
07:15 moritzm: installing intel-microcode updates on bullseye
07:10 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T355609
07:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
07:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
07:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1184.eqiad.wmnet with OS bookworm
06:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
06:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1184.eqiad.wmnet with OS bookworm
06:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
06:14 marostegui: Rename table flaggedpage_pending on db1185 (s5 eqiad dbmaint) - T365568
06:09 arnaudb@cumin1002: dbctl commit (dc=all): ' fix api db1163 vs db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63982 and previous config saved to /var/cache/conftool/dbconfig/20240604-060925-arnaudb.json
06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'API db1163 T366259', diff saved to https://phabricator.wikimedia.org/P63981 and previous config saved to /var/cache/conftool/dbconfig/20240604-060747-arnaudb.json
06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63980 and previous config saved to /var/cache/conftool/dbconfig/20240604-060703-arnaudb.json
06:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T366259', diff saved to https://phabricator.wikimedia.org/P63979 and previous config saved to /var/cache/conftool/dbconfig/20240604-060324-arnaudb.json
06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T366259', diff saved to https://phabricator.wikimedia.org/P63978 and previous config saved to /var/cache/conftool/dbconfig/20240604-060208-arnaudb.json
06:01 arnaudb: Starting s1 eqiad failover from db1184 to db1163 - T366259
05:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T366259', diff saved to https://phabricator.wikimedia.org/P63977 and previous config saved to /var/cache/conftool/dbconfig/20240604-052803-arnaudb.json
05:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
05:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
04:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P63976 and previous config saved to /var/cache/conftool/dbconfig/20240604-042011-ladsgroup.json
04:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.5 (duration: 00m 57s)
03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63975 and previous config saved to /var/cache/conftool/dbconfig/20240604-035703-marostegui.json
03:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.8 refs T361402 (duration: 53m 47s)
03:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
03:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63974 and previous config saved to /var/cache/conftool/dbconfig/20240604-035640-marostegui.json
03:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63973 and previous config saved to /var/cache/conftool/dbconfig/20240604-034132-marostegui.json
03:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63972 and previous config saved to /var/cache/conftool/dbconfig/20240604-032625-marostegui.json
03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63971 and previous config saved to /var/cache/conftool/dbconfig/20240604-031117-marostegui.json
03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63970 and previous config saved to /var/cache/conftool/dbconfig/20240604-030906-marostegui.json
03:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
03:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.8 refs T361402
00:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
00:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
00:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63969 and previous config saved to /var/cache/conftool/dbconfig/20240604-002119-ladsgroup.json
00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63968 and previous config saved to /var/cache/conftool/dbconfig/20240604-000612-ladsgroup.json

2024-06-03

23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63967 and previous config saved to /var/cache/conftool/dbconfig/20240603-235104-ladsgroup.json
23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63966 and previous config saved to /var/cache/conftool/dbconfig/20240603-233555-ladsgroup.json
23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Extension:DynamicPageList (Wikimedia)" "Extension:DynamicPageList" "Zabe" --reason "per request T366488"
23:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
23:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
23:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63965 and previous config saved to /var/cache/conftool/dbconfig/20240603-231424-marostegui.json
22:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63963 and previous config saved to /var/cache/conftool/dbconfig/20240603-225916-marostegui.json
22:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63962 and previous config saved to /var/cache/conftool/dbconfig/20240603-224408-marostegui.json
22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63961 and previous config saved to /var/cache/conftool/dbconfig/20240603-222900-marostegui.json
22:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63960 and previous config saved to /var/cache/conftool/dbconfig/20240603-222607-marostegui.json
22:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63959 and previous config saved to /var/cache/conftool/dbconfig/20240603-222524-marostegui.json
22:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63958 and previous config saved to /var/cache/conftool/dbconfig/20240603-221016-marostegui.json
21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63957 and previous config saved to /var/cache/conftool/dbconfig/20240603-215508-marostegui.json
21:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63956 and previous config saved to /var/cache/conftool/dbconfig/20240603-214000-marostegui.json
21:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63955 and previous config saved to /var/cache/conftool/dbconfig/20240603-212040-ladsgroup.json
21:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63954 and previous config saved to /var/cache/conftool/dbconfig/20240603-211312-root.json
21:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63953 and previous config saved to /var/cache/conftool/dbconfig/20240603-210532-ladsgroup.json
20:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63952 and previous config saved to /var/cache/conftool/dbconfig/20240603-205806-root.json
20:51 urbanecm@deploy1002: Finished scap: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370) (duration: 14m 57s)
20:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63951 and previous config saved to /var/cache/conftool/dbconfig/20240603-205024-ladsgroup.json
20:43 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
20:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63950 and previous config saved to /var/cache/conftool/dbconfig/20240603-204300-root.json
20:39 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:36 urbanecm@deploy1002: Started scap: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370)
20:36 urbanecm@deploy1002: Finished scap: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811) (duration: 30m 14s)
20:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63949 and previous config saved to /var/cache/conftool/dbconfig/20240603-203514-ladsgroup.json
20:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63948 and previous config saved to /var/cache/conftool/dbconfig/20240603-202754-root.json
20:27 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Continuing with sync
20:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63947 and previous config saved to /var/cache/conftool/dbconfig/20240603-201248-root.json
20:10 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 urbanecm@deploy1002: Started scap: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811)
19:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63946 and previous config saved to /var/cache/conftool/dbconfig/20240603-195742-root.json
19:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63945 and previous config saved to /var/cache/conftool/dbconfig/20240603-194236-root.json
18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63944 and previous config saved to /var/cache/conftool/dbconfig/20240603-183029-marostegui.json
18:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
18:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63943 and previous config saved to /var/cache/conftool/dbconfig/20240603-183006-marostegui.json
18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63942 and previous config saved to /var/cache/conftool/dbconfig/20240603-181459-marostegui.json
17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63941 and previous config saved to /var/cache/conftool/dbconfig/20240603-175951-marostegui.json
17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63940 and previous config saved to /var/cache/conftool/dbconfig/20240603-174442-marostegui.json
17:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1002.eqiad.wmnet|wikikube-worker1003.eqiad.wmnet|wikikube-worker1007.eqiad.wmnet|wikikube-worker1004.eqiad.wmnet),cluster=kubernetes,service=kubesvc
17:27 claime: Pooling and uncordoning wikikube-worker1002.eqiad.wmnet,wikikube-worker1003.eqiad.wmnet,wikikube-worker1007.eqiad.wmnet,wikikube-worker1004.eqiad.wmnet - T351074
17:19 claime: homer 'cr*eqiad*' commit 'T351074'
17:18 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T351074'
17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
17:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
17:15 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
16:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
16:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
16:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
16:18 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1003.eqiad.wmnet with OS bullseye
15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1004.eqiad.wmnet with OS bullseye
15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1002.eqiad.wmnet with OS bullseye
15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P63939 and previous config saved to /var/cache/conftool/dbconfig/20240603-155048-root.json
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
15:43 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342 (duration: 00m 05s)
15:43 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342
15:42 jhathaway: deploying more restrictive SPF & DMARC settings for wikipedia.org
15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-c2-codfw.mgmt.codfw.wmnet
15:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
15:30 dancy@deploy1002: sync-world aborted: testing (duration: 00m 00s)
15:30 dancy@deploy1002: Started scap: testing
15:27 dancy@mwmaint1002: scap failed: FileNotFoundError [Errno 2] No such file or directory: '/etc/helmfile-defaults/mediawiki-deployments.yaml' (duration: 00m 00s)
15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1004.eqiad.wmnet with OS bullseye
15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1003.eqiad.wmnet with OS bullseye
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1002.eqiad.wmnet with OS bullseye
15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
15:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
15:03 dancy@deploy1002: Installing scap version "4.84.0" for 297 hosts
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1490 to wikikube-worker1007
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1007
15:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:00 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-c2-codfw.mgmt.codfw.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1007
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
14:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
14:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
14:57 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
14:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1490 to wikikube-worker1007
14:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 08s)
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1443 to wikikube-worker1004
14:54 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1004
14:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1004
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
14:53 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
14:53 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342
14:52 Dreamy_Jazz: Afternoon UTC backport window done
14:52 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
14:51 dreamyjazz@deploy1002: Finished scap: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) (duration: 12m 04s)
14:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1443 to wikikube-worker1004
14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1427 to wikikube-worker1003
14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1003
14:43 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
14:42 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1003
14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
14:41 dreamyjazz@deploy1002: dreamyjazz: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:41 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
14:39 dreamyjazz@deploy1002: Started scap: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473)
14:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1427 to wikikube-worker1003
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1426 to wikikube-worker1002
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1002
14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1002
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
14:34 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
14:33 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
14:33 vgutierrez: repool text@ulsfo with IPIP encapsulation enabled - T366466
14:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1012.eqiad.wmnet with OS bullseye
14:31 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
14:31 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
14:30 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
14:30 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2001.codfw.wmnet with OS bookworm
14:29 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:29 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1426 to wikikube-worker1002
14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1010.eqiad.wmnet with OS bullseye
14:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bookworm
14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1358 to wikikube-worker1001
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1358 to wikikube-worker1001
14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:12 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
14:09 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
14:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
14:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
14:05 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
14:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
14:01 tgr@deploy1002: Finished scap: Backport for [trwiki] Create translator group (T356440) (duration: 23m 15s)
13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
13:58 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008 - T366466
13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63937 and previous config saved to /var/cache/conftool/dbconfig/20240603-135634-ladsgroup.json
13:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63936 and previous config saved to /var/cache/conftool/dbconfig/20240603-135612-ladsgroup.json
13:54 vgutierrez: re-enable puppet on "A:cp-text_ulsfo" - T366466
13:50 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2001.codfw.wmnet with OS bookworm
13:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bookworm
13:49 vgutierrez: re-enable puppet on "A:cp-text and not A:cp-text_ulsfo" - T366466
13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1012.eqiad.wmnet with OS bullseye
13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1010.eqiad.wmnet with OS bullseye
13:44 tgr@deploy1002: gergesshamon and tgr: Continuing with sync
13:41 vgutierrez: disable puppet on A:cp-text before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1038294/ - T366466
13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63935 and previous config saved to /var/cache/conftool/dbconfig/20240603-134104-ladsgroup.json
13:40 tgr@deploy1002: gergesshamon and tgr: Backport for [trwiki] Create translator group (T356440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 tgr@deploy1002: Started scap: Backport for [trwiki] Create translator group (T356440)
13:36 vgutierrez: depool text@ulsfo before enabling IPIP encapsulation - T366466
13:32 tgr@deploy1002: Finished scap: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404) (duration: 19m 07s)
13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63934 and previous config saved to /var/cache/conftool/dbconfig/20240603-132556-ladsgroup.json
13:23 tgr@deploy1002: sgimeno and gergesshamon and tgr: Continuing with sync
13:20 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS bookworm
13:16 tgr@deploy1002: sgimeno and gergesshamon and tgr: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:13 tgr@deploy1002: Started scap: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404)
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63933 and previous config saved to /var/cache/conftool/dbconfig/20240603-131048-ladsgroup.json
13:08 moritzm: uploaded intel-microcode 3.20240312.1~deb11u1 to apt.wikimedia.org (import from bullseye-proposed-updates, to be coupled with forthcoming reboots)
13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
13:03 Emperor: depool moss-fe2001 with a view to returning it to apus T279621
13:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
13:02 Emperor: depool moss-fe1001 with a view to returning it to apus T279621
13:00 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
12:55 Emperor: depool/restart swift-proxy/repool ms-fe10{09,11,12,14} due to rising connection failures T360913
12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63932 and previous config saved to /var/cache/conftool/dbconfig/20240603-124628-marostegui.json
12:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
12:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63931 and previous config saved to /var/cache/conftool/dbconfig/20240603-124605-marostegui.json
12:45 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bookworm
12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1002.eqiad.wmnet with OS bookworm
12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63930 and previous config saved to /var/cache/conftool/dbconfig/20240603-123057-marostegui.json
12:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
12:20 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63929 and previous config saved to /var/cache/conftool/dbconfig/20240603-121549-marostegui.json
12:06 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1002.eqiad.wmnet with OS bookworm
12:03 ladsgroup@deploy1002: Finished scap: Backport for Enable numeric sorting for Persian (T329440) (duration: 12m 07s)
12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63928 and previous config saved to /var/cache/conftool/dbconfig/20240603-120041-marostegui.json
11:54 ladsgroup@deploy1002: ebrahim and ladsgroup: Continuing with sync
11:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:53 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:53 ladsgroup@deploy1002: ebrahim and ladsgroup: Backport for Enable numeric sorting for Persian (T329440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:51 ladsgroup@deploy1002: Started scap: Backport for Enable numeric sorting for Persian (T329440)
11:35 effie: restart memcached on mc1050 and mc2050
11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63927 and previous config saved to /var/cache/conftool/dbconfig/20240603-113447-ladsgroup.json
11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:27 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:26 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
11:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bookworm
11:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host snapshot1013.eqiad.wmnet
11:07 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
11:04 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63926 and previous config saved to /var/cache/conftool/dbconfig/20240603-105416-marostegui.json
10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host snapshot1013.eqiad.wmnet
10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63925 and previous config saved to /var/cache/conftool/dbconfig/20240603-105352-marostegui.json
10:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bookworm
10:41 moritzm: installing linux 5.10.218 security updates
10:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1038.eqiad.wmnet with OS bookworm
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63924 and previous config saved to /var/cache/conftool/dbconfig/20240603-103844-marostegui.json
10:29 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1013.eqiad.wmnet with OS bullseye
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63923 and previous config saved to /var/cache/conftool/dbconfig/20240603-102335-marostegui.json
10:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
10:18 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63922 and previous config saved to /var/cache/conftool/dbconfig/20240603-100827-marostegui.json
10:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bookworm
10:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
09:58 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old pagelinks columns in s8 (T352010) (duration: 18m 39s)
09:57 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
09:56 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
09:49 jiji@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host mc-gp2001.codfw.wmnet with OS bookworm
09:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
09:43 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1013.eqiad.wmnet with OS bullseye
09:42 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old pagelinks columns in s8 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bookworm
09:40 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old pagelinks columns in s8 (T352010)
09:31 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
09:29 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
09:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
09:22 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
09:10 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2001.codfw.wmnet with OS bookworm
09:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bookworm
09:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:08 jiji@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc1039.eqiad.wmnet']
08:49 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2002.codfw.wmnet with OS bookworm
08:45 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1003.eqiad.wmnet with OS bookworm
08:15 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887 (duration: 00m 05s)
08:15 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887
08:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bookworm
08:09 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2002.codfw.wmnet with OS bookworm
08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887 (duration: 00m 05s)
08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887
08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887 (duration: 00m 08s)
08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887
08:04 jiji@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc1039.eqiad.wmnet']
07:32 kartik@deploy1002: Finished scap: Backport for testwiki: Fix language for nan in Section Translation (duration: 28m 37s)
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63920 and previous config saved to /var/cache/conftool/dbconfig/20240603-072513-marostegui.json
07:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
07:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63919 and previous config saved to /var/cache/conftool/dbconfig/20240603-072450-marostegui.json
07:22 kartik@deploy1002: kartik: Continuing with sync
07:18 kartik@deploy1002: kartik: Backport for testwiki: Fix language for nan in Section Translation synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63918 and previous config saved to /var/cache/conftool/dbconfig/20240603-070942-marostegui.json
07:04 kartik@deploy1002: Started scap: Backport for testwiki: Fix language for nan in Section Translation
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63917 and previous config saved to /var/cache/conftool/dbconfig/20240603-065434-marostegui.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63916 and previous config saved to /var/cache/conftool/dbconfig/20240603-063925-marostegui.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63915 and previous config saved to /var/cache/conftool/dbconfig/20240603-063814-marostegui.json
06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63914 and previous config saved to /var/cache/conftool/dbconfig/20240603-063735-marostegui.json
06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63913 and previous config saved to /var/cache/conftool/dbconfig/20240603-062227-marostegui.json
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63912 and previous config saved to /var/cache/conftool/dbconfig/20240603-061956-root.json
06:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63911 and previous config saved to /var/cache/conftool/dbconfig/20240603-060719-marostegui.json
06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63910 and previous config saved to /var/cache/conftool/dbconfig/20240603-060450-root.json
05:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63909 and previous config saved to /var/cache/conftool/dbconfig/20240603-055210-marostegui.json
05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63908 and previous config saved to /var/cache/conftool/dbconfig/20240603-054944-root.json
05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63907 and previous config saved to /var/cache/conftool/dbconfig/20240603-053438-root.json
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63906 and previous config saved to /var/cache/conftool/dbconfig/20240603-051932-root.json
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63905 and previous config saved to /var/cache/conftool/dbconfig/20240603-050424-root.json
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63904 and previous config saved to /var/cache/conftool/dbconfig/20240603-044918-root.json
04:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63903 and previous config saved to /var/cache/conftool/dbconfig/20240603-011839-marostegui.json
01:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
01:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63902 and previous config saved to /var/cache/conftool/dbconfig/20240603-011813-marostegui.json
01:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
01:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63901 and previous config saved to /var/cache/conftool/dbconfig/20240603-010925-ladsgroup.json
01:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63900 and previous config saved to /var/cache/conftool/dbconfig/20240603-010305-marostegui.json
01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63899 and previous config saved to /var/cache/conftool/dbconfig/20240603-005415-ladsgroup.json
00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63898 and previous config saved to /var/cache/conftool/dbconfig/20240603-004757-marostegui.json
00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63897 and previous config saved to /var/cache/conftool/dbconfig/20240603-003907-ladsgroup.json
00:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63896 and previous config saved to /var/cache/conftool/dbconfig/20240603-003247-marostegui.json
00:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63895 and previous config saved to /var/cache/conftool/dbconfig/20240603-002359-ladsgroup.json
00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-02

23:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63894 and previous config saved to /var/cache/conftool/dbconfig/20240602-232847-marostegui.json
23:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
23:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
23:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
20:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:47 taavi@cumin1002: dbctl commit (dc=all): 'depool db1213', diff saved to https://phabricator.wikimedia.org/P63893 and previous config saved to /var/cache/conftool/dbconfig/20240602-204719-taavi.json
20:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63892 and previous config saved to /var/cache/conftool/dbconfig/20240602-200046-marostegui.json
20:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63891 and previous config saved to /var/cache/conftool/dbconfig/20240602-200021-marostegui.json
20:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63890 and previous config saved to /var/cache/conftool/dbconfig/20240602-194514-marostegui.json
19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63889 and previous config saved to /var/cache/conftool/dbconfig/20240602-193006-marostegui.json
19:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63888 and previous config saved to /var/cache/conftool/dbconfig/20240602-191458-marostegui.json
19:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63887 and previous config saved to /var/cache/conftool/dbconfig/20240602-185215-ladsgroup.json
18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
18:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63886 and previous config saved to /var/cache/conftool/dbconfig/20240602-185151-ladsgroup.json
18:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63885 and previous config saved to /var/cache/conftool/dbconfig/20240602-183643-ladsgroup.json
18:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63884 and previous config saved to /var/cache/conftool/dbconfig/20240602-182135-ladsgroup.json
18:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63883 and previous config saved to /var/cache/conftool/dbconfig/20240602-180627-ladsgroup.json
18:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63882 and previous config saved to /var/cache/conftool/dbconfig/20240602-144924-marostegui.json
14:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
14:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63881 and previous config saved to /var/cache/conftool/dbconfig/20240602-144900-marostegui.json
14:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63880 and previous config saved to /var/cache/conftool/dbconfig/20240602-143352-marostegui.json
14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63879 and previous config saved to /var/cache/conftool/dbconfig/20240602-141843-marostegui.json
14:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63878 and previous config saved to /var/cache/conftool/dbconfig/20240602-141139-root.json
14:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63877 and previous config saved to /var/cache/conftool/dbconfig/20240602-140334-marostegui.json
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63876 and previous config saved to /var/cache/conftool/dbconfig/20240602-135632-root.json
13:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63875 and previous config saved to /var/cache/conftool/dbconfig/20240602-134126-root.json
13:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63874 and previous config saved to /var/cache/conftool/dbconfig/20240602-132620-root.json
13:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63873 and previous config saved to /var/cache/conftool/dbconfig/20240602-131114-root.json
13:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63872 and previous config saved to /var/cache/conftool/dbconfig/20240602-125608-root.json
12:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63871 and previous config saved to /var/cache/conftool/dbconfig/20240602-124102-root.json
12:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63870 and previous config saved to /var/cache/conftool/dbconfig/20240602-120033-ladsgroup.json
12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63869 and previous config saved to /var/cache/conftool/dbconfig/20240602-120010-ladsgroup.json
11:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63868 and previous config saved to /var/cache/conftool/dbconfig/20240602-114503-ladsgroup.json
11:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63867 and previous config saved to /var/cache/conftool/dbconfig/20240602-112955-ladsgroup.json
11:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63866 and previous config saved to /var/cache/conftool/dbconfig/20240602-112512-marostegui.json
11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63865 and previous config saved to /var/cache/conftool/dbconfig/20240602-111447-ladsgroup.json
11:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63864 and previous config saved to /var/cache/conftool/dbconfig/20240602-111004-marostegui.json
10:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63863 and previous config saved to /var/cache/conftool/dbconfig/20240602-105456-marostegui.json
10:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63862 and previous config saved to /var/cache/conftool/dbconfig/20240602-103948-marostegui.json
10:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63861 and previous config saved to /var/cache/conftool/dbconfig/20240602-091021-marostegui.json
09:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63860 and previous config saved to /var/cache/conftool/dbconfig/20240602-090941-marostegui.json
09:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63859 and previous config saved to /var/cache/conftool/dbconfig/20240602-085433-marostegui.json
08:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63858 and previous config saved to /var/cache/conftool/dbconfig/20240602-083925-marostegui.json
08:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P63856 and previous config saved to /var/cache/conftool/dbconfig/20240602-072956-root.json
07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63855 and previous config saved to /var/cache/conftool/dbconfig/20240602-033618-marostegui.json
03:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
03:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63854 and previous config saved to /var/cache/conftool/dbconfig/20240602-033555-marostegui.json
03:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63853 and previous config saved to /var/cache/conftool/dbconfig/20240602-032047-marostegui.json
03:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63852 and previous config saved to /var/cache/conftool/dbconfig/20240602-030539-marostegui.json
03:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63851 and previous config saved to /var/cache/conftool/dbconfig/20240602-025039-ladsgroup.json
02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63850 and previous config saved to /var/cache/conftool/dbconfig/20240602-025031-marostegui.json
02:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
02:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63849 and previous config saved to /var/cache/conftool/dbconfig/20240602-025015-ladsgroup.json
02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63848 and previous config saved to /var/cache/conftool/dbconfig/20240602-023507-ladsgroup.json
02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63847 and previous config saved to /var/cache/conftool/dbconfig/20240602-022710-marostegui.json
02:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
02:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63846 and previous config saved to /var/cache/conftool/dbconfig/20240602-022646-marostegui.json
02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63845 and previous config saved to /var/cache/conftool/dbconfig/20240602-021959-ladsgroup.json
02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63844 and previous config saved to /var/cache/conftool/dbconfig/20240602-021137-marostegui.json
02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63843 and previous config saved to /var/cache/conftool/dbconfig/20240602-020451-ladsgroup.json
02:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63842 and previous config saved to /var/cache/conftool/dbconfig/20240602-015629-marostegui.json
01:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63841 and previous config saved to /var/cache/conftool/dbconfig/20240602-014121-marostegui.json
01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-01

23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63839 and previous config saved to /var/cache/conftool/dbconfig/20240601-215534-marostegui.json
21:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
21:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
21:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63838 and previous config saved to /var/cache/conftool/dbconfig/20240601-201053-ladsgroup.json
20:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
20:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63837 and previous config saved to /var/cache/conftool/dbconfig/20240601-201029-ladsgroup.json
19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63836 and previous config saved to /var/cache/conftool/dbconfig/20240601-195521-ladsgroup.json
19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63835 and previous config saved to /var/cache/conftool/dbconfig/20240601-194013-ladsgroup.json
19:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63834 and previous config saved to /var/cache/conftool/dbconfig/20240601-192505-ladsgroup.json
19:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63833 and previous config saved to /var/cache/conftool/dbconfig/20240601-174133-marostegui.json
17:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63832 and previous config saved to /var/cache/conftool/dbconfig/20240601-172625-marostegui.json
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63831 and previous config saved to /var/cache/conftool/dbconfig/20240601-172455-marostegui.json
17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63830 and previous config saved to /var/cache/conftool/dbconfig/20240601-172432-marostegui.json
17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63829 and previous config saved to /var/cache/conftool/dbconfig/20240601-171116-marostegui.json
17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63828 and previous config saved to /var/cache/conftool/dbconfig/20240601-170924-marostegui.json
17:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63827 and previous config saved to /var/cache/conftool/dbconfig/20240601-165609-marostegui.json
16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63826 and previous config saved to /var/cache/conftool/dbconfig/20240601-165416-marostegui.json
16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63825 and previous config saved to /var/cache/conftool/dbconfig/20240601-163907-marostegui.json
16:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1010.eqiad.wmnet with OS bullseye
13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
13:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63824 and previous config saved to /var/cache/conftool/dbconfig/20240601-125216-marostegui.json
12:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63823 and previous config saved to /var/cache/conftool/dbconfig/20240601-125152-marostegui.json
12:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63822 and previous config saved to /var/cache/conftool/dbconfig/20240601-123644-marostegui.json
12:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63821 and previous config saved to /var/cache/conftool/dbconfig/20240601-122136-marostegui.json
12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63820 and previous config saved to /var/cache/conftool/dbconfig/20240601-120628-marostegui.json
12:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
11:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
11:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63819 and previous config saved to /var/cache/conftool/dbconfig/20240601-095545-ladsgroup.json
09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:36 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
07:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:20 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
07:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63818 and previous config saved to /var/cache/conftool/dbconfig/20240601-071723-marostegui.json
07:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
07:17 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
07:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63817 and previous config saved to /var/cache/conftool/dbconfig/20240601-071700-marostegui.json
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63816 and previous config saved to /var/cache/conftool/dbconfig/20240601-070211-marostegui.json
07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63815 and previous config saved to /var/cache/conftool/dbconfig/20240601-070151-marostegui.json
07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
06:59 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
06:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63814 and previous config saved to /var/cache/conftool/dbconfig/20240601-064643-marostegui.json
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63813 and previous config saved to /var/cache/conftool/dbconfig/20240601-063135-marostegui.json
06:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
05:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
05:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
04:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
04:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
04:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
04:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:46 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:44 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:44 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:42 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:39 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:34 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:19 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:17 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
03:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
03:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
03:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:50 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:50 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:37 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63812 and previous config saved to /var/cache/conftool/dbconfig/20240601-021256-marostegui.json
02:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
02:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63811 and previous config saved to /var/cache/conftool/dbconfig/20240601-021233-marostegui.json
02:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
02:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
02:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63810 and previous config saved to /var/cache/conftool/dbconfig/20240601-015725-marostegui.json
01:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:51 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63809 and previous config saved to /var/cache/conftool/dbconfig/20240601-014216-marostegui.json
01:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:40 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:36 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:32 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:32 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63808 and previous config saved to /var/cache/conftool/dbconfig/20240601-012708-marostegui.json
01:26 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:26 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:24 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:22 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:22 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:20 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:20 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
01:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63807 and previous config saved to /var/cache/conftool/dbconfig/20240601-010959-ladsgroup.json
01:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
01:00 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:58 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:56 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63806 and previous config saved to /var/cache/conftool/dbconfig/20240601-005451-ladsgroup.json
00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:54 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63805 and previous config saved to /var/cache/conftool/dbconfig/20240601-003943-ladsgroup.json
00:38 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:38 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63804 and previous config saved to /var/cache/conftool/dbconfig/20240601-002435-ladsgroup.json
00:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:13 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:11 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:11 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:09 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:09 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
00:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
