
[SPIKE] Investigate the remaining causes of the SLO being in the red
Open, Needs Triage | Public | 3 Estimated Story Points

Description

We have already fixed the Tone Check failures, which accounted for the bulk of the errors, but there are still other causes to investigate.

Event Timeline

image.png (1,934×622 px, 82 KB)

Note the moment that we fixed it.

ppelberg set the point value for this task to 3. (Apr 29 2026, 5:24 PM)

Change #1287011 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] EditCheckFactory: catch errors creating a check

https://gerrit.wikimedia.org/r/1287011

Change #1287012 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] TextMatchEditCheck: fall back to a valid language if possible

https://gerrit.wikimedia.org/r/1287012

Change #1287017 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] EditCheck controller: some more checks for destroyed surface

https://gerrit.wikimedia.org/r/1287017

Change #1287011 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] EditCheckFactory: catch errors creating a check

https://gerrit.wikimedia.org/r/1287011

Those patches all came from me investigating uncaught JS errors related to editcheck for this task, though not all of them could actually interfere with the SLO.
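As a rough illustration of the kind of fix the first patch title describes ("catch errors creating a check"), here is a minimal sketch. It is not the actual VisualEditor code: `Check`, `CheckConstructor`, and `createChecks` are hypothetical names made up for this example. The idea is simply that a check which throws during construction gets logged and skipped instead of surfacing as an uncaught error.

```typescript
// Hypothetical sketch; these names are illustrative, not VisualEditor's real API.
interface Check {
  name: string;
}

type CheckConstructor = () => Check;

// Builds every registered check, skipping any whose construction throws,
// so that one broken check cannot take the others down with it.
function createChecks(registry: Record<string, CheckConstructor>): Check[] {
  const checks: Check[] = [];
  for (const name of Object.keys(registry)) {
    try {
      checks.push(registry[name]());
    } catch (err) {
      // Log and carry on rather than letting the error go uncaught.
      console.warn(`Could not create check "${name}":`, err);
    }
  }
  return checks;
}

// One working check and one that fails while being created.
const created = createChecks({
  tone: () => ({ name: "tone" }),
  broken: () => {
    throw new Error("bad config");
  },
});
console.log(created.map((check) => check.name));
```

Whether a failure like this counts against the SLO depends on where it happens, which matches the caveat that not all of these errors could interfere with it.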

https://gerrit.wikimedia.org/r/1287017 in particular might be behind the SLO-related errors that show up as spikes. It is the one most likely to trigger when the tone check service goes through one of its slow-response periods, which gives people an opportunity to cancel out of their editing session while the response is still being awaited.
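To make that race concrete, here is a minimal sketch of a "destroyed surface" guard. It assumes a simplified model rather than the real VisualEditor classes: `Surface`, `ToneResponse`, and `handleToneResponse` are hypothetical names invented for this example. The response handler checks whether the surface still exists before touching it.

```typescript
// Hypothetical stand-ins for illustration; these are not VisualEditor's real types.
interface Surface {
  destroyed: boolean;
  shownChecks: string[];
}

interface ToneResponse {
  flagged: boolean;
}

// Called when the (possibly slow) tone check response finally arrives.
// Returns true if the result was applied, false if it was dropped.
function handleToneResponse(surface: Surface, response: ToneResponse): boolean {
  if (surface.destroyed) {
    // The user left the editing session while the request was pending,
    // so there is nothing left to update. Bail out instead of throwing.
    return false;
  }
  if (response.flagged) {
    surface.shownChecks.push("tone");
  }
  return true;
}

// Simulate the race: the request is in flight, the user cancels, then the response lands.
const surface: Surface = { destroyed: false, shownChecks: [] };
surface.destroyed = true; // user cancels out of the editor
const applied = handleToneResponse(surface, { flagged: true });
console.log(applied, surface.shownChecks.length);
```

The slower the service responds, the wider the window in which the user can cancel first, which would explain why errors of this kind arrive in spikes rather than at a steady rate.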

Change #1287012 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] TextMatchEditCheck: fall back to a valid language if possible

https://gerrit.wikimedia.org/r/1287012

Change #1287017 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] EditCheck controller: some more checks for destroyed surface

https://gerrit.wikimedia.org/r/1287017

CleanShot 2026-06-01 at 11.28.13@2x.png (840×616 px, 70 KB)

I think this has helped. Since those patches went out on the train (which finished on the 21st), we haven't seen any spikes like the ones before.

This does indeed look promising! Let's watch for another week or two, and then we can probably call it good enough.

Looks pretty good now. We've had literally no budget consumed in the last 7 days on the dashboard.