
Retool transcode table with a status column
Open, Needs Triage, Public

Description

The transcode table in TimedMediaHandler doesn't currently store the status in a straightforward way, which makes the table awkward to query in bulk at the scale of Wikimedia Commons.

Instead of implicitly storing the state of a transcode as the null/present state of various timestamp columns, add an integer status column backed by code constants, plus indexes suitable for pulling bulk statistics and the N items most recently added to each state.

  • add column: transcode_status INT (e.g. 0 - 'missing', 1 - 'queued', 2 - 'active', 3 - 'failed')
  • add column: transcode_touched VARBINARY(14)
  • add index: `transcode_status_touched (transcode_status, transcode_touched)`
  • update code to set transcode_state and transcode_touched on each update [switchable or carefully deployed]
  • update Special:Transcode_statistics to use the new fields [switchable or carefully deployed]
  • maintenance script to fill out the fields on existing rows after deployment
  • prepare the DB schema update for deployment
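As a rough sketch of the proposed layout, here is a reduced version of the table in Python's built-in sqlite3, standing in for the production database. It uses the example constants from the list above; the reduced column set, the key values and the timestamps are illustrative assumptions, and the real TimedMediaHandler schema and constant values may differ. It shows the two access patterns the new index is meant to serve: per-state counts and the N most recently touched rows in one state.

```python
import sqlite3

# Example state constants from the task description (illustrative values).
MISSING, QUEUED, ACTIVE, FAILED = 0, 1, 2, 3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the transcode table, reduced to the new columns."""
    db = sqlite3.connect(":memory:")
    # transcode_touched holds a 14-character MediaWiki timestamp (YYYYMMDDHHMMSS),
    # which sorts chronologically as plain text.
    db.execute(
        """CREATE TABLE transcode (
               transcode_id      INTEGER PRIMARY KEY,
               transcode_key     TEXT    NOT NULL,
               transcode_status  INTEGER NOT NULL,
               transcode_touched TEXT    NOT NULL
           )"""
    )
    # The composite index proposed in the task description.
    db.execute(
        "CREATE INDEX transcode_status_touched"
        " ON transcode (transcode_status, transcode_touched)"
    )
    return db


def counts_by_status(db: sqlite3.Connection) -> dict:
    """Bulk statistics: number of rows in each state."""
    rows = db.execute(
        "SELECT transcode_status, COUNT(*) FROM transcode GROUP BY transcode_status"
    ).fetchall()
    return dict(rows)


def most_recent(db: sqlite3.Connection, status: int, n: int) -> list:
    """IDs of the N most recently touched rows in one state."""
    rows = db.execute(
        "SELECT transcode_id FROM transcode WHERE transcode_status = ?"
        " ORDER BY transcode_touched DESC LIMIT ?",
        (status, n),
    ).fetchall()
    return [row[0] for row in rows]


db = make_db()
db.executemany(
    "INSERT INTO transcode (transcode_key, transcode_status, transcode_touched)"
    " VALUES (?, ?, ?)",
    [
        ("720p.vp9.webm", QUEUED, "20240409194400"),
        ("480p.vp9.webm", QUEUED, "20240410120000"),
        ("1080p.vp9.webm", FAILED, "20240408080000"),
    ],
)
print(sorted(counts_by_status(db).items()))  # [(1, 2), (3, 1)]
print(most_recent(db, QUEUED, 1))            # [2]
```

Plain integer constants keep adding a new state to a code change only, which is the reason given below for avoiding ENUM.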

Event Timeline

Yup, we shouldn't use ENUM for schema anymore, just use code constants (missing=0, queued=1, etc.)

@bvibber, if we are going to run DB updates anyway, perhaps we should also add a column to the table for the file size of the derivatives? Those sizes are currently unknown, and there has been a desire in the past to get better insight into that information. See T57942: Expose file sizes of transcoded assets in API

Agreed, let's do those together...


You're right, enum is asking for trouble next time a state gets added. ;) Updated task description.

Reedy renamed this task from "Retool transcode table with a status enum column" to "Retool transcode table with a status column" (Apr 9 2024, 7:44 PM).

Change #1018347 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/extensions/TimedMediaHandler@master] WIP transcode table updates

https://gerrit.wikimedia.org/r/1018347

Change #1222601 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/extensions/TimedMediaHandler@master] Add status columns to transcode table

https://gerrit.wikimedia.org/r/1222601

Change #1222601 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Add status columns to transcode table

https://gerrit.wikimedia.org/r/1222601

Change #1223284 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/extensions/TimedMediaHandler@master] Start writing to transcode status columns

https://gerrit.wikimedia.org/r/1223284

Change #1223284 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Start writing to transcode status columns

https://gerrit.wikimedia.org/r/1223284

We never really discussed what to do with the old columns: whether to get rid of them or keep them. One remaining use for them is calculating how long something has been in the queue, or how long it took to transcode. The first we can probably calculate when the job starts and print to the console. The second is actually shown in the interface (and is good feedback for users), so we would need to store it in a column if we were to remove the old timestamps.

I realized this while writing some queries on Quarry, where I calculate those durations from the old timestamps:
Transcodes by last successful: https://quarry.wmcloud.org/query/100961
Transcodes in progress: https://quarry.wmcloud.org/query/100963
Transcodes in queue total: https://quarry.wmcloud.org/query/100966
Transcodes in queue by key: https://quarry.wmcloud.org/query/100964
Last 500 transcode errors: https://quarry.wmcloud.org/query/83666

Note that the in-progress results contain some items that are 'stuck' because their failure wasn't properly registered: T414348
Likewise, the in-queue results contain items that will never be processed, either because the transcode key was disabled (T414414) or because the file no longer exists.
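The durations discussed in this comment (time in queue, time to transcode) and the 'stuck' check can be derived from the old timestamps roughly as follows. This is a minimal Python sketch, not the Quarry queries themselves; it assumes the old columns are named transcode_time_addjob, transcode_time_startwork, transcode_time_success and transcode_time_error and hold 14-character MediaWiki timestamps, and the six-hour threshold is an arbitrary illustrative value.

```python
from datetime import datetime, timedelta
from typing import Optional

MW_TS = "%Y%m%d%H%M%S"  # 14-character MediaWiki timestamp format


def parse_ts(ts: Optional[str]) -> Optional[datetime]:
    """Parse a MediaWiki timestamp; NULL/empty means 'not reached yet'."""
    return datetime.strptime(ts, MW_TS) if ts else None


def durations(row: dict) -> dict:
    """Time spent waiting in the queue and time spent transcoding."""
    added = parse_ts(row.get("transcode_time_addjob"))
    started = parse_ts(row.get("transcode_time_startwork"))
    finished = parse_ts(row.get("transcode_time_success"))
    return {
        "queue_wait": started - added if added and started else None,
        "transcode_time": finished - started if started and finished else None,
    }


def looks_stuck(row: dict, now: datetime, limit: timedelta) -> bool:
    """Started, never finished or failed, and running for longer than `limit`."""
    started = parse_ts(row.get("transcode_time_startwork"))
    ended = row.get("transcode_time_success") or row.get("transcode_time_error")
    return started is not None and not ended and now - started > limit


done_row = {
    "transcode_time_addjob": "20240409100000",
    "transcode_time_startwork": "20240409103000",
    "transcode_time_success": "20240409104500",
}
stuck_row = {
    "transcode_time_addjob": "20240409100000",
    "transcode_time_startwork": "20240409103000",
}
now = datetime(2024, 4, 10, 10, 30)
print(durations(done_row))
print(looks_stuck(stuck_row, now, timedelta(hours=6)))  # True
```

A heuristic like this cannot tell a genuinely long transcode from one whose failure was never recorded, so the threshold would need to exceed the longest expected transcode time.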

Change #1229613 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/extensions/TimedMediaHandler@master] Add script for populating transcode state columns

https://gerrit.wikimedia.org/r/1229613
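A population script like this has to reconstruct each row's state from which old timestamps are set. The Python/sqlite3 sketch below is hypothetical and is not the script from the patch: it assumes the four old column names, assumes that a terminal timestamp (error, then success) takes precedence over earlier phases, and adds a hypothetical DONE = 4 constant because the example list in the task description has no value for a completed transcode. It walks the table in primary-key batches and commits each batch separately; the demo uses a tiny batch size only to exercise the loop.

```python
import sqlite3
from typing import Optional

MISSING, QUEUED, ACTIVE, FAILED = 0, 1, 2, 3
DONE = 4  # hypothetical: the description's example list has no "completed" state

# Assumed names of the old timestamp columns, in lifecycle order.
TS_COLS = ("transcode_time_addjob", "transcode_time_startwork",
           "transcode_time_success", "transcode_time_error")


def derive_state(addjob, startwork, success, error) -> int:
    """Map which old timestamps are set to a state (assumed precedence)."""
    if error:
        return FAILED
    if success:
        return DONE
    if startwork:
        return ACTIVE
    if addjob:
        return QUEUED
    return MISSING


def derive_touched(*timestamps) -> Optional[str]:
    """Latest non-NULL timestamp; 14-char MediaWiki timestamps sort chronologically."""
    present = [t for t in timestamps if t]
    return max(present) if present else None


def backfill(db: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill transcode_status/transcode_touched batch by batch; return rows updated."""
    last_id, updated = 0, 0
    while True:
        rows = db.execute(
            f"SELECT transcode_id, {', '.join(TS_COLS)} FROM transcode"
            " WHERE transcode_id > ? ORDER BY transcode_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return updated
        for transcode_id, *ts in rows:
            db.execute(
                "UPDATE transcode SET transcode_status = ?, transcode_touched = ?"
                " WHERE transcode_id = ?",
                (derive_state(*ts), derive_touched(*ts), transcode_id),
            )
            updated += 1
        db.commit()  # one short transaction per batch instead of one huge one
        last_id = rows[-1][0]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE transcode (transcode_id INTEGER PRIMARY KEY, "
    + ", ".join(f"{c} TEXT" for c in TS_COLS)
    + ", transcode_status INTEGER, transcode_touched TEXT)"
)
db.executemany(
    f"INSERT INTO transcode ({', '.join(TS_COLS)}) VALUES (?, ?, ?, ?)",
    [
        ("20240409100000", None, None, None),                          # queued
        ("20240409100000", "20240409103000", "20240409104500", None),  # done
        ("20240409100000", "20240409103000", None, "20240409103500"),  # failed
    ],
)
print(backfill(db, batch_size=2))  # 3
```

Because each row's values are derived only from its own timestamps, re-running the script is harmless, which matters if it has to be restarted partway through.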

Change #1018347 abandoned by TheDJ:

[mediawiki/extensions/TimedMediaHandler@master] WIP transcode table updates

Reason:

done via I12e4bc72e0a6163948bf19d9e30bef12534ea043 and I3bc0e2c9047ffed7aee1aff8150d6519c902ea99

https://gerrit.wikimedia.org/r/1018347

Okay, now that it's fully populated, what's the next step here? Sorry, I don't remember all the details of this.

Other places that need converting to make use of the new fields are:

  • TranscodeStatusTable
  • checkTimeSinceLastReset of ApiTranscodeReset

https://commons.wikimedia.org/wiki/Special:Transcode_statistics should be using the new column now.

We should:

  • check whether the new queries for those pages need any index improvements
  • probably drop the old index over the four timestamps
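One way to sanity-check the index question locally is to ask the query planner whether a typical "newest N rows in a state" query is served by the new composite index. This sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in on a reduced, assumed table; SQLite's planner is not MariaDB's, so on production the equivalent check would be EXPLAIN against the real tables and indexes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE transcode (transcode_id INTEGER PRIMARY KEY,"
    " transcode_status INTEGER NOT NULL, transcode_touched TEXT NOT NULL)"
)
db.execute(
    "CREATE INDEX transcode_status_touched"
    " ON transcode (transcode_status, transcode_touched)"
)

QUERY = (
    "SELECT transcode_id FROM transcode WHERE transcode_status = ?"
    " ORDER BY transcode_touched DESC LIMIT 50"
)


def plan(sql: str, params=()) -> list:
    """Return the 'detail' column of SQLite's query plan for `sql`."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


details = plan(QUERY, (1,))
print(details)
# Equality on the index's leading column plus ORDER BY on its second column
# should be answered by scanning the index (backwards), with no separate sort.
uses_index = any("transcode_status_touched" in d for d in details)
needs_sort = any("TEMP B-TREE" in d for d in details)
print(uses_index, needs_sort)  # True False
```

If the new queries all lead with the status column like this, the old four-timestamp index would only still matter for queries that filter on the old columns directly.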

Change #1261563 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/extensions/TimedMediaHandler@master] Use the state and touched fields where possible

https://gerrit.wikimedia.org/r/1261563

Change #1261594 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/extensions/TimedMediaHandler@master] Update WebVideoTrancode classes to rely on touched and state fields

https://gerrit.wikimedia.org/r/1261594

Change #1261603 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/extensions/TimedMediaHandler@master] [DNM] Cleanup table fields

https://gerrit.wikimedia.org/r/1261603

Change #1261563 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Use the state and touched fields where possible

https://gerrit.wikimedia.org/r/1261563

Change #1261594 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Update WebVideoTrancode classes to rely on touched and state fields

https://gerrit.wikimedia.org/r/1261594