Page MenuHomePhabricator

Ensure ParserOutput can always be combined asynchronously/out-of-order
Open, MediumPublic

Description

Right now a number of extensions treat the ParserOutput as a read-write store, reading a previous value from the ParserOutput, updating it, and then storing the updated value back. This is incompatible with the separate/asynchronous/incremental parsing model on the Parsoid roadmap, the "write-only" model of the Parsoid ContentMetadataCollector interface, and the proposed asynchronous-parsing aspect of the WikiFunctions feature.

Methods which assume a destructive update should be replaced with append and set variants which are guaranteed to be write-only. (The "set" will throw an exception if a different value is stored for a key, although use as a flag where multiple places set the same value is permitted; "append" collects all values in a set.)

Any post-processing (to select the minimum value set, the last value set in wikitext order, etc) should be done when the collected results are transferred to the OutputPage, not done in the ParserOutput itself (ie, in the OutputPageParserOutput hook).

Generally speaking the requirement is that the operations to combine metadata should be order-independent. Increment operations are fine, as long as they are recorded as such and not as "read, opaquely modify, then write". Similarly set-union operations are okay, even with repeated elements. I can take the metadata output *without a fragment* and union it with the metadata *of the fragment* and get a consistent result, without having to determine exactly where in the document the fragment falls. By the same logic, overwriting an existing key in the metadata with the same value is fine: the output will set the key to the desired value regardless of the order in which we evaluated the two fragments which wrote that key/value pair.

Overwriting a key with a new value is *not* okay, because we can't later update either of the pieces which wrote that key without knowing the order in which the fragments appeared, since the final value is typically "the last write". Cases where the final value is a function of *all* writes, for example cache expiry where the final value is min(<all writes>) are acceptable, because the final value doesn't depend on the order in which fragments are evaluated.

Event Timeline

Change 759837 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Use uniform representation for ParserOutput \"index policy\"

https://gerrit.wikimedia.org/r/759837

Change 762008 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Provide method to merge a ParserOutput into a ContentMetadataCollector

https://gerrit.wikimedia.org/r/762008

Change 762008 merged by jenkins-bot:

[mediawiki/core@master] Provide method to merge a ParserOutput into a ContentMetadataCollector

https://gerrit.wikimedia.org/r/762008

Change 759837 merged by jenkins-bot:

[mediawiki/core@master] Use uniform representation for ParserOutput "index policy"

https://gerrit.wikimedia.org/r/759837

Change 804310 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] [EXPERIMENTAL] deprecate mutation of ToC in OutputPageParserOutput hook

https://gerrit.wikimedia.org/r/804310

ssastry triaged this task as Medium priority.Jun 14 2022, 10:37 PM
ssastry added a project: Parsoid-Read-Views.

I don't have anything to back it up, but i feel like a really common use case, would be an extension wants to store some key value pairs under its own key as a namespace.

e.g. in the old version, you would do

$metadata = $parser->getExtensionData( 'MyExtension' ) ?: [];
// Assumption here is that either $someKey doesn't exist, or if it does, the extension does not care which write wins.
$metadata[$someKey] = $someValue;
$metadata = $parser->setExtensionData( 'MyExtension' );

The new appendExtensionData doesn't seem to cover that usecase.

I don't have anything to back it up, but i feel like a really common use case, would be an extension wants to store some key value pairs under its own key as a namespace.

e.g. in the old version, you would do

$metadata = $parser->getExtensionData( 'MyExtension' ) ?: [];
// Assumption here is that either $someKey doesn't exist, or if it does, the extension does not care which write wins.
$metadata[$someKey] = $someValue;
$metadata = $parser->setExtensionData( 'MyExtension' );

The new appendExtensionData doesn't seem to cover that usecase.

Yes, you should prefix the namespace. eg:

$parser->setExtensionData( "MyExtension-$someKey", $someValue);

That is a bit more robust than trying to mutate an entry in an array value.

Change #804310 abandoned by C. Scott Ananian:

[mediawiki/core@master] Deprecate mutation of ToC in OutputPageParserOutput hook

Reason:

We ended up removing setTOCHTML(), making this patch moot.

https://gerrit.wikimedia.org/r/804310

@cscott: Removing task assignee as this open task has been assigned for more than two years - see the email sent to all task assignees on 2024-04-15.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

Change #1227449 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Set 'prevent-selective-update' when conflicting metadata is set

https://gerrit.wikimedia.org/r/1227449

Change #1227449 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Set 'prevent-selective-update' when conflicting metadata is set

https://gerrit.wikimedia.org/r/1227449

Change #1264672 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a24

https://gerrit.wikimedia.org/r/1264672

Change #1264672 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a24

https://gerrit.wikimedia.org/r/1264672