
[Task] Drop support for php-serialized output from Special:EntityData
Closed, Resolved, Public

Description

Context:
Special:EntityData is intended to be a LinkedData interface.
It should support various RDF flavours, and plain JSON.

Problem:
Special:EntityData currently inherits some formats from the MediaWiki API result formats, specifically the php-serialized format. This format will be dropped.

This would allow us to use the JSON serialization of the entity directly, and drop the clunky dependency on the API result formatters.

Acceptance criteria:

  • The “php” format of Special:EntityData is completely gone.
    • https://www.wikidata.org/wiki/Special:EntityData/Q42.php returns an error (almost certainly 415 Unsupported Media Type, like Q42.blah)
    • curl -I -H 'Accept: application/vnd.php.serialized' https://www.wikidata.org/wiki/Special:EntityData/Q42 returns an error (almost certainly 406 Not Acceptable, like curl -I -H 'Accept: application/vnd.microsoft.portable-executable' https://www.wikidata.org/wiki/Special:EntityData/Q42)
  • The code and tests have been cleaned up as described above (“drop the clunky dependency on the API result formatters”), or we determine that this no longer applies.
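The two expected errors in the acceptance criteria come from two different negotiation paths: a file extension in the URL (415) and an Accept header on the extension-less URL (406). The Python sketch below models that distinction. It is hypothetical, not Wikibase's actual code: the format table, the function names and the simplification of every success to a plain 200 (the real endpoint may redirect instead) are all assumptions.

```python
# Hypothetical format table: URL extension -> MIME types served for it.
# "php" is deliberately absent, mirroring the state after this task.
SUPPORTED_FORMATS = {
    "json": ["application/json"],
    "ttl": ["text/turtle"],
    "rdf": ["application/rdf+xml"],
}


def status_for_extension(ext: str) -> int:
    """Status for a request like Special:EntityData/Q42.<ext>."""
    # An unknown extension (e.g. "php" or "blah") is an unsupported media type.
    return 200 if ext in SUPPORTED_FORMATS else 415


def status_for_accept(accept: str) -> int:
    """Status for Special:EntityData/Q42 requested with an Accept header."""
    offered = {mime for mimes in SUPPORTED_FORMATS.values() for mime in mimes}
    # Strip parameters such as q-values; a real implementation would rank them.
    requested = [part.split(";")[0].strip() for part in accept.split(",")]
    if "*/*" in requested or any(mime in offered for mime in requested):
        return 200
    # Nothing the client accepts can be produced.
    return 406
```

With "php" missing from the table, both `status_for_extension("php")` and `status_for_accept("application/vnd.php.serialized")` fall through to the error branch, exactly like any other unknown format.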

Notes:

Event Timeline


This is a product decision, so putting in the product column.
From the tech side, removing this will leave us with fewer things to maintain, and I'm not sure we really "support" the php format here anyway.
I would guess that most of the PHP calls are scrapers and other clients requesting this format by accident.
Migration path would be to use JSON instead.
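For consumers, the migration is mostly a matter of switching the extension and replacing PHP's `unserialize()` with a JSON parser. A minimal Python sketch, assuming the JSON response wraps entities in a top-level `entities` object keyed by entity ID; the helper names are hypothetical:

```python
import json
from urllib.parse import quote

# Base URL as used in the acceptance criteria.
BASE = "https://www.wikidata.org/wiki/Special:EntityData/"


def entity_data_url(entity_id: str, fmt: str = "json") -> str:
    """Build the EntityData URL for one entity in the given format."""
    return f"{BASE}{quote(entity_id)}.{fmt}"


def parse_entity(body: str, entity_id: str) -> dict:
    """Extract one entity from an EntityData JSON response body.

    Assumes the shape {"entities": {"Q42": {...}}}.
    """
    return json.loads(body)["entities"][entity_id]
```

In practice the body would come from something like `urlopen(entity_data_url("Q42")).read().decode("utf-8")`, sent with a descriptive User-Agent. Any language with a JSON parser can do the same, which is not true of the PHP-serialized output.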

I can look it up in hadoop if that helps PM decision (@Lydia_Pintscher) on this.

Yesterday we had 3400 hits on php endpoints: 2089 were from spiders and 1300 were from users (or at least from clients sending a user-like UA, which can be faked and often is). 1000 of the hits came from only four countries (not the usual suspects), but I can't disclose more in a public ticket.

I'll raise this with Lydia in my next 1:1 with her


@Ladsgroup can we get up to date numbers and also put this in relation to the other formats we expose?

Then @Lydia_Pintscher can make a current and informed decision on the future of the output.

Reasons to drop:

  • It adds a non-negligible amount of code that needs maintaining
  • It adds another entry to our list of stable interfaces, so every change to it must be communicated and must follow the stable-interface procedure.
  • It adds two URLs for cache busting (as explained above)
  • It provides little benefit: traffic is small (numbers below), and the output can only be unserialized in PHP, not in any other language (short of some gymnastics).
  • (Might not be a big deal) Serialization and deserialization are security-sensitive: we might expose something we shouldn't, or receive something that leads to arbitrary code execution.
    • AFAIK this is not the case here, but avoiding serialization and deserialization in the language the server runs is strongly encouraged, to reduce attack vectors.
Type | Number of hits in September 21
json | 7,598,854
rdf | 89,861
ttl | 11,388,708
php | 2,116
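To put the php figure in proportion to the other formats, the shares can be computed directly from the table. A short Python sketch using only the numbers above:

```python
# Hit counts copied from the table above.
hits = {
    "json": 7_598_854,
    "rdf": 89_861,
    "ttl": 11_388_708,
    "php": 2_116,
}

total = sum(hits.values())
# Print each format's percentage of all EntityData hits, largest first.
for fmt, count in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{fmt:>4}: {100 * count / total:8.4f}%")
```

Out of 19,079,539 hits in total, php accounts for about 0.011%, roughly one request in 9,000.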

Thank you!
Alright. Then let's do this.

@Ladsgroup can you say whether the hits for the php-serialized output come from one or a very few individuals making many requests, or from many individuals making a few requests each? Is there any discernible pattern in the requests or in the tools they are made with? (I'm asking as this might change the communication a bit.)

"It adds two URLs for cache busting (as explained above)"

No, we only cache a limited set of URLs and the RDF format is not included in those. (The earlier comment was from before the caching story was resolved, we changed plans at some point in there.)

(But to be clear, I also support getting rid of this.)


It has four major consumers: three seem to be bots, and one is either a bot with a fake UA or a gadget. There are perhaps thirty usages in total outside these four, but they are negligible.

This task has gained some new urgency in the context of T118538: Reduce the usage of API format=php. Given T98035#7373635 and T98035#7375023, I assume we still have a valid product decision to go ahead with this removal.

I do have one question: is this a breaking (or significant) change according to our Stable Interface Policy? The Linked Data Interface is listed under Stable Public APIs, but Stable Data Formats only includes RDF and JSON.

(If we decide this isn’t a breaking change, IMHO it would be nice to clarify this in the policy. Example: “The Linked Data Interface accessible at …, when used with one of the Stable Data Formats, is considered a stable interface.”)

CC @Arian_Bozorg and @Ifrahkhanyaree_WMDE, I’m not sure whether this falls in the scope of Omega or Reuse. Can we just go ahead with this removal, or do we want to announce it first?

I'm not sure which team it falls under, but although this is only used by a handful of people and is deprecated, I think we should announce it as a significant change.

Is this a breaking (or significant) change […]

I would argue that the PHP serialized format never was and never can be stable and thus can never fall under https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy. It's still nice to inform people ahead of time when we remove an old feature like this. But that happened, didn't it?

@Lucas_Werkmeister_WMDE confirmed that this will get moved into Sprint planning on Tuesday.

The significant-change announcement will be made on 22 May, for a config change on 28 May.

Change #1289331 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/Wikibase@master] [WIP] Remove support for PHP EntityData format

https://gerrit.wikimedia.org/r/1289331

(FTR, my assumption is that we want a (very temporary) config flag for this, so we can turn this off at a predictable point in time next Thursday rather than whenever the train rolls forward.)

@Lucas_Werkmeister_WMDE We already have the wmgWikibaseEntityDataFormats config option. Removing 'php' from that list has the desired effect (at least locally) of making the two requests in the acceptance criteria fail with 415 and 406 as required. We can simply make the config change a suitable amount of time after the interface change is announced, and merge the deprecation of the feature in a subsequent version / train.
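Once 'php' is removed from wmgWikibaseEntityDataFormats, the two acceptance-criteria checks can also be scripted rather than run by hand with curl. A minimal Python sketch using only the standard library; the helper names are hypothetical, and the User-Agent string is a placeholder (Wikimedia asks clients to send a descriptive one):

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# URL and Accept header taken from the acceptance criteria.
ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/Q42"
# Placeholder User-Agent; replace with a real tool name and contact.
HEADERS = {"User-Agent": "entitydata-php-check/0.1 (placeholder contact)"}


def acceptance_requests() -> list:
    """Build (but do not send) the two HEAD requests from the criteria."""
    return [
        Request(ENTITY_URL + ".php", method="HEAD", headers=dict(HEADERS)),
        Request(
            ENTITY_URL,
            method="HEAD",
            headers={**HEADERS, "Accept": "application/vnd.php.serialized"},
        ),
    ]


def status_of(req: Request) -> int:
    """Send the request and return its HTTP status, including error codes."""
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx/5xx responses surface as HTTPError; the code is what we want.
        return err.code
```

After the config change, `[status_of(r) for r in acceptance_requests()]` should give `[415, 406]`, matching the local results reported above.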

Alright, sounds good to me. Thanks for investigating!

Change #1289736 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[operations/mediawiki-config@master] Disable support for PHP-serialized EntityData on Wikidata

https://gerrit.wikimedia.org/r/1289736

Change #1289751 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/Wikibase@master] [WIP] Remove ApiFormatter dependency

https://gerrit.wikimedia.org/r/1289751

Something I didn’t think about during task time but realized while reviewing the config change: we should deploy this to Test Wikidata and Beta Wikidata before the announcement, right? Then the announcement can say that people can already test the behavior there, as usual.

Change #1289898 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[operations/mediawiki-config@master] Disable support for PHP-serialized EntityData on Wikidata

https://gerrit.wikimedia.org/r/1289898

Change #1289736 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable support for PHP-serialized EntityData on Beta / Test Wikidata

https://gerrit.wikimedia.org/r/1289736

Mentioned in SAL (#wikimedia-operations) [2026-05-20T13:18:47Z] <arthurtaylor@deploy1003> Started scap sync-world: Backport for [[gerrit:1289736|Disable support for PHP-serialized EntityData on Beta / Test Wikidata (T98035)]]

Mentioned in SAL (#wikimedia-operations) [2026-05-20T13:20:43Z] <arthurtaylor@deploy1003> arthurtaylor: Backport for [[gerrit:1289736|Disable support for PHP-serialized EntityData on Beta / Test Wikidata (T98035)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-05-20T13:26:14Z] <arthurtaylor@deploy1003> Finished scap sync-world: Backport for [[gerrit:1289736|Disable support for PHP-serialized EntityData on Beta / Test Wikidata (T98035)]] (duration: 07m 26s)

Change #1290698 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/Wikibase@master] [WIP] Serialization fixups v2

https://gerrit.wikimedia.org/r/1290698

Change #1290698 abandoned by Arthur taylor:

[mediawiki/extensions/Wikibase@master] [WIP] Serialization fixups v2

Reason:

merged into parent change

https://gerrit.wikimedia.org/r/1290698

Change #1289898 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable support for PHP-serialized EntityData on Wikidata production

https://gerrit.wikimedia.org/r/1289898

Mentioned in SAL (#wikimedia-operations) [2026-05-28T07:04:03Z] <wmde-fisch@deploy1003> Started scap sync-world: Backport for [[gerrit:1289898|Disable support for PHP-serialized EntityData on Wikidata production (T98035)]]

Mentioned in SAL (#wikimedia-operations) [2026-05-28T07:06:08Z] <wmde-fisch@deploy1003> wmde-fisch, arthurtaylor: Backport for [[gerrit:1289898|Disable support for PHP-serialized EntityData on Wikidata production (T98035)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-05-28T07:11:18Z] <wmde-fisch@deploy1003> Finished scap sync-world: Backport for [[gerrit:1289898|Disable support for PHP-serialized EntityData on Wikidata production (T98035)]] (duration: 07m 15s)

Change #1289331 merged by Arthur taylor:

[mediawiki/extensions/Wikibase@master] Remove support for PHP EntityData format

https://gerrit.wikimedia.org/r/1289331

Change #1294961 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/Wikibase@master] Tests for JSON EntityData formatting

https://gerrit.wikimedia.org/r/1294961

Change #1294962 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/Wikibase@master] Refactor AddPageInfo to make item order stable

https://gerrit.wikimedia.org/r/1294962

Change #1295438 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/WikibaseLexeme@master] Introduce test of JSON serialization

https://gerrit.wikimedia.org/r/1295438

Change #1295444 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/WikibaseLexeme@master] Remove test temporarily for change of EntityDataSerializationService

https://gerrit.wikimedia.org/r/1295444

Change #1295445 had a related patch set uploaded (by Arthur taylor; author: Arthur taylor):

[mediawiki/extensions/WikibaseLexeme@master] Restore test of EntityDataSerializationService after refactor

https://gerrit.wikimedia.org/r/1295445

Change #1294961 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Tests for JSON EntityData formatting

https://gerrit.wikimedia.org/r/1294961

Change #1295438 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Introduce test of JSON serialization

https://gerrit.wikimedia.org/r/1295438

Change #1295444 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Disable test temporarily for change of EntityDataSerializationService

https://gerrit.wikimedia.org/r/1295444

Change #1289751 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Remove ApiFormatter dependency

https://gerrit.wikimedia.org/r/1289751

Change #1295445 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Restore test of EntityDataSerializationService after refactor

https://gerrit.wikimedia.org/r/1295445

Change #1294962 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/Wikibase@master] Refactor AddPageInfo to make item order stable

https://gerrit.wikimedia.org/r/1294962

Looks like it is done and no more effort is needed here.
Thank you!