Deepcat should not rely on blazegraph's gas service
Closed, Resolved, Public

Description

As part of the migration away from blazegraph, the deepcat keyword should stop relying on blazegraph custom extensions.
In T425472, @AWesterinen-WMF is exploring ways to rewrite queries relying on this feature.

The current approach is:

SELECT ?out (MIN(?d) AS ?depth) WHERE {
  BIND (<$input_category> AS ?in)
  { BIND (<$input_category> AS ?out) . BIND(0 AS ?d) }
  UNION { ?out mediawiki:isInCategory ?in . BIND(1 AS ?d) }
  UNION { ?out mediawiki:isInCategory/mediawiki:isInCategory ?in . BIND(2 AS ?d) }
  UNION { ?out mediawiki:isInCategory/mediawiki:isInCategory/mediawiki:isInCategory ?in . BIND(3 AS ?d) }
  UNION { ?out mediawiki:isInCategory/mediawiki:isInCategory/mediawiki:isInCategory/mediawiki:isInCategory ?in . BIND(4 AS ?d) }
  UNION { ?out mediawiki:isInCategory/mediawiki:isInCategory/mediawiki:isInCategory/mediawiki:isInCategory/mediawiki:isInCategory ?in . BIND(5 AS ?d) }
} GROUP BY ?out ORDER BY ?depth LIMIT $limit

The limitation is that the maximum depth must be fixed in advance (5 in the query above), because each depth level needs its own UNION branch.
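
Since each depth level is just one more UNION branch with a longer property path, the query text can be generated for any fixed maximum depth. Below is a minimal Python sketch of that idea. The helper name `build_deepcat_query` is hypothetical and this is not the CirrusSearch implementation; it only reproduces the query shape shown above and leaves the `$input_category` and `$limit` placeholders untouched.

```python
PREDICATE = "mediawiki:isInCategory"  # predicate used in the query above


def build_deepcat_query(max_depth: int) -> str:
    """Return the UNION-based query text for a fixed maximum depth.

    Hypothetical helper for illustration only. The $input_category and
    $limit placeholders are kept verbatim for later substitution.
    """
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")

    # Depth 0: the input category itself.
    branches = ["{ BIND (<$input_category> AS ?out) . BIND(0 AS ?d) }"]
    for depth in range(1, max_depth + 1):
        # `depth` predicates joined with "/" form a SPARQL 1.1 sequence
        # path that matches exactly `depth` hops.
        path = "/".join([PREDICATE] * depth)
        branches.append(f"{{ ?out {path} ?in . BIND({depth} AS ?d) }}")

    body = "\n  UNION ".join(branches)
    return (
        "SELECT ?out (MIN(?d) AS ?depth) WHERE {\n"
        "  BIND (<$input_category> AS ?in)\n"
        f"  {body}\n"
        "} GROUP BY ?out ORDER BY ?depth LIMIT $limit"
    )


if __name__ == "__main__":
    print(build_deepcat_query(2))
```

Calling `build_deepcat_query(5)` should yield the same branches as the query above. Note that the query text grows quadratically with the depth, which is one more reason to keep the bound small.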

AC:

  • rewrite and test the deepcat SPARQL query to use standard SPARQL
  • deepcat should use standard SPARQL to fetch the category graph

Event Timeline

Restricted Application added a subscriber: Aklapper.
dcausse triaged this task as High priority.
dcausse updated the task description.
dcausse edited subscribers, added: AWesterinen-WMF; removed: AWesterinen.

I tested the rewrite and it returns the same set of results. When the limit is reached the results differ slightly, but this is expected: we don't guarantee stable results on very large trees.
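
For illustration only (this is a toy model, not the test that was actually run, and the graph and function name are made up): the query orders by depth alone, so categories at the same depth are tied, and which of them survive the LIMIT is up to the engine. A small Python sketch of the minimal-depth computation and of the tie at the cutoff:

```python
from collections import deque


def min_depths(children_of, start, max_depth):
    """Return {category: minimal depth} for everything within max_depth of start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand past the depth bound
        for child in children_of.get(node, []):
            if child not in depths:  # first BFS visit gives the minimal depth
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


# Made-up toy category graph: parent -> direct subcategories.
toy = {"Root": ["A", "B"], "A": ["C"], "B": ["C", "D"]}
depths = min_depths(toy, "Root", max_depth=5)

# With a limit of 2, "Root" (depth 0) is always kept, but "A" and "B" tie
# at depth 1, so either one may fill the last slot.
limit = 2
cutoff = sorted(depths.values())[limit - 1]
tied = sorted(c for c, d in depths.items() if d == cutoff)
print(depths, tied)
```

Two engines (or two query formulations) can therefore agree on every depth value and still return different rows once the limit truncates a group of tied categories.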

Change #1285791 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Deepcat: do not use custom blazegraph feature

https://gerrit.wikimedia.org/r/1285791

This is really great news!

> I tested the rewrite and it returns the same set of results. When the limit is reached the results differ slightly, but this is expected: we don't guarantee stable results on very large trees.

Do you expect any change in runtime and resource requirements?

Change #1285791 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Deepcat: do not use custom blazegraph feature

https://gerrit.wikimedia.org/r/1285791

> Do you expect any change in runtime and resource requirements?

No, I don't think so. It might time out more often, but testing random categories I did not notice a runtime difference.
As for resources, I know the gas service is actually rather resource-hungry.