Following the work on T418499: Attribution API MVP: Provide initial-pass base reference count, we've implemented a basic HTML-based reference count for articles requested through the Attribution API. This works well for all pages that have a Parser Cache entry: we use that cache to examine the HTML and output a close-to-accurate citation count with minimal RegEx.
However, as shown in T420024: [SPIKE] Attribution API Citation count followup: Citation # in Flagged Revs wikis, this will not work for pages that use the FlaggedRevs 'stable' functionality. Those pages use the FlaggedRevs cache rather than the regular Parser Cache, so to get their HTML we need to request it from the correct cache. This is relevant to several wikis (German and Russian Wikipedias) but also potentially to groups of pages on other wikis (like some pages on English Wikipedia).
This ticket is a followup to the investigation, utilizing the solution offered by Aaron to implement a proper check (whether the page utilizes 'stable' state from FlaggedRevs) and extract the correct HTML.
Acceptance criteria
- Add a check to the "number of citations" operation that determines whether we need to fall back on the FlaggedRevs cache
- Implement the fallback to read from the correct cache (Parser vs FlaggedRevs)
- Implement a "no response found" fallback in case we have a cache miss from FlaggedRevs (which does not automatically trigger a parse)
- Make sure the regex is correctly implemented for both cases
- Add unit tests to verify both types of HTML can work
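
The criteria above could be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: `uses_stable_version`, `parser_cache_get` and `flagged_cache_get` are hypothetical callables standing in for the real FlaggedRevs check and the two cache lookups, and the regex assumes each reference is rendered as an `<li>` whose `id` starts with `cite_note-`, which may not match the pattern used in T418499.

```python
import re
from typing import Callable, Optional

# Assumed pattern: one <li id="cite_note-..."> per entry in the reference list.
CITE_NOTE_RE = re.compile(r'<li\b[^>]*\bid="cite_note-[^"]*"')


def count_citations(html: str) -> int:
    """Count reference list entries in rendered HTML (same regex for both caches)."""
    return len(CITE_NOTE_RE.findall(html))


def citation_count_for_page(
    title: str,
    uses_stable_version: Callable[[str], bool],          # hypothetical FlaggedRevs check
    parser_cache_get: Callable[[str], Optional[str]],    # hypothetical Parser Cache lookup
    flagged_cache_get: Callable[[str], Optional[str]],   # hypothetical FlaggedRevs cache lookup
) -> Optional[int]:
    """Return the citation count, or None when no cached HTML is found."""
    # Pick the cache that holds the HTML actually served for this page.
    if uses_stable_version(title):
        html = flagged_cache_get(title)
    else:
        html = parser_cache_get(title)

    # A cache miss does not trigger a parse here, so report "no response found".
    if html is None:
        return None
    return count_citations(html)
```

Returning `None` rather than `0` on a cache miss lets callers distinguish "no cached HTML" from "page has no citations"; whether the API should expose that distinction is left open here.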