CDN/Headers

From Wikitech

The Wikimedia CDN uses headers for security, analytical, and functional purposes. Some headers are sent to clients and some are only seen through internal systems.

X-Analytics

An HTTP header used for measurement purposes, including in cache log format and the webrequest data stream. A MediaWiki extension implements the capability of extracting this information into the data lake.

Generally, values are added to the header on the server side; the only keys accepted from the client side are preview and pageview.

Format

The X-Analytics header is formatted as a list of key=value pairs separated by semicolons, like mf-m=b or zero=123-45;mf-m=b. If a key occurs more than once, it is undefined which one takes precedence.

The special value - must be interpreted as the empty string.
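The format rules above can be sketched as a small parser. This is an illustrative sketch, not the code used by the CDN or the data pipeline: the function name `parse_x_analytics` is hypothetical, and because the text does not say whether the special value - applies to the whole header or to individual values, the sketch applies it to both. Keeping the last occurrence of a duplicated key is an arbitrary choice, since precedence is undefined.

```python
def parse_x_analytics(header: str) -> dict[str, str]:
    """Parse an X-Analytics header value into a dict of key -> value."""
    header = header.strip()
    # Assumption: a header consisting only of "-" means "no data at all".
    if header in ("", "-"):
        return {}
    pairs: dict[str, str] = {}
    for item in header.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing ";"
        key, _, value = item.partition("=")
        # The special value "-" is interpreted as the empty string.
        if value == "-":
            value = ""
        # Duplicate keys have undefined precedence; this sketch keeps the last.
        pairs[key] = value
    return pairs


print(parse_x_analytics("zero=123-45;mf-m=b"))
```

Values such as b,amc contain commas, which is why the sketch splits only on semicolons and on the first equals sign of each pair.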

Keys

Current keys
Key Value Origin Since Until Team Contact Use case
mf-m b, amc, b,amc, or not set appserver ? Current Readers Web Phuedx If set, then the value b indicates that the user is opted into the beta mode (of the mobile site) (mf-m=b), the value amc indicates Advanced Mobile Contributions (mf-m=amc), and b,amc indicates both (mf-m=b,amc). See MobileContext.php.
proxy Proxy name, e.g. Opera varnish ? Current Wikipedia Zero Yurik If set, indicates that this request has been received via one of the trusted proxies such as Opera Mini servers.

Currently, the following proxies can be expected:

Value Description
Opera Opera mini proxy
Nokiaprod Nokia Xpress Production Proxy Servers
Nokiaqa Nokia Xpress QA Lab Proxy Servers
IORG Internet.org (set by analytics.inc.vcl in puppet)
https 1 varnish ? Current SRE Traffic BBlack If set, will be equal to "1", indicating HTTPS protocol. Currently set for the vast majority of requests, including all that are served with content from canonical WMF domains. If it is missing and the HTTP status is 301, the request was sent using HTTP and met with an HTTP redirect response, most likely to the corresponding HTTPS URL. For other response codes <400 (non-errors), it is assumed that the absence of this field also indicates an HTTP request. For some rare cases involving response codes >= 400, it may be possible that this field is not set even though the request was over HTTPS. (More details)
ismobile 1 varnish June 2025 Current ? ? If set, will equal "1", indicating that the request came from a mobile client (i.e. mobile user agent or mobile opt-in cookie), and is thus routed to MediaWiki with an X-Subdomain header to enable MobileFrontend. Launch task:T390924.
wmfuuid UUID v4 value varnish ? Current Mobile apps dr0ptp4kt If set, will be equal to a hyphen separated value, and indicates a unique app installation. The ID may span multiple requests, as it is generated once, at app install time, using an appropriate library (Java, Objective C), and conforms to RFC 4122 version 4.

Older versions of the app may contain an appInstallID parameter in the request URL instead, or may contain both the appInstallID parameter in the URL as well as the wmfuuid X-Analytics value. Later versions of the software should only contain the wmfuuid X-Analytics value and not the appInstallID parameter in the URL.

Requests from the app will not contain this header if the user has turned off "Send usage reports" in the settings menu of the app.

WMF-Last-Access dd-MMM-yyyy, e.g. 06-May-2015 varnish ? Current Analytics (Infrastructure) Milimetric Date of last site access. If set, it equals the latest date on which a device issued a request to the specific host, in dd-MMM-yyyy format (e.g. 06-May-2015), with an expiration date set ~31 days in the future. More explanation at Analytics/Unique_clients/Last_access_solution.
preview 1 client ? Analytics (Infrastructure) Milimetric Whether this is a preview request (not present if not). At the time of this writing, preview requests by mobile apps are not considered pageviews.

Expected value is preview=1.


Not actively used as of Dec 2021 (task T297172#7567161).

pageview 1 client ? Current Analytics (Infrastructure) Milimetric If set it will count the request in question as a pageview regardless of other attributes of request.
nocookies 1 varnish ? Current Analytics (Infrastructure) Madhuvishy or Nuria If set it will tag the request in question as a nocookie request. This means that either this is a fresh browser session, a user browsing with cookies disabled or possibly a bot request.

We expect that the majority of requests tagged with nocookies will belong to bots. Please see: change 244626.

loggedIn 1 appserver (WikimediaEvents) ? WMDE-Analytics Addshore If set, will be equal to "1", and indicates that the request came from a logged in user (see also code).
page_id Page ID appserver (WikimediaEvents) ? WMDE-Analytics Addshore, Ori.livneh If set, will be a string of a positive integer.
ns Namespace ID appserver (WikimediaEvents) ? WMDE-Analytics Addshore, Ori.livneh If set, will be a string integer (can be negative for negative namespace IDs)
special Special page name appserver (WikimediaEvents) ? WMDE-Analytics Addshore Set for special pages only. This will be the base name of the special page, so if the user is browsing a page via an alias the actual page name will be here.
translationengine Identifier, e.g. GT varnish Nov 2018 Current Product ABaso If set, indicates request served through a known intermediary service for machine translations. "GT" stands for Google Translate.

Added in T208795.

wprov <3_char_feature><1_char_platform><major_version>, e.g. srpw1 for SRP, Web, v1. client or varnish ? Current ? ? See Provenance.
debug 1 varnish Jan 2021 Current Analytics (Infrastructure) Milimetric Added in T263683.
client_port medium-size int varnish Jan 2021 Current Analytics (Infrastructure) JAllemandou Added in T271953.
public_cloud 1 varnish May 2021 Current SRE CDanis Added in T279380.
sessioncookie 1 varnish November 2022 Current SRE Vgutierrez Added in T319324.
prefetch_sec_purpose chrome_private_prefetch, chrome_prerender, chrome_preview, 1, or nonstandard varnish January 2024 Current Analytics (Infrastructure) ABaso / AOtto Added in T346463.
chrome_private_prefetch_version 1, or later on an incremented version varnish January 2024 Current Analytics (Infrastructure) ABaso / AOtto Added in T346463.
prefetch_purpose 1 varnish January 2024 Current Analytics (Infrastructure) ABaso / AOtto Added in T346463. Note that this may be present in conjunction with other prefetch tagged values.
prefetch_x_moz 1 varnish January 2024 Current Analytics (Infrastructure) ABaso / AOtto Added in T346463.
rev_id Revision ID appserver (WikimediaEvents) February 2024 Current ? ? Added in T346350.
authorization OAuth, or Bearer, or unknown. Possibly more in the future varnish February 2025 Current SRE CDanis A summary of the HTTP Authorization header sent by the client, if any.
wmfuniq_days integers 0 .. 8 varnish October 2025 Current SRE CDanis 0 => no valid Edge Unique cookie in the request

1..8 => number of days the same valid cookie has been returned to us, rounded up, capped at 8.

wmfuniq_weeks integers 0 .. 52 varnish October 2025 Current SRE CDanis 0 => no valid Edge Unique cookie in the request

1..52 => number of weeks the same valid cookie has been returned to us, rounded up, capped at 52

wmfuniq_freq integers 0 .. 10 varnish October 2025 Current SRE CDanis 0 => no valid Edge Unique cookie in the request, or, the user has visited the site on fewer than 10% of distinct weeks since cookie issuance

1..10 => the user has visited the site on (freq/10) * 100% of the distinct weeks since cookie issuance

ja3n MD5 hash HAProxy September 2025 Current SRE Vgutierrez JA3N fingerprint of the client performing the request
auth_type string unknown-$session->getProvider() appserver (WikimediaEvents) March 2026 Current MediaWiki Interfaces HCoplin Added in T418606
Former keys
Key Value Origin Since Until Team Contact Use case
php zend, or hhvm appserver ? Jan 2015 SRE _joe_ If set, marks the used PHP implementation.

This tag was only set between September 2014 and January 2015 during the migration from Zend to HHVM. (See I46ff99, and I75b30b)

zero MCC-MNC of a zero carrier, e.g. 404‑01. varnish ? July 2019 Wikipedia Zero Yurik If set, indicates that this request has been associated with the given carrier. It does not mean that the request qualifies as page view.

Removal in T213769.

zeronet Subdivision of a carrier, e.g. b varnish ? July 2019 Wikipedia Zero Yurik Used to disambiguate between different parts or configurations of a single carrier, such as broadband vs. special access points.

Removal in T213769.

max-snippet 1, 0, or not set appserver (WikimediaEvents) Mar 2022 Oct 2022 Readers Web cjming If set, the value 1 indicates the page's robots meta tag contains the max-snippet directive. The value 0 indicates the page's robots meta tag does not contain the max-snippet directive. If set, both 1 and 0 indicate that the page is part of an A/B test in the treatment and control groups respectively. Added in T301584.

Removed in I65ce99b04acc as part of T310267.

X-Cache

Origin Returned to client?
HAProxy, Varnish Yes

A comma-separated list of cache hostnames with information such as hit/miss status for each entry. This header is read right-to-left: the rightmost entry is the outermost cache, and entries further to the left progress deeper towards the application layer. The rightmost cache is the in-memory cache, while all others are disk caches. For a cache hit, the number of times the object has been returned is also specified. Once "hit" is encountered while reading right to left, everything to the left of that "hit" is part of the cached object that got hit: those entries record whether the deeper caches missed, passed, or hit when the object was first pulled into the hitting cache.

Possible values are:

  • hit: a cache hit in cache storage. There was no need to query a deeper cache server (or the applayer, if already at the last cache server). A hit may still require contacting an inner layer if the content is stale and must-revalidate is set. In this scenario the cache server sends a conditional request to an inner layer, and if it receives a 304 Not Modified, the response is sent from the cache.
  • int: locally-generated response from the cache, for example a 301 redirect. The cache did not use a cache object and did not need to contact another server. Backend errors trigger an int response as well: if a backend responds with a 429 without a response body, the cache internally generates an error response after contacting the applayer.
  • miss: the object might be cacheable, but we don't have it.
  • pass: the object was uncacheable, talk to a deeper level.

Some subtleties on "pass": different caches (e.g. in-memory vs. on-disk) might disagree on whether the object is cacheable. A pass on the in-memory cache (for example, because the object is too big) could be a hit for an on-disk cache. Also, it is sometimes not clear that an object is uncacheable until the moment we fetch it. In that case, we cache for a short while the fact that the object is uncacheable. In Varnish terminology, this is a "hit-for-pass".

If we don't know an object is uncacheable until after we fetch it, the request is initially identical to a normal miss. That means coalescing: other requests for the same object wait for the first response. But that first fetch returns an uncacheable object, which cannot answer the other requests that have queued. As a result they all get serialized, which destroys the performance of hot (high-parallelism) objects that are uncacheable. "hit-for-pass" is the answer to that problem. When we make that first request (with no prior knowledge) and get an uncacheable response, we create a special cache entry that says something like "this object cannot be cached, remember it for 10 minutes". All remaining queries for the next 10 minutes then proceed in parallel without coalescing, because it is already known that the object isn't cacheable.
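The decision flow described above can be illustrated with a toy model. This is not Varnish code: the class `ToyCache`, its method names, and the injected clock are hypothetical, and the 600-second lifetime simply mirrors the "10 minutes" mentioned in the text. Cached objects never expire in this sketch.

```python
import time

HIT_FOR_PASS_TTL = 600  # seconds; mirrors the "10 minutes" in the text


class ToyCache:
    """Toy model of hit / miss / hit-for-pass decisions (not Varnish code)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._objects = {}    # key -> cached body
        self._hfp_until = {}  # key -> time until which the key is known uncacheable

    def lookup(self, key):
        """Return 'hit', 'pass' (via a hit-for-pass marker) or 'miss'."""
        if key in self._objects:
            return "hit"
        expiry = self._hfp_until.get(key)
        if expiry is not None and expiry > self._clock():
            # Known uncacheable: requests go to the backend in parallel.
            return "pass"
        # Nothing known yet: concurrent requests would coalesce behind one fetch.
        return "miss"

    def store(self, key, body, cacheable):
        """Record the outcome of a backend fetch."""
        if cacheable:
            self._objects[key] = body
        else:
            # Remember only the fact that the object cannot be cached.
            self._hfp_until[key] = self._clock() + HIT_FOR_PASS_TTL


now = [0.0]
cache = ToyCache(clock=lambda: now[0])
first = cache.lookup("/uncacheable")   # nothing known yet
cache.store("/uncacheable", b"body", cacheable=False)
second = cache.lookup("/uncacheable")  # marker is active
now[0] = 601.0
third = cache.lookup("/uncacheable")   # marker has expired
print(first, second, third)
```

The fake clock makes the expiry of the marker visible without waiting: once it lapses, the next request is again treated as an ordinary miss.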

The content of the X-Cache header is recorded for every request in the webrequest log table.

Example
  • X-Cache: cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603
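As a reading aid, the right-to-left convention can be sketched as a parser that returns the entries ordered from the outermost cache inwards. The names `CacheEntry` and `parse_x_cache` are hypothetical, and the sketch assumes every entry has the shape "hostname status" with an optional "/count" suffix, as in the example above.

```python
from typing import NamedTuple, Optional


class CacheEntry(NamedTuple):
    host: str
    status: str          # "hit", "miss", "pass" or "int"
    hits: Optional[int]  # hit counter, when the entry reports one


def parse_x_cache(header: str) -> list[CacheEntry]:
    """Return the entries ordered outermost cache first (right to left)."""
    entries = []
    for part in header.split(","):
        host, status = part.split()
        status, _, count = status.partition("/")
        entries.append(CacheEntry(host, status, int(count) if count else None))
    entries.reverse()  # the rightmost entry is the outermost cache
    return entries


for entry in parse_x_cache("cp1066 hit/6, cp3043 hit/1, cp3040 hit/26603"):
    print(entry)
```

With the example header, the first entry returned is cp3040, the outermost cache, which has served the object 26603 times.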

X-Cache-Status

Origin Returned to client?
HAProxy, Varnish Yes

This header condenses the various X-Cache values into a single value to describe the overall cache status.

Possible values are:

  • hit-front: A hit came from the outer-most cache level (Varnish).
  • hit-local: A hit came not from the outer-most cache level (Varnish) but instead an inner level (ATS).
  • int-front: An int came from the outer-most cache level (Varnish).
  • int-local: An int came not from the outer-most cache level (Varnish) but instead an inner level (ATS).
  • int-tls: The request only hit the TLS termination layer (HAProxy) and not the caches. This indicates HTTP→HTTPS redirection.
  • miss: the object might be cacheable, but no portion of the stack had it.
  • pass: the object was uncacheable by any portion of the stack.
  • unknown: Catch-all value when internal parsing mechanisms fail to categorize as any of the above values. You should never see this.
Examples
  • X-Cache: cp4038 miss, cp4038 hit/45761
    X-Cache-Status: hit-front
  • X-Cache: cp4051 hit, cp4051 miss
    X-Cache-Status: hit-local
  • X-Cache: cp5021 int
    X-Cache-Status: int-tls
  • X-Cache: cp5021 int
    X-Cache-Status: int-front
  • X-Cache: cp5018 int, cp5018 pass
    X-Cache-Status: int-local
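The following sketch shows one plausible reading of how X-Cache could be condensed into X-Cache-Status. It is an assumption built only from the descriptions and examples on this page; the real logic lives in the CDN configuration and may differ. The function name `condense_x_cache` is hypothetical. Because a lone int entry looks the same for int-front and int-tls, the sketch cannot produce int-tls, and treating a mix of miss and pass as miss is also a guess.

```python
def condense_x_cache(x_cache: str) -> str:
    """Condense an X-Cache value into a single status (illustrative only)."""
    statuses = []
    for part in x_cache.split(","):
        fields = part.split()
        if len(fields) != 2:
            return "unknown"  # entry does not look like "hostname status"
        statuses.append(fields[1].partition("/")[0])
    outermost, inner = statuses[-1], statuses[:-1]
    if outermost == "hit":
        return "hit-front"
    if "hit" in inner:
        return "hit-local"
    if outermost == "int":
        return "int-front"  # int-tls is indistinguishable from X-Cache alone
    if "int" in inner:
        return "int-local"
    if "miss" in statuses:
        return "miss"       # assumption: any miss outweighs a pass
    if all(status == "pass" for status in statuses):
        return "pass"
    return "unknown"


for value in ("cp4038 miss, cp4038 hit/45761",
              "cp4051 hit, cp4051 miss",
              "cp5018 int, cp5018 pass"):
    print(value, "->", condense_x_cache(value))
```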

X-Client-IP

Origin Returned to client?
Varnish Yes

Reports the IP address of the client (user agent) as seen at layer 3; no HTTP headers are parsed to populate this header.

Examples
  • X-Client-IP: 185.15.58.224
  • X-Client-IP: 2a02:ec80:600:ed1a::1

X-Client-Port

Origin Returned to client?
HAProxy No

Reports the source port of the connection on the client side, which is the port the client connected from.

Example
  • X-Client-Port: 25312

X-Connection-Properties

Origin Returned to client?
HAProxy No

A multi-value header that lists various properties of the request. These properties always include the following key=value properties delimited by semi-colons (;):

  • H2: Represents whether HTTP/2 is used. Possible values are 0 or 1.
  • SSR: Whether the TLS session was resumed through the SSL session cache or TLS tickets on an incoming connection over an SSL/TLS transport layer. Possible values are 0 or 1.
  • SSL: The name of the protocol used when the incoming connection was made over a TLS transport layer.
  • C: The name of the cipher used when the incoming connection was made over a TLS transport layer.
  • EC: The elliptic curve used.
Example
  • X-Connection-Properties: H2=1;SSR=0;SSL=TLSv1.3;C=TLS_CHACHA20_POLY1305_SHA256;EC=X25519
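A backend that receives this header might turn it into typed fields. The sketch below is illustrative only: the function name `parse_connection_properties` and the output field names are hypothetical, and it assumes the five properties listed above are the ones of interest.

```python
def parse_connection_properties(header: str) -> dict:
    """Split an X-Connection-Properties value into typed fields."""
    props = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip segments that are not key=value pairs
            props[key] = value
    return {
        "http2": props.get("H2") == "1",
        "tls_session_resumed": props.get("SSR") == "1",
        "tls_protocol": props.get("SSL"),
        "cipher": props.get("C"),
        "curve": props.get("EC"),
    }


example = "H2=1;SSR=0;SSL=TLSv1.3;C=TLS_CHACHA20_POLY1305_SHA256;EC=X25519"
print(parse_connection_properties(example))
```

Using `.get` means a missing property yields False or None instead of an exception, which is a design choice of this sketch rather than documented behavior.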

X-Image-Generator

Origin Returned to client?
HAProxy No

An internal header used within the Wikimedia CDN and request classification systems to signal the source of a link to a given image or thumbnail. It provides early, lightweight identification of known traffic types, helping optimize filtering and rate-limiting decisions for media content.

This header is meant to:

  • Tag media traffic, based on how it's being accessed and directed, as specified by MediaWiki.
  • Apply rate-limiting based on the indicated use-case.
  • Allow Requestctl, HAProxy and Varnish logic to apply differentiated rules based on known usage.

The header follows the form X-Image-Generator: value where value identifies the source/generator of the URL. Possible values are:

  • api
  • imageinfo
  • index
  • parser
  • rest

Any other value will be considered invalid.

Examples
  • X-Image-Generator: api
  • X-Image-Generator: parser
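Validation against the list of possible values can be sketched as a set membership check. The names below are hypothetical, and exact, case-sensitive matching is an assumption, since the page does not say how the CDN compares values.

```python
# The five values listed above; anything else is considered invalid.
VALID_IMAGE_GENERATORS = frozenset({"api", "imageinfo", "index", "parser", "rest"})


def is_valid_image_generator(value: str) -> bool:
    """Return True if the header value is one of the documented generators."""
    return value in VALID_IMAGE_GENERATORS


print(is_valid_image_generator("parser"))
print(is_valid_image_generator("thumbnailer"))
```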

X-Is-Browser

Origin Returned to client?
HAProxy No

This header contains, for requests in classes E and F of #X-Trusted-Request, a score indicating how likely it is that the request comes from a browser rather than a script. Values above 80 indicate a high likelihood that the request comes from a browser. Conversely, values below 20 indicate a high likelihood that it does not.
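The two thresholds can be expressed as a small classifier. The function name and the three labels are hypothetical, and the sketch assumes the header carries a plain numeric value; the page does not specify the exact format or how scores between 20 and 80 should be treated.

```python
def classify_is_browser(header_value: str) -> str:
    """Map an X-Is-Browser score to a coarse label (labels are illustrative)."""
    score = float(header_value)  # assumption: the value is a plain number
    if score > 80:
        return "likely-browser"
    if score < 20:
        return "likely-script"
    return "uncertain"


print(classify_is_browser("93"))
print(classify_is_browser("12"))
print(classify_is_browser("50"))
```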

X-JA3N

Origin Returned to client?
HAProxy No

JA3N fingerprint for help with identifying abuse.

Example
  • X-JA3N: e7d705a3286e19ea42f587b344ee6865 (Tor client)

X-JA4H

Origin Returned to client?
HAProxy No

JA4H fingerprint for help with identifying abuse.

Example
  • X-JA4H: t13d1516h2_8daaf6152771_02713d6af862 (Chrome)

X-Provenance

Origin Returned to client?
HAProxy No

An internal header used within the Wikimedia CDN and request classification systems to signal the origin or trust level of a request. It provides early, lightweight identification of known traffic sources, helping optimize filtering and rate-limiting decisions, such as bypassing generic rate limits/Requestctl rules for trusted sources, or serving as an input to moat-mode rules or future trust scoring systems.

This header is meant to:

  • Tag traffic based on its origin before deeper inspection (e.g. session token validation or UA classification)
  • Enable fast-path handling (e.g. skip filtering, assign different rate limits)
  • Allow Requestctl, HAProxy and Varnish logic to apply differentiated rules based on known provenance

In the future it will also:

  • Integrate with session/token-based identification
  • Help shape rate-limiting tiers dynamically
  • Expand label taxonomy to support more trusted classes

The header follows the form X-Provenance: label1=value1;labelN=valueN where label identifies the provenance of the request.

Examples
  • X-Provenance: net: flags internal requests or requests coming from trusted network ranges
  • X-Provenance: abuser: request coming from a known abuser
  • X-Provenance: client: request coming from a known client ipblock
  • X-Provenance: cloud: request coming from a known cloud
  • X-Provenance: isp: ISP data provided by MaxMind ISP database
  • X-Provenance: net=unknown: default fallback value
  • X-Provenance: datacenter=true: indicates the request is coming from a datacenter, not from an eyeball provider. Data is currently provided by the Spur datacenter feed
  • X-Provenance: id: request coming from a verified client, for which we have both a matching user agent and a matching provenance expression. For instance, a request with user-agent "Googlebot" coming from the ip ranges of googlebot.

X-Forwarded-Proto

Origin Returned to client?
HAProxy No

Identifies the protocol (HTTP or HTTPS) used by the connecting client. The value of this header is hard-coded to https.

Example
  • X-Forwarded-Proto: https

X-Trusted-Request

Origin Returned to client?
HAProxy No

This header expresses the level of trust of a request from the point of view of identification: do we know who is making this request, and if so, do we trust them? The values range from A to F; see the table below for their meanings.

Value Meaning
A The request comes from a trusted network, like WMCS or another Wikimedia network, and is exempt from most rate-limiting and requestctl filters.
B The request comes from verified crawlers and bots which we identify by their User-Agent and IP range. These requests have allocated rate-limits in the CDN, and are excluded from any other filtering rule.
C The request has a valid logged-in MediaWiki session (correctly signed JWT session token). The request is exempted from most requestctl filters, and rate-limiting is based on the MediaWiki account rather than the IP (via the encrypted JWT subject ID).
D The request is from a bot that identifies itself with a user-agent compliant with our robot policy but is not otherwise authenticated. Requests from these bots are automatically rate-limited based on their contact information, according to our robot policy.
E Generic, unidentified traffic. This includes most of the logged-out human traffic and bots that do not honor our UA policy. This traffic is subject to all requestctl filtering rules, and it also gets a score indicating the probability of being a browser (see X-Is-Browser). Depending on the score, rate-limiting (which is based on the wmfuniq cookie, or IP as a fallback) will be more lenient or stricter.
F Traffic from abusive networks. It should mostly be blocked or heavily rate-limited.

On the backend, this information can help you make decisions about performing expensive operations, or setting different limits on resource consumption.

Examples
  • X-Trusted-Request: B
  • X-Trusted-Request: -
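A backend could, for example, key a resource limit on the trust class. Everything in this sketch is hypothetical: the limit values, the names, and the choice to fall back to the class E limit when the header is missing, set to -, or unrecognized are illustrations, not Wikimedia policy.

```python
from typing import Optional

# Hypothetical per-class limits, chosen only for illustration.
MAX_RESULTS_BY_CLASS = {"A": 5000, "B": 1000, "C": 1000, "D": 500, "E": 100, "F": 10}
DEFAULT_MAX_RESULTS = MAX_RESULTS_BY_CLASS["E"]


def max_results_for(trusted_request: Optional[str]) -> int:
    """Pick a result limit from the X-Trusted-Request value."""
    value = (trusted_request or "").strip()
    # Missing, "-" or unknown values fall back to the generic-traffic limit.
    return MAX_RESULTS_BY_CLASS.get(value, DEFAULT_MAX_RESULTS)


print(max_results_for("B"))
print(max_results_for("-"))
```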

X-UA-Contact

Origin Returned to client?
HAProxy No

This header is present in requests of classes C through F of #X-Trusted-Request and contains the contact information of automated clients (bots) that respect our policy. This can be either a URL or an email address indicated by the client in the User-Agent header. If the client provides both, the email address is preferred and saved in the X-UA-Contact header sent downstream.
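The preference for an email address over a URL can be sketched as follows. The actual extraction happens in the CDN and its matching rules are not described here, so the regular expressions, the function name `extract_contact`, and the sample User-Agent strings are all assumptions.

```python
import re
from typing import Optional

# Deliberately simple patterns; real-world matching is likely stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s();,]+")


def extract_contact(user_agent: str) -> Optional[str]:
    """Return the email if present, else the URL, else None."""
    email = EMAIL_RE.search(user_agent)
    if email:
        return email.group(0)  # the email is preferred when both are present
    url = URL_RE.search(user_agent)
    return url.group(0) if url else None


print(extract_contact("CoolBot/0.1 (https://example.org/coolbot/; coolbot@example.org)"))
print(extract_contact("CoolBot/0.1 (https://example.org/coolbot/)"))
```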

Historical headers

Headers that are no longer used and only retained here for historical information.

X-Varnish-Cluster

This header was used to signal to the back-end caching layer which Varnish cluster handled a request. The value of this header was hard-coded to misc.

Example
  • X-Varnish-Cluster: misc

See also