Index content from SharePoint in Microsoft 365 (preview)

Note

The SharePoint in Microsoft 365 indexer is in preview. It's offered "as-is" under Supplemental Terms of Use and supported on a best-effort basis only. Preview features aren't recommended for production workloads and aren't guaranteed to become generally available.

Before you proceed, review the known limitations.

Fill out this form to register for the preview. All requests are approved automatically. After you fill out the form, use a preview REST API to index your content.

Important

These features and functionality are part of the 2026-05-01-preview REST API. The 2026-05-01-preview is licensed to you as part of your Azure subscription and is subject to the terms applicable to "Previews" in the Microsoft Product Terms, the Microsoft Products and Services Data Protection Addendum ("DPA"), and the Supplemental Terms of Use for Microsoft Azure Previews.

The 2026-05-01-preview supports connections to other Microsoft services and third-party services. Use of these services is subject to their respective terms and might result in data processing or storage outside of the Azure compliance boundary, as well as data flowing into the Azure compliance boundary.

The 2026-05-01-preview can't modify access permissions that were set outside of the 2026-05-01-preview. If you use the 2026-05-01-preview with access- or permission-restricted content, a timing lag will occur before the 2026-05-01-preview recognizes changes to those access or permission restrictions.

It's your responsibility to manage whether your data will flow outside of your organization's compliance and geographic boundaries and any related implications, and that appropriate permissions, boundaries, and approvals are provisioned.

You're responsible for carefully reviewing and testing applications you build in the context of your specific use cases and making all appropriate decisions and customizations. This includes implementing your own responsible AI mitigations, such as metaprompts, content filters, or other safety systems, and ensuring your applications meet appropriate quality, reliability, security, and trustworthiness standards. For more information, see the Azure AI Search Transparency Note.

This article explains how to configure a search indexer to index documents stored in SharePoint document libraries for full-text search in Azure AI Search. The configuration steps are first, followed by behaviors and scenarios.

In Azure AI Search, an indexer extracts searchable data and metadata from a data source. The SharePoint in Microsoft 365 indexer provides the following functionality:

Indexes files and metadata from one or more document libraries.
Indexes SharePoint lists and their item field values, with each list column available as a source field for field mapping. This capability is in preview, starting in the 2026-05-01-preview REST API.
Indexes ASPX site pages (modern site pages). This capability is in preview, starting in the 2026-05-01-preview REST API.
Indexes mixed SharePoint content (document libraries, lists, and site pages) in a single indexer using the allSiteContent container value. This capability is in preview, starting in the 2026-05-01-preview REST API.
Indexes content across subsites when includeSubsites=true is set in the data source query. This capability is in preview, starting in the 2026-05-01-preview REST API.
Indexes incrementally, picking up just the new and changed files, list items, pages, and metadata.
Detects deleted content automatically. Deletion of files, list items, or pages is picked up on the next indexer run, and the corresponding search document is removed from the index.
Extracts text and normalized images from indexed documents automatically. Optionally, you can add a skillset for deeper AI enrichment, such as optical character recognition (OCR) or entity recognition.
Supports basic document-level access control list (ACL) ingestion in preview. Starting in the 2026-05-01-preview, ACL changes are detected and updated incrementally on each successful indexer run for items with unique permissions. This release also extends ACL ingestion to list items, ASPX site pages, and SharePoint groups. For caveats and configuration steps, see Use a SharePoint indexer to ingest permission metadata.
Supports Microsoft Purview sensitivity label ingestion and honoring at query time. This functionality is in preview.

Prerequisites

Azure AI Search, Basic pricing tier or higher.
SharePoint in Microsoft 365 cloud service (OneDrive isn't a supported data source).
Files in a document library.
Visual Studio Code with the REST Client extension for setting up and running the indexer pipeline.

Choose your permissions setup

Before you create the app registration in Step 3, identify your scenario in the following table. Note the required Microsoft Graph permissions, SharePoint API permissions, and credential type, then follow the linked steps later in this article to apply them.

Scenario	Microsoft Graph permissions	SharePoint API permissions	Credential	Apply in
Index document libraries only, no ACL ingestion	`Files.Read.All`, `Sites.Read.All` (application) or delegated equivalents	None	Client secret (application) or device code (delegated)	Step 3, Step 6
Index lists, ASPX pages, or mixed content (no ACL ingestion)	`Files.Read.All`, `Sites.Read.All` (application)	None	Client secret or federated credential	Step 3
Document library ACL ingestion, Microsoft Entra users and standard groups only	`Files.Read.All`, `Sites.FullControl.All` (or `Sites.Selected`)	None	Client secret or federated credential	Step 3, Permissions by ACL scenario
ACL ingestion on lists, ASPX pages, or document libraries when SharePoint site groups must be honored	`Files.Read.All`, `Sites.FullControl.All` (or `Sites.Selected`)	`Sites.FullControl.All` (or `Sites.Selected`)	Federated credential (required)	Configuring the registered application with a managed identity, Permissions by ACL scenario
Query-time resolution of SharePoint site groups	No additional Microsoft Graph permissions (inherits from the prior row when also indexing document libraries, lists, or ASPX pages)	`User.Read.All`	Federated credential	Configure SharePoint groups support

Notes:

Delegated permissions are viable only for small-scale testing and don't support ACL ingestion.
Federated credential is the recommended secretless authentication. It covers both indexer authentication and query-time SharePoint group resolution.
When you use Sites.Selected, grant the app explicit access to each target SharePoint site before indexing. If a site is configured in the data source without an explicit grant, the indexer fails.
This matrix is the entry-point summary. For ACL-specific scenario details, see Permissions by ACL scenario in the SharePoint ACL configuration article.

Supported document formats

The SharePoint in Microsoft 365 indexer can extract text from the following document formats:

CSV (see Indexing CSV blobs)
EML
EPUB
GZ
HTML
JSON (see Indexing JSON blobs)
KML (XML for geographic representations)
Markdown
Microsoft Office formats: DOCX/DOC/DOCM, XLSX/XLS/XLSM, PPTX/PPT/PPTM, MSG (Outlook emails), XML (both 2003 and 2006 WORD XML)
Open Document formats: ODT, ODS, ODP
PDF
Plain text files (see also Indexing plain text)
RTF
XML
ZIP

Limitations and considerations

Here are the limitations of this feature:

OneNote notebook files aren't supported.
Incremental indexing limitations:
- Renaming a SharePoint folder breaks incremental indexing. A renamed folder is treated as new content.
- Microsoft 365 processes that update SharePoint file system metadata can trigger incremental indexing, even if there are no other changes to content. Test your setup before relying on the indexer or AI enrichment. Verify how Microsoft 365 processes your documents.
Security limitations:
- No support for private endpoints. Secure network configuration must be enabled via a firewall.
- No support for tenants with Microsoft Entra ID Conditional Access enabled.
- No support for user-encrypted files and password-protected ZIP files. However, encrypted content is allowed if it's protected by Microsoft Purview sensitivity labels and if the configuration to preserve and honor those labels (preview) is enabled.
- Limited support for document-level access permissions. A basic level of ACL sync is currently in preview. For details and setup, see the SharePoint ACL configuration documentation. For required permissions per scenario, see Choose your permissions setup.

Here are some considerations when using this feature:

To build a custom Copilot or retrieval-augmented generation (RAG) app that interacts with SharePoint data using Azure AI Search, Microsoft recommends using the remote SharePoint knowledge source. This knowledge source uses the Copilot Retrieval API to query textual content directly from SharePoint in Microsoft 365, returning results to the agentic retrieval engine for merging, ranking, and response formulation. There's no search index used by this knowledge source, and only textual content is queried. Azure AI Search doesn't replicate data. It enforces the SharePoint permission model by returning only the results that each user is authorized to see.
If you need to create a custom Copilot/RAG application or AI agent to chat with SharePoint data in production environments, consider first building it directly via Microsoft Copilot Studio. If Copilot Studio doesn't meet your needs, consider:
- Creating a custom connector with SharePoint webhooks, calling the Microsoft Graph API to export data to an Azure Blob container, and then using the Azure blob indexer for incremental indexing.
- Creating your own Azure Logic Apps workflow that uses the Azure Logic Apps SharePoint connector and the Azure AI Search connector. The Azure AI Search connector is available once it reaches general availability. Use the workflow generated by the Azure portal wizard as a starting point, then customize it in the Azure Logic Apps designer to add the transformation steps you need. The workflow that the Azure AI Search wizard creates is a consumption workflow. For production workloads, switch to a standard logic app workflow to use its extra enterprise features.

Regardless of the approach you choose, whether building a custom connector with SharePoint webhooks or creating an Azure Logic Apps workflow, be sure to implement robust security measures. These measures include configuring shared private links, setting up firewalls, and preserving user permissions from the source and honoring those permissions at query time. You should also regularly audit and monitor your pipeline.

Configure the SharePoint in Microsoft 365 indexer

To set up the SharePoint in Microsoft 365 indexer, use a preview REST API. This section provides the steps.

(Optional) Step 1: Enable a system-assigned managed identity

Enable a system-assigned managed identity to automatically detect the tenant in which the search service is provisioned.

Perform this step if the SharePoint site is in the same tenant as the search service. Skip this step if the SharePoint site is in a different tenant. The identity is used for tenant detection. You can also skip this step if you want to put the tenant ID in the connection string. To use system-assigned or user-assigned managed identity for secretless indexing, configure the application permissions with secretless authentication.

After you turn on the system-assigned identity and select Save, an object ID is assigned to your search service.

Step 2: Decide which permissions the indexer requires

For the decision matrix that covers ACL and non-ACL scenarios, see Choose your permissions setup. If you choose delegated permissions, user-delegated tokens expire every 75 minutes and require manual indexing using Run Indexer (preview) when they expire. Delegated permissions are recommended only for small testing operations.

Step 3: Create a Microsoft Entra application registration

The SharePoint in Microsoft 365 indexer uses a Microsoft Entra application for authentication. Create the application registration in the same tenant as Azure AI Search.

Sign in to the Azure portal.
Search for or navigate to Microsoft Entra ID, then select Add > App registration.
Select + New registration:
1. Enter a name for your app.
2. Select Single tenant.
3. Skip the URI designation step. No redirect URI required.
4. Select Register.
On the navigation pane under Manage, select API permissions, then Add a permission, then Microsoft Graph.
- If your indexer uses application API permissions, choose Application permissions.
  - For standard indexing, select:
    - Files.Read.All
    - Sites.Read.All
  - If you're enabling ACL ingestion (preview), the required permissions depend on which item types (document library files, list items, ASPX pages) and group types (Microsoft Entra vs. SharePoint site groups) you index. See Permissions by ACL scenario before completing this step. For the cross-scenario summary, see Choose your permissions setup.
    
    Using application permissions means that the indexer accesses the SharePoint site in a service context. When you run the indexer, it has access to all content in the SharePoint tenant, which requires tenant admin approval. A client secret or secretless configuration is also required for authentication. Setting up the authentication mechanism is described later in this article under Available authentication methods for application API permissions only.
- If the indexer is using delegated API permissions, select Delegated permissions and then select Delegated - Files.Read.All, Delegated - Sites.Read.All, and Delegated - User.Read.
  
  Delegated permissions allow the search client to connect to SharePoint under the security identity of the current user.
Give admin consent.

Tenant admin consent is required when using application API permissions. Some tenants are locked down in such a way that tenant admin consent is required for delegated API permissions as well. If either of these conditions applies, a tenant admin must grant consent for this Microsoft Entra application before you create the indexer.
Select the Authentication tab.
Set Allow public client flows to Yes, then select Save.
Select + Add a platform, then Mobile and desktop applications, then check https://login.microsoftonline.com/common/oauth2/nativeclient, then Configure.
Configure the indexer authentication method according to your solution needs.

Available authentication methods for application API permissions only

To authenticate the Microsoft Entra application with application permissions, the indexer uses either a client secret or a secretless configuration.

Using client secret

Follow these steps to configure the application to use a client secret, which authenticates the indexer so that it can ingest data from SharePoint.

Select Certificates & Secrets from the menu on the left, then Client secrets, then New client secret.
In the menu that pops up, enter a description for the new client secret. Adjust the expiration date if necessary. If the secret expires, it needs to be recreated and the indexer needs to be updated with the new secret.
The new client secret appears in the secret list. Once you navigate away from the page, the secret isn't visible, so copy the value using the copy button and save it in a secure location.

Using secretless authentication to obtain application tokens

Use federated credentials to sign in without a client secret. Microsoft Entra trusts a managed identity to obtain an application token, so the indexer can ingest data from SharePoint without a stored secret. The next section walks through configuring a managed identity.

Configuring the registered application with a managed identity

Create (or select) a user-assigned managed identity and assign it to your search service, or use a system-assigned managed identity, depending on your scenario requirements.
Capture the object (principal) ID. Use this value as part of the credentials configuration when you create the data source.
Select Certificates & Secrets from the menu on the left.
Under Federated credentials select + Add a credential.
Under Federated credential scenario select Managed Identity.
Select managed identity: Choose the managed identity created in step 1.
Add a name for your credential and select Save.

Step 4: Create data source

Starting in this section, use the latest preview REST API and a REST client or the latest supported beta SDK of your preference for the remaining steps.

A data source specifies which data to index, credentials, and policies to efficiently identify changes in the data (new, modified, or deleted rows). Multiple indexers in the same search service can use the same data source.

For SharePoint indexing, the data source must have the following required properties:

name is the unique name of the data source within your search service.
type must be "sharepoint". This value is case-sensitive.
credentials provide the SharePoint endpoint and the authentication method allowed for the application to request the Microsoft Entra tokens. An example SharePoint endpoint is https://[your-tenant-name].sharepoint.com/teams/MySharePointSite. You can get the endpoint by navigating to the home page of your SharePoint site and copying the URL from the browser. Review the connection string format for the supported syntax.
container specifies which document library to index. Properties control which documents are indexed.

To create a data source, call Create Data Source (preview).

Here's a data source definition sample for credentials with application secret or system-assigned managed identity.

POST https://[service name].search.windows.net/datasources?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
    "name" : "sharepoint-datasource",
    "type" : "sharepoint",
    "credentials" : { "connectionString" : "[connection-string]" },
    "container" : { "name" : "defaultSiteLibrary", "query" : null }
}

For user-assigned managed identities, supply the identity block in the data source and omit FederatedCredentialApplicationId from the connection string. For system-assigned managed identities, set FederatedCredentialApplicationId in the connection string (see the connection-string formats below).

POST https://[service name].search.windows.net/datasources?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
    "name" : "sharepoint-datasource",
    "type" : "sharepoint",
    "credentials" : { "connectionString" : "[connection-string]" },
    "container" : { "name" : "defaultSiteLibrary", "query" : null },
    "identity": {
      "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
      "userAssignedIdentity": "/subscriptions/[Azure subscription ID]/resourceGroups/[resource-group]/providers/Microsoft.ManagedIdentity/userAssignedIdentities/[user-assigned managed identity]"
    }
}
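The two request bodies above differ only in the optional identity block. As a minimal sketch, the following Python helper assembles either variant as a dictionary that you could serialize and send with the REST client of your choice. The function name `build_data_source` and its parameters are hypothetical and aren't part of any SDK. The property names are taken from the samples above.

```python
import json


def build_data_source(name, connection_string,
                      container_name="defaultSiteLibrary",
                      query=None, user_assigned_identity_id=None):
    """Return a SharePoint data source definition as a dict (hypothetical helper)."""
    definition = {
        "name": name,
        "type": "sharepoint",  # the value is case-sensitive
        "credentials": {"connectionString": connection_string},
        "container": {"name": container_name, "query": query},
    }
    if user_assigned_identity_id is not None:
        # Only user-assigned managed identities need the identity block.
        definition["identity"] = {
            "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
            "userAssignedIdentity": user_assigned_identity_id,
        }
    return definition


if __name__ == "__main__":
    body = build_data_source("sharepoint-datasource", "[connection-string]")
    print(json.dumps(body, indent=2))
```

Keeping the definition as plain data makes it easy to review or store in source control before you send it to Create Data Source (preview).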

Connection string format

The format of the connection string changes based on whether the indexer is using delegated API permissions or application API permissions.

Delegated API permissions connection string format

SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID];TenantId=[SharePoint site tenant id]

Application API permissions with application secret connection string format

SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID];ApplicationSecret=[Azure AD App client secret];TenantId=[SharePoint site tenant id]

Application API permissions with secretless (federated identity credential) connection string format

SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID];FederatedCredentialApplicationId=[Entra application (client) ID that the FIC federates to];TenantId=[SharePoint site tenant id]

The following table describes each connection string field.

Field	Required	Description
`SharePointOnlineEndpoint`	Yes	SharePoint site URL (for example, `https://[your-tenant-name].sharepoint.com`).
`ApplicationId`	Yes	Microsoft Entra application (client) ID of the ingestion app. Must be a valid GUID.
`TenantId`	Optional	Microsoft Entra tenant GUID. Required when the SharePoint site is in a different tenant from the search service.
`ApplicationSecret`	Conditional	Client secret of the ingestion app. Use for secret-based authentication.
`FederatedCredentialApplicationId`	Conditional (FIC mode)	Microsoft Entra application (client) ID that the federated identity credential federates to. Must be a valid GUID.

Important

FederatedCredentialApplicationId and ApplicationSecret are mutually exclusive. Connection strings that combine them are rejected on data source create or update.
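You can check the formats and rules above locally before you send a request. The following Python sketch assembles a connection string from the documented fields and rejects the combinations that the service is described as rejecting. The helper names `build_connection_string` and `_require_guid` are hypothetical. The GUID check uses Python's `uuid` module, which is more lenient than a strict format check (for example, it accepts GUIDs without hyphens), so treat it as a first-pass validation only.

```python
import uuid


def _require_guid(field, value):
    """Raise ValueError unless value parses as a GUID."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"{field} must be a valid GUID") from exc


def build_connection_string(endpoint, application_id, tenant_id=None,
                            application_secret=None,
                            federated_credential_application_id=None):
    """Assemble a SharePoint data source connection string (hypothetical helper)."""
    if application_secret and federated_credential_application_id:
        raise ValueError(
            "ApplicationSecret and FederatedCredentialApplicationId "
            "are mutually exclusive"
        )
    _require_guid("ApplicationId", application_id)
    parts = [
        f"SharePointOnlineEndpoint={endpoint}",
        f"ApplicationId={application_id}",
    ]
    if application_secret:
        parts.append(f"ApplicationSecret={application_secret}")
    if federated_credential_application_id:
        _require_guid("FederatedCredentialApplicationId",
                      federated_credential_application_id)
        parts.append(
            f"FederatedCredentialApplicationId={federated_credential_application_id}"
        )
    if tenant_id:
        # Optional when the site and the search service share a tenant.
        parts.append(f"TenantId={tenant_id}")
    return ";".join(parts)
```

Calling the helper with neither a secret nor a federated credential ID produces the delegated-permissions format.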

Note

For backward compatibility, the SharePoint indexer still accepts FederatedCredentialObjectId (the object/principal ID of the federated identity credential on the ingestion app) in the connection string, so existing data sources keep working without changes. Use FederatedCredentialApplicationId for new and updated data sources.

You can get the TenantId value from the Overview page in the Microsoft Entra admin center in your Microsoft 365 subscription.

You can get the managed identity object (principal) ID from the Configuring the registered application with a managed identity section.

Note

If the SharePoint site is in the same tenant as the search service and system-assigned managed identity is enabled, TenantId doesn't have to be included in the connection string. If the SharePoint site is in a different tenant from the search service, TenantId must be included.

The following example shows a data source created with FederatedCredentialApplicationId:

PUT https://[service name].search.windows.net/datasources/sharepoint-ds?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
  "name": "sharepoint-ds",
  "type": "sharepoint",
  "credentials": {
    "connectionString": "SharePointOnlineEndpoint=https://[your-tenant-name].sharepoint.com;ApplicationId=[Azure AD App ID];TenantId=[SharePoint site tenant id];FederatedCredentialApplicationId=[Entra application (client) ID that the FIC federates to]"
  },
  "container": { "name": "defaultSiteLibrary" }
}

If your indexer uses SharePoint ACL configuration (preview) or preserves and honors Microsoft Purview sensitivity labels (preview), review the related articles before you create the indexer. Each feature has specific data source, index, and skillset configuration steps.

Step 5: Create an index

The index specifies the fields in a document, attributes, and other constructs that shape the search experience.

To create an index, call Create Index (preview):

POST https://[service name].search.windows.net/indexes?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
    "name" : "sharepoint-index",
    "fields": [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
        { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
        { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
        { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
    ]
}

Important

The key field in an index populated by the SharePoint in Microsoft 365 indexer depends on the container type in the data source:

For document-library content (defaultSiteLibrary, allSiteLibraries, or useQuery with library or folder filters), use metadata_spo_site_library_item_id. If a key field doesn't exist in the data source, metadata_spo_site_library_item_id is automatically mapped to the key field.
For list, page, or mixed content (allSiteLists, allSitePages, or allSiteContent), use metadata_spo_site_asset_item_id. This key field is in preview, starting in the 2026-05-01-preview REST API. Auto-mapping doesn't apply to this field, so define an explicit fieldMappings entry from metadata_spo_site_asset_item_id to your index key field.

Apply the base64Encode mapping function when mapping these key fields to your index id field.
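The container-to-key-field rule above can be expressed as a small lookup. The following Python sketch returns the fieldMappings entry for the index key based on the container name. The function name `key_field_mapping` and the two set names are hypothetical. The container values and metadata field names come from this article.

```python
LIBRARY_CONTAINERS = {"defaultSiteLibrary", "allSiteLibraries", "useQuery"}
ASSET_CONTAINERS = {"allSiteLists", "allSitePages", "allSiteContent"}


def key_field_mapping(container_name, target_field="id"):
    """Return the key field mapping for a container value (hypothetical helper)."""
    if container_name in LIBRARY_CONTAINERS:
        source = "metadata_spo_site_library_item_id"
    elif container_name in ASSET_CONTAINERS:
        source = "metadata_spo_site_asset_item_id"
    else:
        raise ValueError(f"Unknown container name: {container_name}")
    return {
        "sourceFieldName": source,
        "targetFieldName": target_field,
        # Both key fields are mapped with base64Encode.
        "mappingFunction": {"name": "base64Encode"},
    }
```

For example, `key_field_mapping("allSiteContent")` yields a mapping from metadata_spo_site_asset_item_id to id.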

Step 6: Create an indexer

An indexer connects a data source with a target search index and provides a schedule to automate the data refresh. After the data source and index are created, you can create the indexer.

To create the indexer:

Send a Create Indexer (preview) request:

POST https://[service name].search.windows.net/indexers?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
    "name" : "sharepoint-indexer",
    "dataSourceName" : "sharepoint-datasource",
    "targetIndexName" : "sharepoint-index",
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "base64EncodeKeys": null,
        "maxFailedItemsPerBatch": null,
        "configuration": {
            "indexedFileNameExtensions" : ".pdf, .docx",
            "excludedFileNameExtensions" : ".png, .jpg",
            "dataToExtract": "contentAndMetadata"
        }
    },
    "schedule" : { },
    "fieldMappings" : [
        {
            "sourceFieldName" : "metadata_spo_site_library_item_id",
            "targetFieldName" : "id",
            "mappingFunction" : {
                "name" : "base64Encode"
            }
        }
    ]
}

For data sources that use the allSiteLists, allSitePages, or allSiteContent container values, map metadata_spo_site_asset_item_id instead of metadata_spo_site_library_item_id.

When you use application permissions, the index can be queried while the initial indexer run is in progress, but only items that have already been indexed return results. Wait until the run completes for full coverage. The remaining instructions in this step apply only to delegated permissions.

When you create the indexer for the first time, the Create Indexer (preview) request waits until you complete the next step. You must call Get Indexer Status to get the link and enter your new device code.

```
GET https://[service name].search.windows.net/indexers/sharepoint-indexer/status?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
```

If you don't call Get Indexer Status within 10 minutes, the code expires and you must recreate the data source.

Copy the device login code from the Get Indexer Status response. The code appears in the errorMessage property.

{
    "lastResult": {
        "status": "transientFailure",
        "errorMessage": "To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code <CODE> to authenticate."
    }
}

Enter the code that was included in the error message.
The SharePoint in Microsoft 365 indexer accesses SharePoint content as the user who signs in during this step. If that account doesn't have access to a document in the document library that you want to index, the indexer can't access that document either.

If possible, create a new organizational user account and grant it the exact permissions that you want the indexer to have.
Approve the permissions that are being requested.
The initial Create Indexer (preview) request completes if the granted permissions are correct and you sign in within the 10-minute timeframe.
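If you script the delegated sign-in flow, you can extract the device code from the Get Indexer Status response instead of copying it by hand. The following Python sketch assumes the errorMessage wording shown in the sample response in this step. If the service changes that wording, the pattern needs to change too. The function name `extract_device_code` is hypothetical.

```python
import re

# Matches the wording of the sample errorMessage in this article.
DEVICE_CODE_PATTERN = re.compile(r"enter the code (\S+) to authenticate")


def extract_device_code(status):
    """Return the device code from a parsed status response, or None."""
    last_result = status.get("lastResult") or {}
    message = last_result.get("errorMessage") or ""
    match = DEVICE_CODE_PATTERN.search(message)
    return match.group(1) if match else None
```

The function returns None when no sign-in prompt is present, for example when the last run succeeded.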

Note

If the Microsoft Entra application requires admin approval and wasn't approved before you signed in, you might see a screen stating that admin approval is required to continue.

Step 7: Check the indexer status

After the indexer has been created, you can call Get Indexer Status:

GET https://[service name].search.windows.net/indexers/sharepoint-indexer/status?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

To confirm that documents were indexed, query the index and request a document count:

GET https://[service-name].search.windows.net/indexes/[index-name]/docs?search=*&$count=true&api-version=2026-05-01-preview
api-key: [admin-api-key]
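Because an indexer run can take a while, you might poll Get Indexer Status rather than call it once. The following Python sketch shows one way to do that. The `get_status` argument is a hypothetical callable that you supply, which performs the GET request and returns the parsed JSON response. The `inProgress` status value is an assumption about the response, because only `transientFailure` appears in this article.

```python
import time


def wait_for_indexer(get_status, poll_seconds=30.0, max_polls=20,
                     sleep=time.sleep):
    """Poll until the last run is no longer in progress.

    Returns the lastResult dict, or None if max_polls is reached first.
    """
    for attempt in range(max_polls):
        last_result = get_status().get("lastResult")
        # lastResult can be missing or null before the first run starts.
        if last_result and last_result.get("status") != "inProgress":
            return last_result
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return None
```

Passing the status fetcher and the sleep function as arguments keeps the loop independent of any HTTP library and makes it straightforward to test without calling the service.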

Update the data source

If there are no updates to the data source object, the indexer runs on a schedule without any user interaction.

If you change the data source while the device code is expired, sign in again to run the indexer. For example, if you change the data source query, get a new device code and sign in again at https://microsoft.com/devicelogin.

Here are the steps for updating a data source, assuming an expired device code:

Call Run Indexer (preview) to manually start indexer execution.

POST https://[service name].search.windows.net/indexers/sharepoint-indexer/run?api-version=2026-05-01-preview  
Content-Type: application/json
api-key: [admin key]

Check the indexer status.

GET https://[service name].search.windows.net/indexers/sharepoint-indexer/status?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

If you get an error asking you to visit https://microsoft.com/devicelogin, open the page and copy the new code.
Paste the code into the dialog box.
Manually run the indexer again and check the indexer status. This time, the indexer run should successfully start.

Index document metadata

If you're indexing document metadata ("dataToExtract": "contentAndMetadata"), the following metadata is available to index.

Identifier	Type	Description
metadata_spo_site_library_item_id	Edm.String	The combination key of site ID, library ID, and item ID, which uniquely identifies an item in a document library for a site. Use this field as the index key for the `defaultSiteLibrary`, `allSiteLibraries`, and `useQuery` (library or folder filters) container values.
metadata_spo_site_asset_item_id	Edm.String	The combination key that uniquely identifies a list item, ASPX site page, or any asset in mixed-content mode. Use this field as the index key for the `allSiteLists`, `allSitePages`, and `allSiteContent` container values. Preview, starting in the 2026-05-01-preview REST API.
metadata_spo_site_id	Edm.String	The ID of the SharePoint site.
metadata_spo_library_id	Edm.String	The ID of the document library.
metadata_spo_item_id	Edm.String	The ID of the (document) item in the library.
metadata_spo_item_last_modified	Edm.DateTimeOffset	The last modified date/time (UTC) of the item.
metadata_spo_item_name	Edm.String	The name of the item.
metadata_spo_item_size	Edm.Int64	The size (in bytes) of the item.
metadata_spo_item_content_type	Edm.String	The content type of the item.
metadata_spo_item_extension	Edm.String	The extension of the item.
metadata_spo_item_weburi	Edm.String	The URI of the item.
metadata_spo_item_path	Edm.String	The combination of the parent path and item name.

The SharePoint in Microsoft 365 indexer also supports metadata specific to each document type. For more information, see Content metadata properties used in Azure AI Search.

Note

To index custom metadata, "additionalColumns" must be specified in the query parameter of the data source.
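As an illustration, the following Python sketch builds a data source container whose query requests custom columns. It assumes that additionalColumns takes a comma-separated list of column names and that multiple query keywords are separated by semicolons. Neither detail is stated in this article, so verify the syntax against the data source query reference before relying on it. The helper name `build_container` is hypothetical, and it doesn't escape column names that themselves contain commas or semicolons.

```python
def build_container(name, additional_columns=(), include_subsites=False):
    """Return a data source container dict (hypothetical helper)."""
    clauses = []
    if include_subsites:
        clauses.append("includeSubsites=true")
    if additional_columns:
        # Assumption: column names are joined with commas.
        clauses.append("additionalColumns=" + ",".join(additional_columns))
    # Assumption: query keywords are joined with semicolons.
    return {"name": name, "query": ";".join(clauses) or None}
```

With no options set, the helper returns a null query, which matches the data source samples earlier in this article.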

Index SharePoint lists

SharePoint lists are indexable in preview, starting in the 2026-05-01-preview REST API. Set the data source container.name to allSiteLists to index all list items from a site, or to allSiteContent to combine list items with document libraries and site pages in a single indexer. To include subsite lists, add includeSubsites=true to the container.query.

For list-based or mixed-content indexers, the index key field must map from metadata_spo_site_asset_item_id. The list item content is surfaced in the content field as JSON-formatted field values, and the standard metadata_spo_item_* fields (such as metadata_spo_item_name, metadata_spo_item_weburi, and metadata_spo_item_last_modified) are populated for each list item.

Map list columns to index fields

Each column defined on a SharePoint list is exposed as a source field with the same name as the SharePoint column. You can map each column to an index field by using field mappings.

For example, consider a SharePoint list with the following columns.

SharePoint column	SharePoint column type
`Title`	Single line of text
`Price`	Number
`InStock`	Yes/No
`Category`	Choice

Add matching fields to your index definition, then map each column to its target field in the indexer:

{
  "name": "my-sharepoint-list-indexer",
  "dataSourceName": "my-sharepoint-list-ds",
  "targetIndexName": "products-index",
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_spo_site_asset_item_id",
      "targetFieldName": "id",
      "mappingFunction": { "name": "base64Encode" }
    },
    { "sourceFieldName": "Title", "targetFieldName": "productName" },
    { "sourceFieldName": "Price", "targetFieldName": "price" },
    { "sourceFieldName": "InStock", "targetFieldName": "available" },
    { "sourceFieldName": "Category", "targetFieldName": "category" },
    { "sourceFieldName": "metadata_spo_item_last_modified", "targetFieldName": "lastUpdated" },
    { "sourceFieldName": "metadata_spo_item_weburi", "targetFieldName": "itemUrl" }
  ]
}

Make sure each target field exists in your index with a compatible type (for example, Edm.String for Title, Edm.Double or Edm.Int64 for Price, Edm.Boolean for InStock).
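
One way to catch a missing target field before you run the indexer is to compare the field mappings against the index schema locally. The sketch below assumes an index whose fields and types are chosen for illustration (for example, `Edm.String` for the `Category` choice column is an assumption), and `missing_targets` is a hypothetical helper, not part of any SDK.

```python
# Hypothetical index schema: field name -> EDM type.
index_fields = {
    "id": "Edm.String",
    "productName": "Edm.String",
    "price": "Edm.Double",
    "available": "Edm.Boolean",
    "category": "Edm.String",          # assumed type for a Choice column
    "lastUpdated": "Edm.DateTimeOffset",
    "itemUrl": "Edm.String",
}

# The same mappings as the indexer definition above.
field_mappings = [
    {"sourceFieldName": "metadata_spo_site_asset_item_id", "targetFieldName": "id",
     "mappingFunction": {"name": "base64Encode"}},
    {"sourceFieldName": "Title", "targetFieldName": "productName"},
    {"sourceFieldName": "Price", "targetFieldName": "price"},
    {"sourceFieldName": "InStock", "targetFieldName": "available"},
    {"sourceFieldName": "Category", "targetFieldName": "category"},
    {"sourceFieldName": "metadata_spo_item_last_modified", "targetFieldName": "lastUpdated"},
    {"sourceFieldName": "metadata_spo_item_weburi", "targetFieldName": "itemUrl"},
]


def missing_targets(mappings, fields):
    """Return target field names that the index schema doesn't define."""
    return [m["targetFieldName"] for m in mappings
            if m["targetFieldName"] not in fields]


print(missing_targets(field_mappings, index_fields))
```

This check covers field names only. Type compatibility still has to be reviewed against the column types in SharePoint.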

Index ASPX site pages

Modern ASPX site pages are indexable in preview, starting in the 2026-05-01-preview REST API. Set the data source container.name to allSitePages to index all pages from a site, or to allSiteContent to combine pages with document libraries and lists in a single indexer. To include subsite pages, add includeSubsites=true to the container.query.

For page-based or mixed-content indexers, the index key field must map from metadata_spo_site_asset_item_id. Page text is extracted into the content field, and the standard metadata_spo_item_* fields (such as metadata_spo_item_name, metadata_spo_item_weburi, and metadata_spo_item_last_modified) are populated for each page.
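
As a rough sketch of the key mapping requirement for pages, the following Python builds an indexer payload that maps `metadata_spo_site_asset_item_id` to the index key. The helper name, the indexer, data source, and index names, and the target fields `pageName`, `pageUrl`, and `lastUpdated` are hypothetical; the `base64Encode` mapping function mirrors the list example earlier in this article.

```python
import json


def build_page_indexer(name, data_source_name, index_name, key_field="id"):
    """Return an indexer payload for an allSitePages data source.

    Hypothetical helper for illustration only.
    """
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        "fieldMappings": [
            # Key mapping required for page-based or mixed-content indexers.
            {
                "sourceFieldName": "metadata_spo_site_asset_item_id",
                "targetFieldName": key_field,
                "mappingFunction": {"name": "base64Encode"},
            },
            # Standard item metadata populated for each page.
            {"sourceFieldName": "metadata_spo_item_name", "targetFieldName": "pageName"},
            {"sourceFieldName": "metadata_spo_item_weburi", "targetFieldName": "pageUrl"},
            {"sourceFieldName": "metadata_spo_item_last_modified",
             "targetFieldName": "lastUpdated"},
        ],
    }


indexer = build_page_indexer(
    "my-sharepoint-pages-indexer", "my-sharepoint-pages-ds", "pages-index"
)
print(json.dumps(indexer, indent=2))
```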

Include or exclude by file type

You can control which files are indexed by setting inclusion and exclusion criteria in the "parameters" section of the indexer definition.

Include specific file extensions by setting "indexedFileNameExtensions" to a comma-separated list of file extensions (with a leading dot). Exclude specific file extensions by setting "excludedFileNameExtensions" to the extensions that should be skipped. If the same extension is in both lists, it's excluded from indexing.

PUT /indexers/[indexer name]?api-version=2026-05-01-preview
{
    "parameters" : { 
        "configuration" : { 
            "indexedFileNameExtensions" : ".pdf, .docx",
            "excludedFileNameExtensions" : ".png, .jpeg" 
        } 
    }
}
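
The precedence rule above (an extension in both lists is excluded) can be modeled locally to preview which files a configuration would pick up. This is a sketch of the documented rule, not the service's implementation. It assumes that matching is case-insensitive and that an empty include list means every extension is a candidate; both are assumptions, and `is_indexed` is a hypothetical helper.

```python
import os


def parse_extensions(value):
    """Split a comma-separated extension list such as '.pdf, .docx'."""
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def is_indexed(file_name, indexed="", excluded=""):
    """Return True if the file would pass the include and exclude lists."""
    ext = os.path.splitext(file_name)[1].lower()
    # Exclusion wins when an extension appears in both lists.
    if ext in parse_extensions(excluded):
        return False
    included = parse_extensions(indexed)
    # Assumption: with no include list, all extensions are candidates.
    return not included or ext in included


for name in ["report.pdf", "logo.png", "notes.txt"]:
    print(name, is_indexed(name, ".pdf, .docx", ".png, .jpeg"))
```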

Control which documents are indexed

A single SharePoint in Microsoft 365 indexer can index content from one or more document libraries. To specify which sites and document libraries to index, use the "container" parameter in the data source definition.

The data source "container" section has two properties for this task: "name" and "query".

Name

The "name" property is required and must be one of the following values:

Value	Description
defaultSiteLibrary	Index all content from the site's default document library.
allSiteLibraries	Index all content from all document libraries in a site. Document libraries from a subsite are out of scope unless you set `includeSubsites=true` in the query (preview, 2026-05-01-preview). You can also choose `useQuery` and specify `includeLibrariesInSite` to scope to specific sites or subsites.
allSiteLists	Index all SharePoint list items from a site. Preview, starting in the 2026-05-01-preview REST API.
allSitePages	Index all modern ASPX site pages from a site. Preview, starting in the 2026-05-01-preview REST API.
allSiteContent	Index libraries, lists, and pages from a site in a single indexer. Preview, starting in the 2026-05-01-preview REST API.
useQuery	Only index the content defined in the `query`.

For data sources that use allSiteLists, allSitePages, or allSiteContent, the indexer key field mapping must use metadata_spo_site_asset_item_id instead of metadata_spo_site_library_item_id. For details, see Step 6: Create an indexer.
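
The rule above can be captured in a small lookup so that the key mapping always follows the container name. The sketch assumes that `useQuery` behaves like the library containers because its keywords scope document libraries and folders; that grouping and the helper name `key_source_field` are assumptions for illustration.

```python
# Containers that surface list items, pages, or mixed content.
ASSET_CONTAINERS = {"allSiteLists", "allSitePages", "allSiteContent"}
# Containers that surface document library content (useQuery is assumed here).
LIBRARY_CONTAINERS = {"defaultSiteLibrary", "allSiteLibraries", "useQuery"}


def key_source_field(container_name):
    """Return the metadata field to map to the index key."""
    if container_name in ASSET_CONTAINERS:
        return "metadata_spo_site_asset_item_id"
    if container_name in LIBRARY_CONTAINERS:
        return "metadata_spo_site_library_item_id"
    raise ValueError(f"Unknown container name: {container_name!r}")


print(key_source_field("allSiteContent"))
print(key_source_field("defaultSiteLibrary"))
```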

Query

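The "query" property of the data source container is made up of keyword/value pairs separated by semicolons. The following keywords can be used. Most values are site, document library, or folder URLs; includeSubsites takes true, and additionalColumns takes a comma-separated list of column names.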

Note

To get the value for a particular keyword, navigate to the document library you want to include or exclude and copy the URI from the browser. This is the easiest way to get the value to use with a keyword in the query.

Keyword	Value description and examples
null	If null or empty, index either the default document library or all document libraries depending on the container name. Example: `"container" : { "name" : "defaultSiteLibrary", "query" : null }`
includeSubsites	When set to `true`, the indexer traverses the root site and all subsites. Combine with `allSiteLibraries`, `allSiteLists`, `allSitePages`, or `allSiteContent`. Preview, starting in the 2026-05-01-preview REST API. Example: `"container" : { "name" : "allSiteLibraries", "query" : "includeSubsites=true" }`
includeLibrariesInSite	Index content from all libraries under the specified site in the connection string. The value should be the URI of the site or subsite. Example 1: `"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/mysite" }` Example 2 (include a few subsites only): `"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/sites/TopSite/SubSite1;includeLibrariesInSite=https://mycompany.sharepoint.com/sites/TopSite/SubSite2" }`
includeLibrary	Index all content from this library. The value is the fully qualified path to the library, which can be copied from your browser: Example 1 (fully qualified path): `"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary" }` Example 2 (URI copied from your browser): `"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" }`
excludeLibrary	Don't index content from this library. The value is the fully qualified path to the library, which can be copied from your browser: Example 1 (fully qualified path): `"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mysite.sharepoint.com/subsite1; excludeLibrary=https://mysite.sharepoint.com/subsite1/MyDocumentLibrary" }` Example 2 (URI copied from your browser): `"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/teams/mysite; excludeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" }`
includeFolder	Index content from a specific folder and its subfolders. Value must be a full SharePoint folder URL. Behavior: Applies recursively to all subfolders. Multiple folders can be specified by repeating the parameter with semicolons. Folder filters are scoped to a single document library. Root-only paths aren't supported. If a folder referenced is renamed, the query must be updated. Example 1 (single folder): `"container": { "name": "useQuery", "query": "includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies" }` Example 2 (multiple folders): `"container": { "name": "useQuery", "query": "includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Specs;includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Designs" }`
excludeFolder	Don't index content from a specific folder and its subfolders. Value must be a full SharePoint folder URL. Behavior: Applies recursively to all subfolders. If a file matches both include and exclude rules, exclude takes precedence and the file is skipped. Folder filters are scoped to a single document library. Example 1 (exclude folder): `"container": { "name": "useQuery", "query": "excludeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies/Archive" }` Example 2 (combine include + exclude): `"container": { "name": "useQuery", "query": "includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies;excludeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies/Drafts" }`
additionalColumns	Index columns from the document library. The value is a comma-separated list of column names you want to index. Use a double backslash to escape semicolons and commas in column names: Example 1 (additionalColumns=MyCustomColumn,MyCustomColumn2): `"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary;additionalColumns=MyCustomColumn,MyCustomColumn2" }` Example 2 (escape characters using double backslash): `"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx;additionalColumns=MyCustomColumnWith\\,,MyCustomColumnWith\\;" }`
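
Because the query is a single semicolon-separated string, it can be easier to assemble it in code than by hand, especially when column names need escaping. The sketch below is one possible approach; `escape_column` and `build_query` are hypothetical helpers. Note that the in-memory string holds a single backslash before each escaped character, and JSON serialization turns it into the double backslash shown in the table. Whether a backslash inside a column name needs its own escaping isn't covered by the table, so the helper leaves it alone.

```python
import json


def escape_column(name):
    """Escape commas and semicolons in a SharePoint column name."""
    return name.replace(",", "\\,").replace(";", "\\;")


def build_query(filters, additional_columns=()):
    """Join keyword/value pairs into a container query string.

    filters is a list of (keyword, value) tuples, for example
    ("includeLibrary", "https://...").
    """
    parts = [f"{keyword}={value}" for keyword, value in filters]
    if additional_columns:
        columns = ",".join(escape_column(c) for c in additional_columns)
        parts.append(f"additionalColumns={columns}")
    return ";".join(parts)


query = build_query(
    [("includeLibrary", "https://mycompany.sharepoint.com/mysite/MyDocumentLibrary")],
    ["MyCustomColumnWith,", "MyCustomColumnWith;"],
)
container = {"name": "useQuery", "query": query}
print(json.dumps(container))
```

Repeating a keyword, such as two `includeFolder` entries, works the same way: pass one tuple per folder in `filters`.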

Handle errors

By default, the SharePoint in Microsoft 365 indexer stops as soon as it encounters a document with an unsupported content type, such as an image. You can use the excludedFileNameExtensions parameter to skip certain content types. However, you might need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the failOnUnsupportedContentType configuration parameter to false:

PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
    ... other parts of indexer definition
    "parameters" : { "configuration" : { "failOnUnsupportedContentType" : false } }
}

For some documents, Azure AI Search is unable to determine the content type or unable to process a document of an otherwise supported content type. To ignore this failure mode, set the failOnUnprocessableDocument configuration parameter to false:

"parameters" : { "configuration" : { "failOnUnprocessableDocument" : false } }

Azure AI Search limits the size of documents that are indexed. These limits are documented in Service Limits in Azure AI Search. Oversized documents are treated as errors by default. However, you can still index storage metadata of oversized documents if you set the indexStorageMetadataOnlyForOversizedDocuments configuration parameter to true:

"parameters" : { "configuration" : { "indexStorageMetadataOnlyForOversizedDocuments" : true } }

You can also continue indexing if errors happen at any point of processing, either while parsing documents or while adding documents to an index. To tolerate a specific number of errors, set the maxFailedItems and maxFailedItemsPerBatch parameters to the desired values. Unlike the settings above, these are set directly under "parameters", not under "configuration". For example:

{
    ... other parts of indexer definition
    "parameters" : { "maxFailedItems" : 10, "maxFailedItemsPerBatch" : 10 }
}
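
The error-handling settings above live at two different levels of the "parameters" section, which is easy to get wrong when combining them. The following sketch assembles them into one object; the helper name `build_error_parameters` and its defaults are assumptions chosen for illustration, not recommended values.

```python
import json


def build_error_parameters(max_failed_items=10, max_failed_items_per_batch=10,
                           skip_unsupported=True, skip_unprocessable=True,
                           metadata_only_for_oversized=True):
    """Return the 'parameters' section of an indexer definition."""
    return {
        # Error thresholds sit directly under "parameters".
        "maxFailedItems": max_failed_items,
        "maxFailedItemsPerBatch": max_failed_items_per_batch,
        # Content-handling switches sit under "configuration".
        "configuration": {
            "failOnUnsupportedContentType": not skip_unsupported,
            "failOnUnprocessableDocument": not skip_unprocessable,
            "indexStorageMetadataOnlyForOversizedDocuments": metadata_only_for_oversized,
        },
    }


parameters = build_error_parameters()
print(json.dumps({"parameters": parameters}, indent=2))
```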

If a file on the SharePoint site has encryption enabled, you might see the following error message:

Code: resourceModified Message: The resource has changed since the caller last read it; usually an eTag mismatch Inner error: Code: irmEncryptFailedToFindProtector

The error message also includes the SharePoint site ID, drive ID, and drive item ID in the following pattern: <sharepoint site id> :: <drive id> :: <drive item id>. Use this information to identify which item is failing on the SharePoint end. The user can then remove the encryption from the item to resolve the issue.
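
If you collect these errors from many indexer runs, a small parser can split the identifier triple into its parts. This sketch assumes you have already isolated the `<sharepoint site id> :: <drive id> :: <drive item id>` portion of the message; where it appears in the full message isn't specified here. The sample IDs below are made up, and `parse_failing_item` is a hypothetical helper.

```python
from typing import NamedTuple


class FailingItem(NamedTuple):
    site_id: str
    drive_id: str
    drive_item_id: str


def parse_failing_item(identifier):
    """Split 'site id :: drive id :: drive item id' into named parts."""
    parts = [part.strip() for part in identifier.split("::")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected three '::'-separated IDs, got: {identifier!r}")
    return FailingItem(*parts)


item = parse_failing_item("site-123 :: drive-456 :: item-789")
print(item.site_id, item.drive_id, item.drive_item_id)
```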

Last updated on 2026-06-02
