Note
The SharePoint in Microsoft 365 indexer is in preview. It's offered "as-is" under Supplemental Terms of Use and supported on a best-effort basis only. Preview features aren't recommended for production workloads and aren't guaranteed to become generally available.
Before you proceed, review the known limitations.
Fill out this form to register for the preview. All requests are approved automatically. After you fill out the form, use a preview REST API to index your content.
Important
These features and functionality are part of the 2026-05-01-preview REST API. The 2026-05-01-preview is licensed to you as part of your Azure subscription and is subject to the terms applicable to "Previews" in the Microsoft Product Terms, the Microsoft Products and Services Data Protection Addendum ("DPA"), and the Supplemental Terms of Use for Microsoft Azure Previews.
The 2026-05-01-preview supports connections to other Microsoft services and third-party services. Use of these services is subject to their respective terms and might result in data processing or storage outside of the Azure compliance boundary, as well as data flowing into the Azure compliance boundary.
The 2026-05-01-preview can't modify access permissions that were set outside of the 2026-05-01-preview. If you use the 2026-05-01-preview with access- or permission-restricted content, a timing lag will occur before the 2026-05-01-preview recognizes changes to those access or permission restrictions.
It's your responsibility to manage whether your data will flow outside of your organization's compliance and geographic boundaries and any related implications, and that appropriate permissions, boundaries, and approvals are provisioned.
You're responsible for carefully reviewing and testing applications you build in the context of your specific use cases and making all appropriate decisions and customizations. This includes implementing your own responsible AI mitigations, such as metaprompts, content filters, or other safety systems, and ensuring your applications meet appropriate quality, reliability, security, and trustworthiness standards. For more information, see the Azure AI Search Transparency Note.
This article explains how to configure a search indexer to index documents stored in SharePoint document libraries for full-text search in Azure AI Search. The configuration steps are first, followed by behaviors and scenarios.
In Azure AI Search, an indexer extracts searchable data and metadata from a data source. The SharePoint in Microsoft 365 indexer provides the following functionality:
- Indexes files and metadata from one or more document libraries.
- Indexes SharePoint lists and their item field values, with each list column available as a source field for field mapping. This capability is in preview, starting in the 2026-05-01-preview REST API.
- Indexes ASPX site pages (modern site pages). This capability is in preview, starting in the 2026-05-01-preview REST API.
- Indexes mixed SharePoint content (document libraries, lists, and site pages) in a single indexer using the allSiteContent container value. This capability is in preview, starting in the 2026-05-01-preview REST API.
- Indexes content across subsites when includeSubsites=true is set in the data source query. This capability is in preview, starting in the 2026-05-01-preview REST API.
- Indexes incrementally, picking up just the new and changed files, list items, pages, and metadata.
- Detects deleted content automatically. Deletion of files, list items, or pages is picked up on the next indexer run, and the corresponding search document is removed from the index.
- Extracts text and normalized images from indexed documents automatically. Optionally, you can add a skillset for deeper AI enrichment, such as optical character recognition (OCR) or entity recognition.
- Supports basic document access control list (ACL) ingestion in preview. Starting in the 2026-05-01-preview, ACL changes are detected and updated incrementally on each successful indexer run for items with unique permissions. This release also extends ACL ingestion to list items, ASPX site pages, and SharePoint groups. For caveats and configuration steps, see Use a SharePoint indexer to ingest permission metadata.
- Supports Microsoft Purview sensitivity label ingestion and honoring at query time. This functionality is in preview.
Prerequisites
Azure AI Search, Basic pricing tier or higher.
SharePoint in Microsoft 365 cloud service (OneDrive isn't a supported data source).
Files in a document library.
Visual Studio Code with the REST Client extension for setting up and running the indexer pipeline.
Choose your permissions setup
Before you create the app registration in Step 3, identify your scenario in the following table. Note the required Microsoft Graph permissions, SharePoint API permissions, and credential type, then follow the linked steps later in this article to apply them.
| Scenario | Microsoft Graph permissions | SharePoint API permissions | Credential | Apply in |
|---|---|---|---|---|
| Index document libraries only, no ACL ingestion | Files.Read.All, Sites.Read.All (application) or delegated equivalents | None | Client secret (application) or device code (delegated) | Step 3, Step 6 |
| Index lists, ASPX pages, or mixed content (no ACL ingestion) | Files.Read.All, Sites.Read.All (application) | None | Client secret or federated credential | Step 3 |
| Document library ACL ingestion, Microsoft Entra users and standard groups only | Files.Read.All, Sites.FullControl.All (or Sites.Selected) | None | Client secret or federated credential | Step 3, Permissions by ACL scenario |
| ACL ingestion on lists, ASPX pages, or document libraries when SharePoint site groups must be honored | Files.Read.All, Sites.FullControl.All (or Sites.Selected) | Sites.FullControl.All (or Sites.Selected) | Federated credential (required) | Configuring the registered application with a managed identity, Permissions by ACL scenario |
| Query-time resolution of SharePoint site groups | No additional Microsoft Graph permissions (inherits from the prior row when also indexing document libraries, lists, or ASPX pages) | User.Read.All | Federated credential | Configure SharePoint groups support |
Notes:
- Delegated permissions are only viable for small testing and don't support ACL ingestion.
- Federated credential is the recommended secretless authentication. It covers both indexer authentication and query-time SharePoint group resolution.
- When you use Sites.Selected, grant the app explicit access to each target SharePoint site before indexing. If a site is configured in the data source without an explicit grant, the indexer fails.
- This matrix is the entry-point summary. For ACL-specific scenario details, see Permissions by ACL scenario in the SharePoint ACL configuration article.
Supported document formats
The SharePoint in Microsoft 365 indexer can extract text from the following document formats:
- CSV (see Indexing CSV blobs)
- EML
- EPUB
- GZ
- HTML
- JSON (see Indexing JSON blobs)
- KML (XML for geographic representations)
- Markdown
- Microsoft Office formats: DOCX/DOC/DOCM, XLSX/XLS/XLSM, PPTX/PPT/PPTM, MSG (Outlook emails), XML (both 2003 and 2006 WORD XML)
- Open Document formats: ODT, ODS, ODP
- PDF
- Plain text files (see also Indexing plain text)
- RTF
- XML
- ZIP
Limitations and considerations
Here are the limitations of this feature:
OneNote notebook files aren't supported.
Incremental indexing limitations:
Renaming a SharePoint folder breaks incremental indexing. A renamed folder is treated as new content.
Microsoft 365 processes that update SharePoint file system metadata can trigger incremental indexing, even if there are no other changes to content. Test your setup before relying on the indexer or AI enrichment. Verify how Microsoft 365 processes your documents.
Security limitations:
No support for private endpoints. Secure network configuration must be enabled via a firewall.
No support for tenants with Microsoft Entra ID Conditional Access enabled.
No support for user-encrypted files and password-protected ZIP files. However, encrypted content is allowed if it's protected by Microsoft Purview sensitivity labels and if the configuration to preserve and honor those labels (preview) is enabled.
Limited support for document-level access permissions. A basic level of ACL sync is currently in preview. For details and setup, see the SharePoint ACL configuration documentation. For required permissions per scenario, see Choose your permissions setup.
Here are some considerations when using this feature:
To build a custom Copilot or retrieval-augmented generation (RAG) app that interacts with SharePoint data using Azure AI Search, Microsoft recommends using the remote SharePoint knowledge source. This knowledge source uses the Copilot Retrieval API to query textual content directly from SharePoint in Microsoft 365, returning results to the agentic retrieval engine for merging, ranking, and response formulation. There's no search index used by this knowledge source, and only textual content is queried. Azure AI Search doesn't replicate data. It enforces the SharePoint permission model by returning only the results that each user is authorized to see.
If you need to create a custom Copilot/RAG application or AI agent to chat with SharePoint data in production environments, consider first building it directly via Microsoft Copilot Studio. If Copilot Studio doesn't meet your needs, consider:
Creating a custom connector with SharePoint webhooks, calling the Microsoft Graph API to export data to an Azure Blob container, and then using the Azure blob indexer for incremental indexing.
Creating your own Azure Logic Apps workflow that uses the Azure Logic Apps SharePoint connector and the Azure AI Search connector. The Azure AI Search connector is available once it reaches general availability. Use the workflow generated by the Azure portal wizard as a starting point, then customize it in the Azure Logic Apps designer to add the transformation steps you need. The workflow that the Azure AI Search wizard creates is a consumption workflow. For production workloads, switch to a standard logic app workflow to use its extra enterprise features.
Regardless of the approach you choose, whether building a custom connector with SharePoint webhooks or creating an Azure Logic Apps workflow, be sure to implement robust security measures. These measures include configuring shared private links, setting up firewalls, and preserving user permissions from the source and honoring those permissions at query time. You should also regularly audit and monitor your pipeline.
Configure the SharePoint in Microsoft 365 indexer
To set up the SharePoint in Microsoft 365 indexer, use a preview REST API. This section provides the steps.
(Optional) Step 1: Enable a system-assigned managed identity
Enable a system-assigned managed identity to automatically detect the tenant in which the search service is provisioned.
Perform this step if the SharePoint site is in the same tenant as the search service. Skip this step if the SharePoint site is in a different tenant. The identity is used for tenant detection. You can also skip this step if you want to put the tenant ID in the connection string. To use system-assigned or user-assigned managed identity for secretless indexing, configure the application permissions with secretless authentication.
After you turn on the system-assigned identity and select Save, an object ID is assigned to your search service.
Step 2: Decide which permissions the indexer requires
For the decision matrix that covers ACL and non-ACL scenarios, see Choose your permissions setup. If you choose delegated permissions, user-delegated tokens expire every 75 minutes and require manual indexing using Run Indexer (preview) when they expire. Delegated permissions are recommended only for small testing operations.
Step 3: Create a Microsoft Entra application registration
The SharePoint in Microsoft 365 indexer uses a Microsoft Entra application for authentication. Create the application registration in the same tenant as Azure AI Search.
Sign in to the Azure portal.
Search for or navigate to Microsoft Entra ID, then select Add > App registration.
Select + New registration:
- Enter a name for your app.
- Select Single tenant.
- Skip the URI designation step. No redirect URI required.
- Select Register.
On the navigation pane under Manage, select API permissions, then Add a permission, then Microsoft Graph.
If your indexer uses application API permissions, choose Application permissions.
For standard indexing, select Files.Read.All and Sites.Read.All.
If you're enabling ACL ingestion (preview), the required permissions depend on which item types (document library files, list items, ASPX pages) and group types (Microsoft Entra vs. SharePoint site groups) you index. See Permissions by ACL scenario before completing this step. For the cross-scenario summary, see Choose your permissions setup.
Using application permissions means that the indexer accesses the SharePoint site in a service context. When you run the indexer, it has access to all content in the SharePoint tenant (unless you scope access with Sites.Selected), which requires tenant admin approval. Authentication also requires either a client secret or a secretless configuration. Both options are described later in this article under Available authentication methods for application API permissions only.
If the indexer uses delegated API permissions, select Delegated permissions, and then select Delegated - Files.Read.All, Delegated - Sites.Read.All, and Delegated - User.Read. Delegated permissions let the search client connect to SharePoint under the security identity of the current user.
Give admin consent.
Tenant admin consent is required when using application API permissions. Some tenants are locked down so that tenant admin consent is also required for delegated API permissions. If either condition applies, a tenant admin must grant consent for this Microsoft Entra application before you create the indexer.
Select the Authentication tab.
Set Allow public client flows to Yes, then select Save.
Select + Add a platform, then Mobile and desktop applications, then check https://login.microsoftonline.com/common/oauth2/nativeclient, and then select Configure.
Configure the indexer authentication method according to your solution needs.
Available authentication methods for application API permissions only
To authenticate the Microsoft Entra application with application permissions, the indexer uses either a client secret or a secretless configuration.
Using client secret
These are the instructions to configure the application to use a client secret to authenticate the indexer, so it can ingest data from SharePoint.
Select Certificates & Secrets from the menu on the left, then Client secrets, then New client secret.
In the menu that pops up, enter a description for the new client secret. Adjust the expiration date if necessary. If the secret expires, it needs to be recreated and the indexer needs to be updated with the new secret.
The new client secret appears in the secret list. Once you navigate away from the page, the secret isn't visible, so copy the value using the copy button and save it in a secure location.
Using secretless authentication to obtain application tokens
Use federated credentials to sign in without a client secret. Microsoft Entra trusts a managed identity to obtain an application token, so the indexer can ingest data from SharePoint without a stored secret. The next section walks through configuring a managed identity.
Configuring the registered application with a managed identity
Create (or select) a user-assigned managed identity and assign it to your search service. Alternatively, use the search service's system-assigned managed identity. Choose based on your scenario requirements.
Capture the object (principal) ID. Use this value as part of the credentials configuration when you create the data source.
Select Certificates & Secrets from the menu on the left.
Under Federated credentials select + Add a credential.
Under Federated credential scenario select Managed Identity.
Select managed identity: Choose the managed identity created in step 1.
Add a name for your credential and select Save.
Step 4: Create data source
Starting in this section, use the latest preview REST API and a REST client or the latest supported beta SDK of your preference for the remaining steps.
A data source specifies which data to index, credentials, and policies to efficiently identify changes in the data (new, modified, or deleted rows). Multiple indexers in the same search service can use the same data source.
For SharePoint indexing, the data source must have the following required properties:
- name is the unique name of the data source within your search service.
- type must be "sharepoint". This value is case-sensitive.
- credentials provide the SharePoint endpoint and the authentication method the application uses to request Microsoft Entra tokens. An example SharePoint endpoint is https://[your-tenant-name].sharepoint.com/teams/MySharePointSite. To get the endpoint, go to the home page of your SharePoint site and copy the URL from the browser. Review the connection string format for the supported syntax.
- container specifies which content to index. Use defaultSiteLibrary, allSiteLibraries, or useQuery for document libraries. Use allSiteLists, allSitePages, or allSiteContent (preview) for lists, site pages, or mixed content. Properties control which documents are indexed.
To create a data source, call Create Data Source (preview).
Here's a data source definition sample for credentials with application secret or system-assigned managed identity.
POST https://[service name].search.windows.net/datasources?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
{
"name" : "sharepoint-datasource",
"type" : "sharepoint",
"credentials" : { "connectionString" : "[connection-string]" },
"container" : { "name" : "defaultSiteLibrary", "query" : null }
}
For user-assigned managed identities, supply the identity block in the data source and omit FederatedCredentialApplicationId from the connection string. For system-assigned managed identities, set FederatedCredentialApplicationId in the connection string (see the connection-string formats below).
POST https://[service name].search.windows.net/datasources?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
{
"name" : "sharepoint-datasource",
"type" : "sharepoint",
"credentials" : { "connectionString" : "[connection-string]" },
"container" : { "name" : "defaultSiteLibrary", "query" : null },
"identity": {
"@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
"userAssignedIdentity": "/subscriptions/[Azure subscription ID]/resourceGroups/[resource-group]/providers/Microsoft.ManagedIdentity/userAssignedIdentities/[user-assigned managed identity]"
}
}
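The two requests above differ only in the optional identity block. The following Python sketch (standard library only) assembles either definition and prepares the Create Data Source (preview) request without sending it. The function names are hypothetical. The container check is limited to the values this article names, and the service might accept others.

```python
import json
import urllib.request

API_VERSION = "2026-05-01-preview"

# Container values named in this article (the service may accept others).
KNOWN_CONTAINERS = {
    "defaultSiteLibrary", "allSiteLibraries", "useQuery",
    "allSiteLists", "allSitePages", "allSiteContent",
}


def build_datasource(name, connection_string, container="defaultSiteLibrary",
                     query=None, user_assigned_identity=None):
    """Return a SharePoint data source definition as a dict."""
    if container not in KNOWN_CONTAINERS:
        raise ValueError(f"unrecognized container name: {container!r}")
    body = {
        "name": name,
        "type": "sharepoint",  # case-sensitive
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": query},
    }
    if user_assigned_identity:
        # Full resource ID of the user-assigned managed identity.
        body["identity"] = {
            "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
            "userAssignedIdentity": user_assigned_identity,
        }
    return body


def create_datasource_request(service, admin_key, body):
    """Prepare, but don't send, the Create Data Source (preview) request."""
    url = (f"https://{service}.search.windows.net/datasources"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen. Keep the admin key out of source control.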
Connection string format
The format of the connection string changes based on whether the indexer is using delegated API permissions or application API permissions.
Delegated API permissions connection string format:
SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID];TenantId=[SharePoint site tenant id]
Application API permissions with application secret connection string format:
SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID];ApplicationSecret=[Azure AD App client secret];TenantId=[SharePoint site tenant id]
Application API permissions with secretless (federated identity credential) connection string format:
SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID];FederatedCredentialApplicationId=[Entra application (client) ID that the FIC federates to];TenantId=[SharePoint site tenant id]
The following table describes each connection string field.
| Field | Required | Description |
|---|---|---|
| SharePointOnlineEndpoint | Yes | SharePoint site URL (for example, https://[your-tenant-name].sharepoint.com). |
| ApplicationId | Yes | Microsoft Entra application (client) ID of the ingestion app. Must be a valid GUID. |
| TenantId | Optional | Microsoft Entra tenant GUID. Required when the SharePoint site is in a different tenant from the search service. |
| ApplicationSecret | Conditional | Client secret of the ingestion app. Use for secret-based authentication. |
| FederatedCredentialApplicationId | Conditional (FIC mode) | Microsoft Entra application (client) ID that the federated identity credential federates to. Must be a valid GUID. |
Important
FederatedCredentialApplicationId and ApplicationSecret are mutually exclusive. Connection strings that combine them are rejected on data source create or update.
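The connection string is a semicolon-delimited list of key=value pairs, so it's easy to get wrong. The following Python sketch builds the three documented formats. It checks that GUID fields parse as GUIDs and rejects strings that combine ApplicationSecret with FederatedCredentialApplicationId. The helper is hypothetical and isn't part of any Azure SDK. It assumes field order doesn't matter, which the FederatedCredentialApplicationId example in this section suggests.

```python
import uuid


def _require_guid(field, value):
    """Raise ValueError unless value parses as a GUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        raise ValueError(f"{field} must be a valid GUID, got {value!r}") from None


def build_connection_string(endpoint, application_id, tenant_id=None,
                            application_secret=None,
                            federated_credential_application_id=None):
    """Assemble a SharePoint indexer connection string.

    - Application secret: pass application_secret.
    - Secretless (federated identity credential): pass
      federated_credential_application_id.
    - Delegated permissions: pass neither.
    """
    if application_secret and federated_credential_application_id:
        raise ValueError("ApplicationSecret and FederatedCredentialApplicationId "
                         "are mutually exclusive")
    _require_guid("ApplicationId", application_id)
    parts = [f"SharePointOnlineEndpoint={endpoint}",
             f"ApplicationId={application_id}"]
    if application_secret:
        parts.append(f"ApplicationSecret={application_secret}")
    if federated_credential_application_id:
        _require_guid("FederatedCredentialApplicationId",
                      federated_credential_application_id)
        parts.append("FederatedCredentialApplicationId="
                     f"{federated_credential_application_id}")
    if tenant_id:
        # Required when the site is in a different tenant from the search service.
        _require_guid("TenantId", tenant_id)
        parts.append(f"TenantId={tenant_id}")
    return ";".join(parts)
```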
Note
For backward compatibility, the SharePoint indexer still accepts FederatedCredentialObjectId (the object/principal ID of the federated identity credential on the ingestion app) in the connection string, so existing data sources keep working without changes. Use FederatedCredentialApplicationId for new and updated data sources.
You can get tenantId from the Overview page in the Microsoft Entra admin center in your Microsoft 365 subscription.
You can get the managed identity object (principal) ID from the Configuring the registered application with a managed identity section.
Note
If the SharePoint site is in the same tenant as the search service and system-assigned managed identity is enabled, TenantId doesn't have to be included in the connection string. If the SharePoint site is in a different tenant from the search service, TenantId must be included.
The following example shows a data source created with FederatedCredentialApplicationId:
PUT https://[service name].search.windows.net/datasources/sharepoint-ds?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
{
"name": "sharepoint-ds",
"type": "sharepoint",
"credentials": {
"connectionString": "SharePointOnlineEndpoint=https://[your-tenant-name].sharepoint.com;ApplicationId=[Azure AD App ID];TenantId=[SharePoint site tenant id];FederatedCredentialApplicationId=[Entra application (client) ID that the FIC federates to]"
},
"container": { "name": "defaultSiteLibrary" }
}
If your indexer uses SharePoint ACL configuration (preview) or preserves and honors Microsoft Purview sensitivity labels (preview), review the related articles before you create the indexer. Each feature has specific data source, index, and skillset configuration steps.
Step 5: Create an index
The index specifies the fields in a document, attributes, and other constructs that shape the search experience.
To create an index, call Create Index (preview):
POST https://[service name].search.windows.net/indexes?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
{
"name" : "sharepoint-index",
"fields": [
{ "name": "id", "type": "Edm.String", "key": true, "searchable": false },
{ "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
{ "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
{ "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
{ "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
{ "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
{ "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
]
}
Important
The key field in an index populated by the SharePoint in Microsoft 365 indexer depends on the container type in the data source:
- For document-library content (defaultSiteLibrary, allSiteLibraries, or useQuery with library or folder filters), use metadata_spo_site_library_item_id. If a key field doesn't exist in the data source, metadata_spo_site_library_item_id is automatically mapped to the key field.
- For list, page, or mixed content (allSiteLists, allSitePages, or allSiteContent), use metadata_spo_site_asset_item_id. This key field is in preview, starting in the 2026-05-01-preview REST API. Auto-mapping doesn't apply to this field, so define an explicit fieldMappings entry from metadata_spo_site_asset_item_id to your index key field.
Apply the base64Encode mapping function when mapping these key fields to your index id field.
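The key-field rules above amount to a lookup on the container name. The following minimal Python sketch assumes an index key field named id, as in the sample index. The function name is hypothetical.

```python
LIBRARY_CONTAINERS = {"defaultSiteLibrary", "allSiteLibraries", "useQuery"}
ASSET_CONTAINERS = {"allSiteLists", "allSitePages", "allSiteContent"}


def key_field_mapping(container_name, target_key="id"):
    """Return the fieldMappings entry that populates the index key field."""
    if container_name in LIBRARY_CONTAINERS:
        source = "metadata_spo_site_library_item_id"
    elif container_name in ASSET_CONTAINERS:
        # Not auto-mapped, so this entry must be explicit.
        source = "metadata_spo_site_asset_item_id"
    else:
        raise ValueError(f"unrecognized container name: {container_name!r}")
    return {
        "sourceFieldName": source,
        "targetFieldName": target_key,
        "mappingFunction": {"name": "base64Encode"},
    }
```

You can place the returned dictionary directly in the indexer's fieldMappings array.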
Step 6: Create an indexer
An indexer connects a data source with a target search index and provides a schedule to automate the data refresh. After the data source and index are created, you can create the indexer.
To create the indexer:
Send a Create Indexer (preview) request:
POST https://[service name].search.windows.net/indexers?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
{
  "name" : "sharepoint-indexer",
  "dataSourceName" : "sharepoint-datasource",
  "targetIndexName" : "sharepoint-index",
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "base64EncodeKeys": null,
    "maxFailedItemsPerBatch": null,
    "configuration": {
      "indexedFileNameExtensions" : ".pdf, .docx",
      "excludedFileNameExtensions" : ".png, .jpg",
      "dataToExtract": "contentAndMetadata"
    }
  },
  "schedule" : { },
  "fieldMappings" : [
    {
      "sourceFieldName" : "metadata_spo_site_library_item_id",
      "targetFieldName" : "id",
      "mappingFunction" : { "name" : "base64Encode" }
    }
  ]
}
For data sources that use the allSiteLists, allSitePages, or allSiteContent container values, map metadata_spo_site_asset_item_id instead of metadata_spo_site_library_item_id.
When you use application permissions, you can query the index while the initial indexer run is in progress, but only items that are already indexed return results. Wait until the run completes for full coverage. The remaining instructions in this step apply only to delegated permissions.
When you create the indexer for the first time, the Create Indexer (preview) request waits until you complete the next step. You must call Get Indexer Status to get the link and enter your new device code.
GET https://[service name].search.windows.net/indexers/sharepoint-indexer/status?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
If you don't call Get Indexer Status within 10 minutes, the code expires and you must recreate the data source.
Copy the device login code from the Get Indexer Status response. The device login can be found in the "errorMessage".
{
  "lastResult": {
    "status": "transientFailure",
    "errorMessage": "To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code <CODE> to authenticate."
  }
}
Enter the code that was included in the error message.
The SharePoint in Microsoft 365 indexer will access the SharePoint content as the signed-in user. The user that logs in during this step will be that signed-in user. So, if you sign in with a user account that doesn't have access to a document in the Document Library that you want to index, the indexer won't have access to that document.
If possible, create a new organizational user account and grant it the exact permissions that you want the indexer to have.
Approve the permissions that are being requested.
The Create Indexer (preview) initial request completes if all the permissions provided above are correct and within the 10-minute timeframe.
Note
If the Microsoft Entra application requires admin approval and wasn't approved before you sign in, you might see a sign-in screen stating that admin approval is required. A tenant admin must approve the application before you can continue.
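If you script the delegated sign-in, you need to extract the device code from errorMessage in the Get Indexer Status response. The following Python sketch uses a regular expression that assumes the message wording shown in the sample response above. That wording isn't a documented contract and might change. If the function returns None, inspect the message manually.

```python
import json
import re

# Assumes the errorMessage wording shown in the sample status response.
_DEVICE_LOGIN = re.compile(
    r"open the page (?P<url>https://\S+) and enter the code (?P<code>\S+) "
    r"to authenticate")


def extract_device_login(status_json):
    """Return (url, code) from a Get Indexer Status response body, or None."""
    status = json.loads(status_json)
    last = status.get("lastResult") or {}
    message = last.get("errorMessage") or ""
    match = _DEVICE_LOGIN.search(message)
    if match is None:
        return None
    return match.group("url"), match.group("code")
```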
Step 7: Check the indexer status
After the indexer has been created, you can call Get Indexer Status:
GET https://[service name].search.windows.net/indexers/sharepoint-indexer/status?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
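After the run completes, you can confirm that documents were indexed by querying the index and checking the returned count:
GET https://[service name].search.windows.net/indexes/sharepoint-index/docs?search=*&$count=true&api-version=2026-05-01-preview
api-key: [admin key]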
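With application permissions, you can poll Get Indexer Status until the current run finishes. In the following Python sketch, get_status is a hypothetical callable, for example a wrapper around the status GET request above that returns the parsed JSON, which keeps the polling logic independent of any HTTP client. The sketch treats a missing lastResult or a status of inProgress as still running and returns on any other status. Check the Get Indexer Status reference for the full set of status values. With delegated permissions, a transientFailure result can instead carry the device-login message described in Step 6.

```python
import time


def wait_for_indexer(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll Get Indexer Status until the latest run stops, then return lastResult.

    get_status: hypothetical callable returning the parsed status JSON as a dict.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        last = get_status().get("lastResult") or {}
        state = last.get("status")
        # None means no run is recorded yet; "inProgress" means still running.
        if state not in (None, "inProgress"):
            return last
        if clock() >= deadline:
            raise TimeoutError(
                f"indexer run still {state!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```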
Update the data source
If there are no updates to the data source object, the indexer runs on a schedule without any user interaction.
If you change the data source while the device code is expired, you must sign in again before the indexer can run. For example, if you change the data source query, get a new device code and sign in again at https://microsoft.com/devicelogin.
Here are the steps for updating a data source, assuming an expired device code:
Call Run Indexer (preview) to manually start indexer execution.
POST https://[service name].search.windows.net/indexers/sharepoint-indexer/run?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
Check the indexer status.
GET https://[service name].search.windows.net/indexers/sharepoint-indexer/status?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]
If you get an error asking you to visit https://microsoft.com/devicelogin, open the page and copy the new code.
Paste the code into the dialog box.
Manually run the indexer again and check the indexer status. This time, the indexer run should successfully start.
Index document metadata
If you're indexing document metadata ("dataToExtract": "contentAndMetadata"), the following metadata is available to index.
| Identifier | Type | Description |
|---|---|---|
| metadata_spo_site_library_item_id | Edm.String | The combination key of site ID, library ID, and item ID, which uniquely identifies an item in a document library for a site. Use this field as the index key for the defaultSiteLibrary, allSiteLibraries, and useQuery (library or folder filters) container values. |
| metadata_spo_site_asset_item_id | Edm.String | The combination key that uniquely identifies a list item, ASPX site page, or any asset in mixed-content mode. Use this field as the index key for the allSiteLists, allSitePages, and allSiteContent container values. Preview, starting in the 2026-05-01-preview REST API. |
| metadata_spo_site_id | Edm.String | The ID of the SharePoint site. |
| metadata_spo_library_id | Edm.String | The ID of the document library. |
| metadata_spo_item_id | Edm.String | The ID of the (document) item in the library. |
| metadata_spo_item_last_modified | Edm.DateTimeOffset | The last modified date/time (UTC) of the item. |
| metadata_spo_item_name | Edm.String | The name of the item. |
| metadata_spo_item_size | Edm.Int64 | The size (in bytes) of the item. |
| metadata_spo_item_content_type | Edm.String | The content type of the item. |
| metadata_spo_item_extension | Edm.String | The extension of the item. |
| metadata_spo_item_weburi | Edm.String | The URI of the item. |
| metadata_spo_item_path | Edm.String | The combination of the parent path and item name. |
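You can use these metadata fields in queries as well as for indexing. The following Python sketch builds a Search Documents URL that returns the most recently modified items, optionally filtered by content type. It assumes the sample index from Step 5, where metadata_spo_item_content_type is filterable and metadata_spo_item_last_modified is sortable. Send the URL with the api-key header, as in Step 7. The function name is hypothetical.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2026-05-01-preview"


def recent_items_url(service, index, content_type=None, top=10):
    """Build a Search Documents GET URL for the most recently modified items."""
    params = {
        "search": "*",
        "$select": ("metadata_spo_item_name,metadata_spo_item_path,"
                    "metadata_spo_item_last_modified"),
        "$orderby": "metadata_spo_item_last_modified desc",
        "$top": str(top),
        "$count": "true",
        "api-version": API_VERSION,
    }
    if content_type:
        # OData string literals escape a single quote by doubling it.
        literal = content_type.replace("'", "''")
        params["$filter"] = f"metadata_spo_item_content_type eq '{literal}'"
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"
```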
The SharePoint in Microsoft 365 indexer also supports metadata specific to each document type. For more information, see Content metadata properties used in Azure AI Search.
Note
To index custom metadata, "additionalColumns" must be specified in the query parameter of the data source.
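The following Python sketch builds the data source container object with query options, covering both additionalColumns and the includeSubsites option mentioned earlier. It makes two assumptions: the query is a semicolon-separated list of key=value pairs, and additionalColumns takes a comma-separated list of column names. Confirm both against the data source reference before you rely on them. The function name is hypothetical, and any column names you pass are your own.

```python
def build_container(name, include_subsites=False, additional_columns=()):
    """Return a data source "container" object with optional query options."""
    options = []
    if include_subsites:
        options.append("includeSubsites=true")
    if additional_columns:
        # Assumed format: comma-separated SharePoint column names.
        options.append("additionalColumns=" + ",".join(additional_columns))
    # Assumed format: options joined as semicolon-separated key=value pairs.
    return {"name": name, "query": ";".join(options) or None}
```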
Index SharePoint lists
SharePoint lists are indexable in preview, starting in the 2026-05-01-preview REST API. Set the data source container.name to allSiteLists to index all list items from a site, or to allSiteContent to combine list items with document libraries and site pages in a single indexer. To include subsite lists, add includeSubsites=true to the container.query.
For list-based or mixed-content indexers, the index key field must map from metadata_spo_site_asset_item_id. The list item content is surfaced in the content field as JSON-formatted field values, and the standard metadata_spo_item_* fields (such as metadata_spo_item_name, metadata_spo_item_weburi, and metadata_spo_item_last_modified) are populated for each list item.
Map list columns to index fields
Each column defined on a SharePoint list is exposed as a source field with the same name as the SharePoint column. You can map each column to an index field by using field mappings.
For example, consider a SharePoint list with the following columns.
| SharePoint column | SharePoint column type |
|---|---|
| Title | Single line of text |
| Price | Number |
| InStock | Yes/No |
| Category | Choice |
Add matching fields to your index definition, then map each column to its target field in the indexer:
```json
{
  "name": "my-sharepoint-list-indexer",
  "dataSourceName": "my-sharepoint-list-ds",
  "targetIndexName": "products-index",
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_spo_site_asset_item_id",
      "targetFieldName": "id",
      "mappingFunction": { "name": "base64Encode" }
    },
    { "sourceFieldName": "Title", "targetFieldName": "productName" },
    { "sourceFieldName": "Price", "targetFieldName": "price" },
    { "sourceFieldName": "InStock", "targetFieldName": "available" },
    { "sourceFieldName": "Category", "targetFieldName": "category" },
    { "sourceFieldName": "metadata_spo_item_last_modified", "targetFieldName": "lastUpdated" },
    { "sourceFieldName": "metadata_spo_item_weburi", "targetFieldName": "itemUrl" }
  ]
}
```
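The base64Encode mapping function on the key matters because document keys can contain only letters, digits, dashes, underscores, and equal signs. The following Python sketch approximates it with standard URL-safe Base64 to show why an encoded key is always valid. The service's own implementation might differ in details such as padding, so treat this as an illustration rather than a way to predict exact key values. The raw key value is hypothetical.

```python
import base64
import re

# Document keys may contain only letters, digits, dashes, underscores, and equal signs.
VALID_KEY = re.compile(r"^[A-Za-z0-9_=-]+$")


def encode_key(raw_key: str) -> str:
    """URL-safe Base64 of the UTF-8 bytes (an approximation of base64Encode)."""
    return base64.urlsafe_b64encode(raw_key.encode("utf-8")).decode("ascii")


def decode_key(encoded_key: str) -> str:
    """Reverse encode_key to recover the original value."""
    return base64.urlsafe_b64decode(encoded_key.encode("ascii")).decode("utf-8")


raw = "site/list/42?v=1.0"  # hypothetical raw key containing '/', '?', and '.'
encoded = encode_key(raw)
print(bool(VALID_KEY.match(raw)), bool(VALID_KEY.match(encoded)))  # False True
```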
Make sure each target field exists in your index with a compatible type (for example, Edm.String for Title, Edm.Double or Edm.Int64 for Price, Edm.Boolean for InStock).
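For lists with many columns, you might generate the index fields and field mappings rather than write them by hand. The following Python sketch does that for the example list above. The type table follows the guidance in the previous paragraph. Two choices are assumptions: Choice maps to Edm.String, and Number uses Edm.Double. The function name is hypothetical, and an unlisted column type raises KeyError. The generated field definitions are deliberately minimal, so add attributes such as searchable or filterable as your queries require.

```python
import json

# Suggested Edm type per SharePoint column type.
# Assumptions: "Choice" -> Edm.String; "Number" -> Edm.Double (Edm.Int64 also works for whole numbers).
SUGGESTED_EDM_TYPE = {
    "Single line of text": "Edm.String",
    "Number": "Edm.Double",
    "Yes/No": "Edm.Boolean",
    "Choice": "Edm.String",
}


def build_list_mappings(columns):
    """columns: list of (sharepoint_column, column_type, target_field) tuples."""
    index_fields = [{"name": "id", "type": "Edm.String", "key": True}]
    field_mappings = [
        {
            "sourceFieldName": "metadata_spo_site_asset_item_id",
            "targetFieldName": "id",
            "mappingFunction": {"name": "base64Encode"},
        }
    ]
    for column, column_type, target in columns:
        index_fields.append({"name": target, "type": SUGGESTED_EDM_TYPE[column_type]})
        field_mappings.append({"sourceFieldName": column, "targetFieldName": target})
    return index_fields, field_mappings


columns = [
    ("Title", "Single line of text", "productName"),
    ("Price", "Number", "price"),
    ("InStock", "Yes/No", "available"),
    ("Category", "Choice", "category"),
]
fields, mappings = build_list_mappings(columns)
print(json.dumps(mappings[1]))  # {"sourceFieldName": "Title", "targetFieldName": "productName"}
```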
Index ASPX site pages
Modern ASPX site pages are indexable in preview, starting in the 2026-05-01-preview REST API. Set the data source container.name to allSitePages to index all pages from a site, or to allSiteContent to combine pages with document libraries and lists in a single indexer. To include subsite pages, add includeSubsites=true to the container.query.
For page-based or mixed-content indexers, the index key field must map from metadata_spo_site_asset_item_id. Page text is extracted into the content field, and the standard metadata_spo_item_* fields (such as metadata_spo_item_name, metadata_spo_item_weburi, and metadata_spo_item_last_modified) are populated for each page.
Include or exclude by file type
You can control which files are indexed by setting inclusion and exclusion criteria in the "configuration" object of the "parameters" section of the indexer definition.
Include specific file extensions by setting "indexedFileNameExtensions" to a comma-separated list of file extensions (with a leading dot). Exclude specific file extensions by setting "excludedFileNameExtensions" to the extensions that should be skipped. If the same extension is in both lists, it's excluded from indexing.
```http
PUT /indexers/[indexer name]?api-version=2026-05-01-preview
{
  "parameters" : {
    "configuration" : {
      "indexedFileNameExtensions" : ".pdf, .docx",
      "excludedFileNameExtensions" : ".png, .jpeg"
    }
  }
}
```
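The inclusion and exclusion rules can be sketched in Python as follows. The sketch makes the precedence explicit: an extension that appears in both lists is skipped. Two behaviors are assumptions rather than documented facts: matching is case-insensitive, and an empty inclusion list means every extension that isn't explicitly excluded gets indexed.

```python
from typing import Set


def parse_extensions(value: str) -> Set[str]:
    """Split a setting such as ".pdf, .docx" into lowercase extensions."""
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def is_indexed(file_name: str, indexed: str = "", excluded: str = "") -> bool:
    """Apply indexedFileNameExtensions / excludedFileNameExtensions to one file name."""
    dot = file_name.rfind(".")
    extension = file_name[dot:].lower() if dot != -1 else ""
    # Exclusion takes precedence when an extension appears in both lists.
    if extension in parse_extensions(excluded):
        return False
    included = parse_extensions(indexed)
    # Assumption: an empty inclusion list means no inclusion filter.
    return not included or extension in included


print(is_indexed("Report.PDF", ".pdf, .docx", ".png, .jpeg"))  # True
print(is_indexed("logo.png", ".pdf, .docx", ".png, .jpeg"))    # False
```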
Control which documents are indexed
A single SharePoint in Microsoft 365 indexer can index content from one or more document libraries and, in preview, from site lists and site pages. To specify which sites and content to index, use the "container" parameter in the data source definition.
The data source "container" section has two properties for this task: "name" and "query".
Name
The "name" property is required and must be one of the following values:
| Value | Description |
|---|---|
| defaultSiteLibrary | Index all content from the site's default document library. |
| allSiteLibraries | Index all content from all document libraries in a site. Document libraries from a subsite are out of scope unless you set includeSubsites=true in the query (preview, 2026-05-01-preview). You can also choose useQuery and specify includeLibrariesInSite to scope to specific sites or subsites. |
| allSiteLists | Index all SharePoint list items from a site. Preview, starting in the 2026-05-01-preview REST API. |
| allSitePages | Index all modern ASPX site pages from a site. Preview, starting in the 2026-05-01-preview REST API. |
| allSiteContent | Index libraries, lists, and pages from a site in a single indexer. Preview, starting in the 2026-05-01-preview REST API. |
| useQuery | Only index the content defined in the query. |
For data sources that use allSiteLists, allSitePages, or allSiteContent, the indexer key field mapping must use metadata_spo_site_asset_item_id instead of metadata_spo_site_library_item_id. For details, see Step 6: Create an indexer.
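Because the key source field depends on the container name, a deployment script might derive both together. The following Python sketch validates the container name against the values in the table and returns the matching key field mapping. The target field name id and the base64Encode mapping follow the list indexer example earlier in this article. The function names are hypothetical, and requiring a query for useQuery is this sketch's own check.

```python
from typing import Dict, Optional

# Valid values for the data source container "name" (from the table above).
CONTAINER_NAMES = {
    "defaultSiteLibrary",
    "allSiteLibraries",
    "allSiteLists",
    "allSitePages",
    "allSiteContent",
    "useQuery",
}
# Containers that index lists or pages must key documents by site asset item ID.
ASSET_CONTAINERS = {"allSiteLists", "allSitePages", "allSiteContent"}


def key_source_field(container_name: str) -> str:
    """Return the metadata field that should feed the index key."""
    if container_name not in CONTAINER_NAMES:
        raise ValueError(f"unsupported container name: {container_name}")
    if container_name in ASSET_CONTAINERS:
        return "metadata_spo_site_asset_item_id"
    return "metadata_spo_site_library_item_id"


def build_container(name: str, query: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Return a data source "container" section after basic validation."""
    key_source_field(name)  # raises ValueError for unknown names
    if name == "useQuery" and not query:
        raise ValueError("useQuery requires a query")
    return {"name": name, "query": query}


def key_field_mapping(container_name: str, target_field: str = "id") -> dict:
    """Return the indexer field mapping for the index key."""
    return {
        "sourceFieldName": key_source_field(container_name),
        "targetFieldName": target_field,
        "mappingFunction": {"name": "base64Encode"},
    }


print(build_container("allSiteContent", "includeSubsites=true"))
print(key_field_mapping("allSiteContent")["sourceFieldName"])  # metadata_spo_site_asset_item_id
```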
Query
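The "query" parameter of the data source is made up of keyword=value pairs separated by semicolons. You can use the following keywords. Most values are site, document library, or folder URLs; includeSubsites takes true, and additionalColumns takes a list of column names.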
Note
The easiest way to get a URL value for a keyword is to navigate to the site or document library you want to include or exclude and copy the URI from the browser.
| Keyword | Value description and examples |
|---|---|
| null | If null or empty, index either the default document library or all document libraries depending on the container name. Example: "container" : { "name" : "defaultSiteLibrary", "query" : null } |
| includeSubsites | When set to true, the indexer traverses the root site and all subsites. Combine with allSiteLibraries, allSiteLists, allSitePages, or allSiteContent. Preview, starting in the 2026-05-01-preview REST API. Example: "container" : { "name" : "allSiteLibraries", "query" : "includeSubsites=true" } |
| includeLibrariesInSite | Index content from all libraries under the specified site in the connection string. The value should be the URI of the site or subsite. Example 1: "container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/mysite" } Example 2 (include a few subsites only): "container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/sites/TopSite/SubSite1;includeLibrariesInSite=https://mycompany.sharepoint.com/sites/TopSite/SubSite2" } |
| includeLibrary | Index all content from this library. The value is the fully qualified path to the library, which can be copied from your browser: Example 1 (fully qualified path): "container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary" } Example 2 (URI copied from your browser): "container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" } |
| excludeLibrary | Don't index content from this library. The value is the fully qualified path to the library, which can be copied from your browser: Example 1 (fully qualified path): "container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mysite.sharepoint.com/subsite1; excludeLibrary=https://mysite.sharepoint.com/subsite1/MyDocumentLibrary" } Example 2 (URI copied from your browser): "container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/teams/mysite; excludeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" } |
| includeFolder | Index content from a specific folder and its subfolders. The value must be a full SharePoint folder URL. Behavior: Applies recursively to all subfolders. To specify multiple folders, repeat the parameter and separate the entries with semicolons. Folder filters are scoped to a single document library. Root-only paths aren't supported. If a referenced folder is renamed, you must update the query. Example 1 (single folder): "container": { "name": "useQuery", "query": "includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies" } Example 2 (multiple folders): "container": { "name": "useQuery", "query": "includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Specs;includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Designs" } |
| excludeFolder | Don't index content from a specific folder and its subfolders. The value must be a full SharePoint folder URL. Behavior: Applies recursively to all subfolders. If a file matches both include and exclude rules, exclude takes precedence and the file is skipped. Folder filters are scoped to a single document library. Example 1 (exclude folder): "container": { "name": "useQuery", "query": "excludeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies/Archive" } Example 2 (combine include + exclude): "container": { "name": "useQuery", "query": "includeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies;excludeFolder=[your-tenant-name].sharepoint.com/sites/hr/Shared Documents/Policies/Drafts" } |
| additionalColumns | Index columns from the document library. The value is a comma-separated list of column names you want to index. Use a double backslash to escape semicolons and commas in column names: Example 1 (additionalColumns=MyCustomColumn,MyCustomColumn2): "container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary;additionalColumns=MyCustomColumn,MyCustomColumn2" } Example 2 (escape characters using double backslash): "container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx;additionalColumns=MyCustomColumnWith\\,,MyCustomColumnWith\\;" } |
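When a query combines several keywords, building the string in code helps avoid separator and escaping mistakes. The following Python sketch joins keyword=value pairs with semicolons, as the examples in the table do, and escapes commas and semicolons in column names with a backslash. The JSON serializer then doubles each backslash, which is why the JSON examples show \\. The function names are hypothetical, and the sketch doesn't validate keywords or URLs.

```python
import json
from typing import Iterable, Tuple


def escape_column(name: str) -> str:
    """Escape commas and semicolons in a column name with a backslash."""
    return name.replace(",", "\\,").replace(";", "\\;")


def build_query(pairs: Iterable[Tuple[str, str]], additional_columns: Iterable[str] = ()) -> str:
    """Join keyword=value pairs with semicolons and append additionalColumns."""
    parts = [f"{keyword}={value}" for keyword, value in pairs]
    columns = [escape_column(c) for c in additional_columns]
    if columns:
        parts.append("additionalColumns=" + ",".join(columns))
    return ";".join(parts)


library = "https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx"
query = build_query([("includeLibrary", library)], ["MyCustomColumnWith,", "MyCustomColumnWith;"])
# json.dumps doubles each backslash, matching the escaped form in the table.
print(json.dumps({"name": "useQuery", "query": query}))
```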
Handle errors
By default, the SharePoint in Microsoft 365 indexer stops as soon as it encounters a document with an unsupported content type, such as an image. You can use the excludedFileNameExtensions parameter to skip certain content types. However, you might need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the failOnUnsupportedContentType configuration parameter to false:
```http
PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2026-05-01-preview
Content-Type: application/json
api-key: [admin key]

{
  ... other parts of indexer definition
  "parameters" : { "configuration" : { "failOnUnsupportedContentType" : false } }
}
```
For some documents, Azure AI Search is unable to determine the content type or unable to process a document of an otherwise supported content type. To ignore this failure mode, set the failOnUnprocessableDocument configuration parameter to false:
```json
"parameters" : { "configuration" : { "failOnUnprocessableDocument" : false } }
```
Azure AI Search limits the size of documents that are indexed. These limits are documented in Service Limits in Azure AI Search. Oversized documents are treated as errors by default. However, you can still index storage metadata of oversized documents if you set the indexStorageMetadataOnlyForOversizedDocuments configuration parameter to true:
```json
"parameters" : { "configuration" : { "indexStorageMetadataOnlyForOversizedDocuments" : true } }
```
You can also continue indexing if errors occur at any point of processing, either while parsing documents or while adding documents to an index. To tolerate a specific number of errors, set the maxFailedItems and maxFailedItemsPerBatch indexer parameters to the desired values. These parameters go directly in the "parameters" section, not in "configuration". For example:
```json
{
  ... other parts of indexer definition
  "parameters" : { "maxFailedItems" : 10, "maxFailedItemsPerBatch" : 10 }
}
```
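The following Python sketch shows where each error-handling setting in this section belongs. The content-level switches go under parameters.configuration, while the error thresholds go directly under parameters. The sketch then prepares, but doesn't send, the PUT request from this section by using the standard library. For illustration it enables every tolerance setting at once; in practice, pick only the ones you need. The service name, indexer definition, and function names are hypothetical.

```python
import copy
import json
import urllib.request


def with_error_handling(indexer: dict, max_failed_items: int = 10, max_failed_items_per_batch: int = 10) -> dict:
    """Return a copy of an indexer definition with tolerant error-handling settings."""
    result = copy.deepcopy(indexer)
    parameters = result.setdefault("parameters", {})
    configuration = parameters.setdefault("configuration", {})
    # Content-level switches live under parameters.configuration.
    configuration["failOnUnsupportedContentType"] = False
    configuration["failOnUnprocessableDocument"] = False
    configuration["indexStorageMetadataOnlyForOversizedDocuments"] = True
    # Error thresholds sit directly under parameters.
    parameters["maxFailedItems"] = max_failed_items
    parameters["maxFailedItemsPerBatch"] = max_failed_items_per_batch
    return result


def build_put_request(service: str, indexer: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but don't send) a PUT request that creates or updates the indexer."""
    url = (
        f"https://{service}.search.windows.net/indexers/{indexer['name']}"
        "?api-version=2026-05-01-preview"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(indexer).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


indexer = {
    "name": "my-sharepoint-list-indexer",
    "parameters": {"configuration": {"indexedFileNameExtensions": ".pdf, .docx"}},
}
request = build_put_request("my-search-service", with_error_handling(indexer), "<admin key>")
print(request.get_method(), request.full_url)
```

To actually submit the definition, pass the prepared request to urllib.request.urlopen with a real service name and admin key.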
If a file on the SharePoint site has encryption enabled, you might see the following error message:
```text
Code: resourceModified Message: The resource has changed since the caller last read it; usually an eTag mismatch Inner error: Code: irmEncryptFailedToFindProtector
```
The error message also includes the SharePoint site ID, drive ID, and drive item ID in the following pattern: <sharepoint site id> :: <drive id> :: <drive item id>. Use this information to identify the failing item in SharePoint, and then remove the encryption from that item to resolve the issue.
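When many items fail this way, extracting the IDs programmatically can help. The following Python sketch relies only on the `<sharepoint site id> :: <drive id> :: <drive item id>` pattern described above, and it assumes that no ID contains spaces. The sample message and its IDs are hypothetical; real IDs look different.

```python
import re
from typing import Dict, Optional

# Matches "<site id> :: <drive id> :: <drive item id>" anywhere in an error message.
ID_TRIPLE = re.compile(r"(?P<site_id>\S+) :: (?P<drive_id>\S+) :: (?P<drive_item_id>\S+)")


def extract_item_ids(message: str) -> Optional[Dict[str, str]]:
    """Return the site, drive, and drive item IDs, or None if the pattern is absent."""
    match = ID_TRIPLE.search(message)
    return match.groupdict() if match else None


# Hypothetical message with placeholder IDs.
message = (
    "Code: resourceModified ... Inner error: Code: irmEncryptFailedToFindProtector "
    "site-123 :: drive-456 :: item-789"
)
print(extract_item_ids(message))
```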
Related content
- YouTube video: SharePoint in Microsoft 365 indexer
- Indexers in Azure AI Search
- Content metadata properties used in Azure AI Search
- Index SharePoint content and other sources in Azure AI Search using Azure Logic App connectors
- Ingest SharePoint ACL configuration (preview)
- Synchronize ACLs between SharePoint and the index
- Configure SharePoint groups support
- Preserve and honor Microsoft Purview sensitivity labels (preview)