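Search multi-region lineage using server-side automation

This document describes how to use the searchLineageStreaming API to find multi-level, cross-region data lineage.

Starting from a defined set of root entities, the searchLineageStreaming API performs a breadth-first search in the specified direction (upstream or downstream) and returns a unified lineage graph as a real-time streaming response.

For more information, see About multi-region lineage search.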

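Key features

The searchLineageStreaming API provides the following features:

Breadth-first search: Traverses the lineage graph level by level and accurately computes the depth of each connected asset.
Streaming responses: Returns subgraphs and lineage links as the backend system discovers them. This is efficient for wide or deep lineage graphs and helps prevent request timeouts.
Multi-location and multi-project traversal: Although you specify only one billing project in the request path, the API automatically discovers and traverses lineage links across multiple Google Cloud projects and geographic locations, provided that you have the required permissions.
Fine-grained column-level lineage: Supports searching for column-level dependencies between assets.
Wildcard lookup: Lets you retrieve all column-level lineage for a specific entity by adding * after the fully qualified name (FQN).
Pipeline insights: Optionally retrieves metadata about the transformation pipelines (processes) that created the lineage links.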
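To make the notion of depth concrete, the following is a minimal local sketch of a breadth-first traversal over an in-memory graph. It is an illustration only, not the service's implementation; the function name bfs_depths and the sample table names are hypothetical.

```python
from collections import deque


def bfs_depths(graph, roots):
    """Return {node: depth} for every node reachable from the roots."""
    depths = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            # The first visit in a breadth-first search gives the smallest depth.
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


# Hypothetical downstream edges: each source maps to its direct targets.
downstream = {
    "bigquery:p.d.raw": ["bigquery:p.d.staged"],
    "bigquery:p.d.staged": ["bigquery:p.d.report", "bigquery:p.d.export"],
}
depths = bfs_depths(downstream, ["bigquery:p.d.raw"])
print(depths)
```

Because each level is fully explored before the next one, the depth recorded for an asset is the length of the shortest path from any root.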

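Before you begin

Before you make requests to the API, make sure that you meet the following security and environment prerequisites:

Required roles

To get the permissions that you need to search data lineage links, ask your administrator to grant you the Data Lineage Viewer (roles/datalineage.viewer) IAM role on the projects where the lineage links and processes are stored. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to search data lineage links. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to search data lineage links:

Search entity-level lineage: datalineage.events.get on the project where the links are stored
Search column-level lineage: datalineage.events.getFields on the project where the links are stored
Retrieve full pipeline process details: datalineage.processes.get on the project where the processes are stored

You might also be able to get these permissions with custom roles or other predefined roles.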

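Resource scoping

When you configure an API request, you must distinguish between the resource used for billing and the actual locations that the API scans:

Billing parent path: The parent path in the request URL must use the format projects/PROJECT_ID/locations/LOCATION_ID. This project-location pair is used exclusively to evaluate billing quota and API rate limits.
Target locations: Explicitly define the regions that you want the backend to scan in the locations array in the request body.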
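The split between the billing parent and the scanned locations can be sketched as a small helper that builds the request URL and body. This is a minimal sketch: the helper name build_search_request is hypothetical, and the field names are taken from the REST examples on this page.

```python
ENDPOINT = "https://datalineage.googleapis.com/v1"


def build_search_request(billing_project, billing_location, scan_locations,
                         root_fqn, direction="DOWNSTREAM"):
    """Return (url, body) for a searchLineageStreaming call."""
    # The parent only selects the project-location pair used for quota.
    parent = f"projects/{billing_project}/locations/{billing_location}"
    url = f"{ENDPOINT}/{parent}:searchLineageStreaming"
    body = {
        "parent": parent,
        # The regions that the backend actually scans are listed separately.
        "locations": list(scan_locations),
        "rootCriteria": {
            "entities": {"entities": [{"fullyQualifiedName": root_fqn}]}
        },
        "direction": direction,
    }
    return url, body


url, body = build_search_request(
    "my-billing-project", "us", ["us", "europe-west1"],
    "bigquery:my-project.dataset.global_table",
)
print(url)
```

Keeping the two concepts as separate arguments makes it harder to assume that the parent location is the only location searched.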

Authentication setup

Initialize an environment variable with a Google Cloud access token to authenticate curl commands:

export ACCESS_TOKEN=$(gcloud auth print-access-token)

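Usage examples

The following examples use the endpoint datalineage.googleapis.com.

Search multi-level, multi-project lineage

To perform a deep lineage search that traverses multiple levels of the graph and scans different Google Cloud projects, configure the following:

Set limits.maxDepth to the target traversal depth (accepts values from 1 to 100).
Populate the locations array with the target regions that you want the backend to cross-reference (for example, ["us", "us-east1"]).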
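As a minimal sketch, the limits block can be validated on the client before the request is sent. The helper name build_limits is hypothetical; the 1 to 100 range comes from the text above, and maxResults appears in the REST example on this page.

```python
def build_limits(max_depth, max_results=None):
    """Return a limits dict after checking the documented maxDepth range."""
    if not 1 <= max_depth <= 100:
        raise ValueError("maxDepth must be between 1 and 100")
    limits = {"maxDepth": max_depth}
    if max_results is not None:
        limits["maxResults"] = max_results
    return limits


print(build_limits(10, 5000))
```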

C#

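Before trying this sample, follow the C# setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog C# API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.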

using Google.Api.Gax.Grpc;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.DataCatalog.Lineage.V1;
using System.Threading.Tasks;

public sealed partial class GeneratedLineageClientSnippets
{
    /// <summary>Snippet for SearchLineageStreaming</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public async Task SearchLineageStreamingRequestObject()
    {
        // Create client
        LineageClient lineageClient = LineageClient.Create();
        // Initialize request argument(s)
        SearchLineageStreamingRequest request = new SearchLineageStreamingRequest
        {
            ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            Locations = { "", },
            RootCriteria = new SearchLineageStreamingRequest.Types.RootCriteria(),
            Direction = SearchLineageStreamingRequest.Types.SearchDirection.Unspecified,
            Filters = new SearchLineageStreamingRequest.Types.SearchFilters(),
            Limits = new SearchLineageStreamingRequest.Types.SearchLimits(),
        };
        // Make the request, returning a streaming response
        using LineageClient.SearchLineageStreamingStream response = lineageClient.SearchLineageStreaming(request);

        // Read streaming responses from server until complete
        // Note that C# 8 code can use await foreach
        AsyncResponseStream<SearchLineageStreamingResponse> responseStream = response.GetResponseStream();
        while (await responseStream.MoveNextAsync())
        {
            SearchLineageStreamingResponse responseItem = responseStream.Current;
            // Do something with streamed response
        }
        // The response stream has completed
    }
}

Java

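Before trying this sample, follow the Java setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Java API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.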

import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.datacatalog.lineage.v1.LineageClient;
import com.google.cloud.datacatalog.lineage.v1.LocationName;
import com.google.cloud.datacatalog.lineage.v1.SearchLineageStreamingRequest;
import com.google.cloud.datacatalog.lineage.v1.SearchLineageStreamingResponse;
import java.util.ArrayList;

public class AsyncSearchLineageStreaming {

  public static void main(String[] args) throws Exception {
    asyncSearchLineageStreaming();
  }

  public static void asyncSearchLineageStreaming() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (LineageClient lineageClient = LineageClient.create()) {
      SearchLineageStreamingRequest request =
          SearchLineageStreamingRequest.newBuilder()
              .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
              .addAllLocations(new ArrayList<String>())
              .setRootCriteria(SearchLineageStreamingRequest.RootCriteria.newBuilder().build())
              .setFilters(SearchLineageStreamingRequest.SearchFilters.newBuilder().build())
              .setLimits(SearchLineageStreamingRequest.SearchLimits.newBuilder().build())
              .build();
      ServerStream<SearchLineageStreamingResponse> stream =
          lineageClient.searchLineageStreamingCallable().call(request);
      for (SearchLineageStreamingResponse response : stream) {
        // Do something when a response is received.
      }
    }
  }
}

Node.js

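Before trying this sample, follow the Node.js setup instructions in the Knowledge Catalog quickstart using client libraries.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.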

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The project and location to initiate the search from.
 */
// const parent = 'abc123'
/**
 *  Required. The locations to search in.
 */
// const locations = ['abc','def']
/**
 *  Required. Criteria for the root of the search.
 */
// const rootCriteria = {}
/**
 *  Required. Direction of the search.
 */
// const direction = {}
/**
 *  Optional. Filters for the search.
 */
// const filters = {}
/**
 *  Optional. Limits for the search.
 */
// const limits = {}

// Imports the Lineage library
const {LineageClient} = require('@google-cloud/lineage').v1;

// Instantiates a client
const lineageClient = new LineageClient();

async function callSearchLineageStreaming() {
  // Construct request
  const request = {
    parent,
    locations,
    rootCriteria,
    direction,
  };

  // Run request
  const stream = await lineageClient.searchLineageStreaming(request);
  stream.on('data', (response) => { console.log(response) });
  stream.on('error', (err) => { throw(err) });
  stream.on('end', () => { /* API call completed */ });
}

callSearchLineageStreaming();

Python

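Before trying this sample, follow the Python setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Python API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.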

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import datacatalog_lineage_v1


def sample_search_lineage_streaming():
    # Create a client
    client = datacatalog_lineage_v1.LineageClient()

    # Initialize request argument(s)
    request = datacatalog_lineage_v1.SearchLineageStreamingRequest(
        parent="parent_value",
        locations=["locations_value1", "locations_value2"],
        direction="UPSTREAM",
    )

    # Make the request
    stream = client.search_lineage_streaming(request=request)

    # Handle the response
    for response in stream:
        print(response)

Ruby

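Before trying this sample, follow the Ruby setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Ruby API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.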

require "google/cloud/data_catalog/lineage/v1"

##
# Snippet for the search_lineage_streaming call in the Lineage service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DataCatalog::Lineage::V1::Lineage::Client#search_lineage_streaming.
#
def search_lineage_streaming
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DataCatalog::Lineage::V1::Lineage::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DataCatalog::Lineage::V1::SearchLineageStreamingRequest.new

  # Call the search_lineage_streaming method to start streaming.
  output = client.search_lineage_streaming request

  # The returned object is a streamed enumerable yielding elements of type
  # ::Google::Cloud::DataCatalog::Lineage::V1::SearchLineageStreamingResponse
  output.each do |current_response|
    p current_response
  end
end

REST

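To search data lineage, use the searchLineageStreaming method.

Before using any of the request data, make the following replacements:

PROJECT_ID: the ID of the Google Cloud project used for billing and quota evaluation.
LOCATION_ID: the Google Cloud location, for example us-central1.
SOURCE_PROJECT_ID: the ID of the Google Cloud project that contains the source table.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.

HTTP method and URL: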

POST https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID:searchLineageStreaming

Request JSON body:

{
  "parent": "projects/PROJECT_ID/locations/LOCATION_ID",
  "locations": [
    "LOCATION_ID",
    "us-east1",
    "us-central1"
  ],
  "rootCriteria": {
    "entities": {
      "entities": [
        {
          "fullyQualifiedName": "bigquery:SOURCE_PROJECT_ID.DATASET_ID.TABLE_ID"
        }
      ]
    }
  },
  "direction": "DOWNSTREAM",
  "limits": {
    "maxDepth": 10,
    "maxResults": 5000
  }
}

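To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: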

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID:searchLineageStreaming"

PowerShell (Windows)

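Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: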

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID:searchLineageStreaming" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "links": [
    {
      "source": {
        "fullyQualifiedName": "bigquery:project-prod.dataset.source_table"
      },
      "target": {
        "fullyQualifiedName": "bigquery:project-prod.dataset.target_table"
      },
      "depth": 1,
      "location": "us"
    }
  ]
}
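Once the streamed messages are decoded into Python dictionaries, the links can be grouped by depth. The following is a minimal sketch that assumes each message has the shape of the sample response above; the helper name links_by_depth is hypothetical, and how you decode the stream depends on your client.

```python
from collections import defaultdict


def links_by_depth(messages):
    """Group (source, target, location) tuples by the depth of each link."""
    grouped = defaultdict(list)
    for message in messages:
        for link in message.get("links", []):
            edge = (
                link["source"]["fullyQualifiedName"],
                link["target"]["fullyQualifiedName"],
                link.get("location"),
            )
            # A missing depth is treated as 0 here; that default is an assumption.
            grouped[link.get("depth", 0)].append(edge)
    return dict(grouped)


sample = [{
    "links": [{
        "source": {"fullyQualifiedName": "bigquery:project-prod.dataset.source_table"},
        "target": {"fullyQualifiedName": "bigquery:project-prod.dataset.target_table"},
        "depth": 1,
        "location": "us",
    }]
}]
by_depth = links_by_depth(sample)
print(by_depth)
```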

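Search multiple locations

You can narrow or broaden the scope of the lineage graph scan by changing the geographic regions passed in the repeated locations array field.

For example: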

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us", "europe-west1", "asia-south2"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.global_table"
      }]
    }
  },
  "direction": "DOWNSTREAM"
}'

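Retrieve process names for lineage links

By default, the API omits process information (maxProcessPerLink defaults to 0). To retrieve the resource names of the pipelines that created a data link, set limits.maxProcessPerLink to a positive integer.

For example: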

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.target_table"
      }]
    }
  },
  "direction": "UPSTREAM",
  "limits": {
    "maxProcessPerLink": 5
  }
}'

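Response behavior: The resulting stream populates the links[].processes field with process messages that contain only their absolute system resource names (for example, projects/my-project/locations/us/processes/my-process).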
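The following minimal sketch collects the distinct process resource names from decoded stream messages. It assumes that each element of links[].processes carries a process object with a name field. This page does not show that shape, so treat it as an assumption and verify it against a real response. The helper name process_names is hypothetical.

```python
def process_names(messages):
    """Return the sorted, de-duplicated process resource names."""
    names = set()
    for message in messages:
        for link in message.get("links", []):
            for entry in link.get("processes", []):
                # Assumed shape: {"process": {"name": "projects/.../processes/..."}}
                name = entry.get("process", {}).get("name")
                if name:
                    names.add(name)
    return sorted(names)


sample = [{
    "links": [{
        "processes": [
            {"process": {"name": "projects/my-project/locations/us/processes/my-process"}},
            {"process": {"name": "projects/my-project/locations/us/processes/my-process"}},
        ]
    }]
}]
names = process_names(sample)
print(names)
```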

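Retrieve full process details using a FieldMask

If you need the full structural metadata about a pipeline (such as its displayName, system attributes, or execution origin) rather than only its resource name, you must use an API FieldMask:

Provide a nonzero value for limits.maxProcessPerLink.
Append the fields query parameter to the URL path, specifying links.processes.process along with any other fields that you need.

For example: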

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST "https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming?fields=links.processes.process,links.source,links.target,links.depth" \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.target_table"
      }]
    }
  },
  "direction": "UPSTREAM",
  "limits": {
    "maxProcessPerLink": 5
  }
}'
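When the URL is built in code, the fields parameter can be appended with the standard library. The following is a minimal sketch; the helper name with_field_mask is hypothetical, and commas are left unescaped so that the result matches the curl example above.

```python
from urllib.parse import urlencode


def with_field_mask(url, paths):
    """Append a comma-separated fields query parameter to the URL."""
    query = urlencode({"fields": ",".join(paths)}, safe=",")
    return f"{url}?{query}"


base = ("https://datalineage.googleapis.com/v1/projects/my-billing-project"
        "/locations/us:searchLineageStreaming")
masked = with_field_mask(base, [
    "links.processes.process",
    "links.source",
    "links.target",
    "links.depth",
])
print(masked)
```

Any field that is left out of the mask is omitted from the response, so list every field that your code reads.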

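Search table-level and column-level lineage together

You can search both table-level (asset-level) and column-level (field-level) lineage in a single request by providing multiple entities in the rootCriteria.entities.entities list:

For table-level lineage, omit the field array.
For column-level lineage, specify individual columns in the field array.

For example: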

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [
             {
               "fullyQualifiedName": "bigquery:my-project.dataset.table_a"
             },
             {
               "fullyQualifiedName": "bigquery:my-project.dataset.table_b",
               "field": ["email"]
             }
           ]
         }
       },
       "direction": "DOWNSTREAM"
     }'

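Use wildcards for column-level lineage

To search all available column-level lineage for a specific table without listing each column individually, use the wildcard * as the single value in the field array.

For example: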

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table",
             "field": ["*"]
           }]
         }
       },
       "direction": "DOWNSTREAM"
     }'
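The table-level, column-level, and wildcard forms of a root entity differ only in the field array, which a small helper can make explicit. This is a minimal sketch; the helper name root_entity is hypothetical, and this page does not state whether a wildcard entity can be combined with other entities in one request.

```python
def root_entity(fqn, fields=None):
    """Build one root entity; omit the field array for table-level lineage."""
    entity = {"fullyQualifiedName": fqn}
    if fields:
        entity["field"] = list(fields)
    return entity


table_level = root_entity("bigquery:my-project.dataset.table_a")
column_level = root_entity("bigquery:my-project.dataset.table_b", ["email"])
all_columns = root_entity("bigquery:my-project.dataset.my_table", ["*"])

root_criteria = {"entities": {"entities": [table_level, column_level]}}
print(root_criteria)
```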

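Filter lineage results

You can refine lineage search results by using the filters block in the request body.

Filter by dependency type

To restrict results to specific dependency types, such as direct copies (EXACT_COPY) or transformations like filtering and grouping (OTHER), use the dependencyTypes filter.

For example: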

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "dependencyTypes": ["EXACT_COPY"]
       }
     }'

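Table-level lineage only

To make sure that a search returns only table-level lineage and excludes column-level lineage entirely, set the entitySet filter to ENTITIES.

For example: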

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "entitySet": "ENTITIES"
       }
     }'

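Filter by time range

You can restrict lineage search results to a specific time interval.

For example, to search for lineage data created after a specific timestamp, use the following request: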

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "timeRange": {
           "startTime": "2026-01-01T00:00:00Z"
         }
       }
     }'
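The three filters shown in this section can be assembled by one helper that also formats the start time as a UTC timestamp. This is a minimal sketch; the helper name build_filters is hypothetical, and this page does not state whether the filters can be combined in a single request.

```python
from datetime import datetime, timezone


def build_filters(dependency_types=None, entity_set=None, start_time=None):
    """Return a filters dict that contains only the filters that were set."""
    filters = {}
    if dependency_types:
        filters["dependencyTypes"] = list(dependency_types)
    if entity_set:
        filters["entitySet"] = entity_set
    if start_time is not None:
        if start_time.tzinfo is None:
            raise ValueError("start_time must be timezone-aware")
        utc = start_time.astimezone(timezone.utc)
        filters["timeRange"] = {"startTime": utc.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return filters


filters = build_filters(
    dependency_types=["EXACT_COPY"],
    start_time=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
print(filters)
```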

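Handle unreachable locations (partial results)

Because the streaming API scans a distributed set of projects and locations concurrently, some remote regions might be temporarily down, unreachable, or misconfigured during execution.

To protect data integrity, the SearchLineageStreamingResponse stream includes a dedicated diagnostic field named unreachable:

Field name: unreachable (represented as a repeated string)
Value format: projects/PROJECT_NUMBER/locations/LOCATION (for example, projects/123456789/locations/us-east1)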
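A client can use this field to detect partial results. The following minimal sketch gathers the unreachable entries from decoded stream messages and splits each one into a project number and a location, assuming the value format above. The helper name unreachable_locations is hypothetical, and entries that do not match the format are skipped.

```python
def unreachable_locations(messages):
    """Return sorted, de-duplicated (project_number, location) pairs."""
    pairs = set()
    for message in messages:
        for entry in message.get("unreachable", []):
            parts = entry.split("/")
            if len(parts) == 4 and parts[0] == "projects" and parts[2] == "locations":
                pairs.add((parts[1], parts[3]))
    return sorted(pairs)


sample = [
    {"links": [], "unreachable": ["projects/123456789/locations/us-east1"]},
    {"unreachable": ["projects/123456789/locations/us-east1"]},
    {"links": []},
]
missing = unreachable_locations(sample)
if missing:
    print("Partial results; unreachable:", missing)
```

If the list is not empty, the lineage graph might be incomplete for those locations, so a caller could retry later or flag the result as partial.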

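What's next

Learn more about multi-region lineage search.
Learn more about data lineage.
Learn more about lineage visualization.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-06-12 UTC.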