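Search multi-region lineage using server-side automation

This document describes how to use the searchLineageStreaming API to query multi-level data lineage across regions.

Starting from a defined set of root entities, the searchLineageStreaming API runs a breadth-first search in the specified direction (upstream or downstream) and returns a unified lineage graph as a real-time streaming response.

For more information, see "About multi-region lineage search".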
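To make the breadth-first behavior concrete, the following minimal sketch computes the depth of each asset reachable from a set of roots in a toy lineage graph. It illustrates the general technique only and is not the service's implementation; the function name bfs_depths and the sample edges are hypothetical.

```python
from collections import deque


def bfs_depths(edges, roots):
    """Return {node: depth} for every node reachable from the roots."""
    # Build an adjacency list; each (source, target) pair is a downstream edge.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    depths = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            # In a breadth-first search, the first visit is the shortest path.
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


# Hypothetical toy graph: "mart" is reachable directly and through "staging".
edges = [("raw", "staging"), ("staging", "mart"), ("raw", "mart"), ("mart", "report")]
downstream = bfs_depths(edges, ["raw"])
print(downstream)
```

Because nodes are visited layer by layer, "mart" gets depth 1 from its direct edge rather than depth 2 through "staging".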

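Key features

The searchLineageStreaming API includes the following features:

  • Breadth-first search: Traverses the lineage graph layer by layer and accurately calculates the depth of each linked asset.

  • Streaming responses: Returns subgraphs and lineage links as the backend discovers them. This suits wide or deep lineage graphs and helps avoid request timeouts.

  • Multi-location and multi-project traversal: Although you specify only one billing project in the request path, the API automatically discovers and traverses lineage links across multiple Google Cloud projects and geographic locations, provided that you have the required permissions.

  • Fine-grained column-level lineage: Supports searching for column-level dependencies between assets.

  • Wildcard queries: Append * to a fully qualified name (FQN) to retrieve all field-level lineage for a specific entity.

  • Pipeline insights: Optionally retrieves metadata about the transformation pipelines (processes) that created the lineage links.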

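Before you begin

Before you send requests to the API, make sure that you meet the following security and environment prerequisites:

Required roles

To get the permissions that you need to search data lineage links, ask your administrator to grant you the Data Lineage Viewer (roles/datalineage.viewer) IAM role on the projects where the lineage links and processes are stored. For more information about granting roles, see "Manage access to projects, folders, and organizations".

This predefined role contains the permissions required to search data lineage links. The exact permissions are listed in the following section:

Required permissions

The following permissions are required to search data lineage links:

  • Search entity-level lineage: datalineage.events.get on the project where the links are stored
  • Search column-level lineage: datalineage.events.getFields on the project where the links are stored
  • Retrieve full pipeline process details: datalineage.processes.get on the project where the processes are stored

You might also be able to get these permissions with custom roles or other predefined roles.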

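Resource scope

When you configure an API request, distinguish between the resource used for billing and the locations that the API actually scans:

  • Billing parent path: The parent path in the request URL must use the format projects/project/locations/location. This project and location combination is used only to evaluate billing quotas and API rate limits.

  • Target locations: In the locations array in the request body, explicitly define the regions that you want the backend to scan.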
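The separation between the billing parent and the scanned locations can be sketched in Python as follows. This is a minimal illustration that only assembles the URL and JSON body described in this document; the helper name build_search_request and the sample project and table names are hypothetical, and no request is sent.

```python
import json

API_ROOT = "https://datalineage.googleapis.com/v1"


def build_search_request(billing_project, billing_location, scan_locations,
                         root_fqn, direction="DOWNSTREAM"):
    """Return (url, body) for a searchLineageStreaming call."""
    # The parent path is used only for billing quota and rate limits.
    parent = f"projects/{billing_project}/locations/{billing_location}"
    url = f"{API_ROOT}/{parent}:searchLineageStreaming"
    body = {
        "parent": parent,
        # The regions that the backend actually scans.
        "locations": list(scan_locations),
        "rootCriteria": {
            "entities": {"entities": [{"fullyQualifiedName": root_fqn}]}
        },
        "direction": direction,
    }
    return url, body


url, body = build_search_request(
    "my-billing-project", "us", ["us", "europe-west1"],
    "bigquery:my-project.dataset.global_table",
)
print(url)
print(json.dumps(body, indent=2))
```

Keeping the two concepts as separate arguments makes it harder to confuse the billing location with the list of regions to scan.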

Set up authentication

To authenticate your curl commands, initialize an environment variable with a Google Cloud access token:

export ACCESS_TOKEN=$(gcloud auth print-access-token)
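If you call the REST endpoint from a script instead of curl, the same token can be turned into request headers. The following sketch assumes that the ACCESS_TOKEN environment variable was exported as shown above; the helper name auth_headers is hypothetical.

```python
import os


def auth_headers(token=None):
    """Build the HTTP headers used by the curl examples in this document."""
    # Fall back to the environment variable set by the export command.
    token = token or os.environ.get("ACCESS_TOKEN", "")
    if not token:
        raise RuntimeError("ACCESS_TOKEN is not set; run the export command first")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Example with an explicit placeholder token (not a real credential).
headers = auth_headers("example-token")
print(headers["Authorization"])
```

Access tokens are short-lived, so a long-running script would need to refresh the value rather than read it once.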

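Usage examples

The following examples use the datalineage.googleapis.com endpoint.

Search multi-level, multi-project lineage

To run a deep lineage search that spans multiple graph depths and scans different Google Cloud projects, define the following variables:

  • Set limits.maxDepth to the target traversal depth (accepts values between 1 and 100).

  • Populate the locations array with the target regions that the backend should cross-reference (for example, ["us", "us-east1"]).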
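A small client-side check can catch an out-of-range depth before the request is sent. The sketch below assumes that limits.maxDepth accepts values from 1 to 100, which is this document's reading of the range and should be confirmed against the API reference; the helper name build_limits is hypothetical, and maxResults is taken from the REST example in this document.

```python
def build_limits(max_depth, max_results=None, lower=1, upper=100):
    """Return a limits block after validating the traversal depth."""
    # The 1..100 bounds are an assumption; adjust them if the API differs.
    if not isinstance(max_depth, int) or not lower <= max_depth <= upper:
        raise ValueError(f"maxDepth must be an integer from {lower} to {upper}")
    limits = {"maxDepth": max_depth}
    if max_results is not None:
        limits["maxResults"] = max_results
    return limits


limits = build_limits(10, max_results=5000)
print(limits)
```

The resulting dictionary can be placed under the limits key of the request body.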

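C#

Before trying this sample, follow the C# setup instructions in the "Knowledge Catalog quickstart using client libraries". For more information, see the Knowledge Catalog C# API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".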

using Google.Api.Gax.Grpc;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.DataCatalog.Lineage.V1;
using System.Threading.Tasks;

public sealed partial class GeneratedLineageClientSnippets
{
    /// <summary>Snippet for SearchLineageStreaming</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public async Task SearchLineageStreamingRequestObject()
    {
        // Create client
        LineageClient lineageClient = LineageClient.Create();
        // Initialize request argument(s)
        SearchLineageStreamingRequest request = new SearchLineageStreamingRequest
        {
            ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            Locations = { "", },
            RootCriteria = new SearchLineageStreamingRequest.Types.RootCriteria(),
            Direction = SearchLineageStreamingRequest.Types.SearchDirection.Unspecified,
            Filters = new SearchLineageStreamingRequest.Types.SearchFilters(),
            Limits = new SearchLineageStreamingRequest.Types.SearchLimits(),
        };
        // Make the request, returning a streaming response
        using LineageClient.SearchLineageStreamingStream response = lineageClient.SearchLineageStreaming(request);

        // Read streaming responses from server until complete
        // Note that C# 8 code can use await foreach
        AsyncResponseStream<SearchLineageStreamingResponse> responseStream = response.GetResponseStream();
        while (await responseStream.MoveNextAsync())
        {
            SearchLineageStreamingResponse responseItem = responseStream.Current;
            // Do something with streamed response
        }
        // The response stream has completed
    }
}

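Java

Before trying this sample, follow the Java setup instructions in the "Knowledge Catalog quickstart using client libraries". For more information, see the Knowledge Catalog Java API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".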

import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.datacatalog.lineage.v1.LineageClient;
import com.google.cloud.datacatalog.lineage.v1.LocationName;
import com.google.cloud.datacatalog.lineage.v1.SearchLineageStreamingRequest;
import com.google.cloud.datacatalog.lineage.v1.SearchLineageStreamingResponse;
import java.util.ArrayList;

public class AsyncSearchLineageStreaming {

  public static void main(String[] args) throws Exception {
    asyncSearchLineageStreaming();
  }

  public static void asyncSearchLineageStreaming() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (LineageClient lineageClient = LineageClient.create()) {
      SearchLineageStreamingRequest request =
          SearchLineageStreamingRequest.newBuilder()
              .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
              .addAllLocations(new ArrayList<String>())
              .setRootCriteria(SearchLineageStreamingRequest.RootCriteria.newBuilder().build())
              .setFilters(SearchLineageStreamingRequest.SearchFilters.newBuilder().build())
              .setLimits(SearchLineageStreamingRequest.SearchLimits.newBuilder().build())
              .build();
      ServerStream<SearchLineageStreamingResponse> stream =
          lineageClient.searchLineageStreamingCallable().call(request);
      for (SearchLineageStreamingResponse response : stream) {
        // Do something when a response is received.
      }
    }
  }
}

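Node.js

Before trying this sample, follow the Node.js setup instructions in the "Knowledge Catalog quickstart using client libraries".

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".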

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The project and location to initiate the search from.
 */
// const parent = 'abc123'
/**
 *  Required. The locations to search in.
 */
// const locations = ['abc','def']
/**
 *  Required. Criteria for the root of the search.
 */
// const rootCriteria = {}
/**
 *  Required. Direction of the search.
 */
// const direction = {}
/**
 *  Optional. Filters for the search.
 */
// const filters = {}
/**
 *  Optional. Limits for the search.
 */
// const limits = {}

// Imports the Lineage library
const {LineageClient} = require('@google-cloud/lineage').v1;

// Instantiates a client
const lineageClient = new LineageClient();

async function callSearchLineageStreaming() {
  // Construct request
  const request = {
    parent,
    locations,
    rootCriteria,
    direction,
  };

  // Run request
  const stream = await lineageClient.searchLineageStreaming(request);
  stream.on('data', (response) => { console.log(response) });
  stream.on('error', (err) => { throw(err) });
  stream.on('end', () => { /* API call completed */ });
}

callSearchLineageStreaming();

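Python

Before trying this sample, follow the Python setup instructions in the "Knowledge Catalog quickstart using client libraries". For more information, see the Knowledge Catalog Python API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".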

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import datacatalog_lineage_v1


def sample_search_lineage_streaming():
    # Create a client
    client = datacatalog_lineage_v1.LineageClient()

    # Initialize request argument(s)
    request = datacatalog_lineage_v1.SearchLineageStreamingRequest(
        parent="parent_value",
        locations=["locations_value1", "locations_value2"],
        direction="UPSTREAM",
    )

    # Make the request
    stream = client.search_lineage_streaming(request=request)

    # Handle the response
    for response in stream:
        print(response)

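Ruby

Before trying this sample, follow the Ruby setup instructions in the "Knowledge Catalog quickstart using client libraries". For more information, see the Knowledge Catalog Ruby API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".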

require "google/cloud/data_catalog/lineage/v1"

##
# Snippet for the search_lineage_streaming call in the Lineage service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DataCatalog::Lineage::V1::Lineage::Client#search_lineage_streaming.
#
def search_lineage_streaming
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DataCatalog::Lineage::V1::Lineage::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DataCatalog::Lineage::V1::SearchLineageStreamingRequest.new

  # Call the search_lineage_streaming method to start streaming.
  output = client.search_lineage_streaming request

  # The returned object is a streamed enumerable yielding elements of type
  # ::Google::Cloud::DataCatalog::Lineage::V1::SearchLineageStreamingResponse
  output.each do |current_response|
    p current_response
  end
end

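REST

To search data lineage, use the searchLineageStreaming method.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: the ID of the Google Cloud project used for billing and quota evaluation.
  • LOCATION_ID: the Google Cloud location, for example us-central1.
  • SOURCE_PROJECT_ID: the ID of the Google Cloud project that contains the source table.
  • DATASET_ID: the BigQuery dataset ID.
  • TABLE_ID: the BigQuery table ID.

HTTP method and URL: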

POST https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID:searchLineageStreaming

Request JSON body:

{
  "parent": "projects/PROJECT_ID/locations/LOCATION_ID",
  "locations": [
    "LOCATION_ID",
    "us-east1",
    "us-central1"
  ],
  "rootCriteria": {
    "entities": {
      "entities": [
        {
          "fullyQualifiedName": "bigquery:SOURCE_PROJECT_ID.DATASET_ID.TABLE_ID"
        }
      ]
    }
  },
  "direction": "DOWNSTREAM",
  "limits": {
    "maxDepth": 10,
    "maxResults": 5000
  }
}

After you send the request, you should receive a JSON response similar to the following:

{
  "links": [
    {
      "source": {
        "fullyQualifiedName": "bigquery:project-prod.dataset.source_table"
      },
      "target": {
        "fullyQualifiedName": "bigquery:project-prod.dataset.target_table"
      },
      "depth": 1,
      "location": "us"
    }
  ]
}
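Once the streamed messages are decoded, the links can be grouped by depth to rebuild the layers of the graph. The following sketch assumes that each message has already been parsed into a Python dictionary shaped like the sample response above; the helper name links_by_depth is hypothetical.

```python
from collections import defaultdict


def links_by_depth(responses):
    """Group (source, target, location) tuples by the depth of each link."""
    grouped = defaultdict(list)
    for message in responses:
        for link in message.get("links", []):
            source = link["source"]["fullyQualifiedName"]
            target = link["target"]["fullyQualifiedName"]
            grouped[link.get("depth", 0)].append(
                (source, target, link.get("location"))
            )
    return dict(grouped)


# One decoded message, copied from the sample response in this document.
sample = [{
    "links": [{
        "source": {"fullyQualifiedName": "bigquery:project-prod.dataset.source_table"},
        "target": {"fullyQualifiedName": "bigquery:project-prod.dataset.target_table"},
        "depth": 1,
        "location": "us",
    }]
}]
layers = links_by_depth(sample)
print(layers)
```

Because the function accepts any iterable, it can consume messages as they arrive instead of waiting for the stream to finish.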

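Search multiple geographic locations

To narrow or widen the scope of the lineage graph scan, change the geographic regions passed in the repeated locations array field.

For example: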

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us", "europe-west1", "asia-south2"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.global_table"
      }]
    }
  },
  "direction": "DOWNSTREAM"
}'

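By default, the API omits process information (maxProcessPerLink defaults to 0). To retrieve the resource names of the pipelines that created a data link, set limits.maxProcessPerLink to a positive integer.

For example: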

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.target_table"
      }]
    }
  },
  "direction": "UPSTREAM",
  "limits": {
    "maxProcessPerLink": 5
  }
}'

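Response behavior: The resulting stream populates the links[].processes field with process messages that contain only the absolute system resource name (for example, projects/my-project/locations/us/processes/my-process).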
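The process resource names can then be collected from the decoded messages. The sketch below assumes that each entry in links[].processes wraps a process object with a name field; that shape is inferred from the links.processes.process field path used in this document and is not confirmed here. The helper name process_names is hypothetical.

```python
def process_names(responses):
    """Return the sorted, de-duplicated process resource names in a stream."""
    names = set()
    for message in responses:
        for link in message.get("links", []):
            for entry in link.get("processes", []):
                # Assumed shape: {"process": {"name": "projects/..."}}
                name = entry.get("process", {}).get("name")
                if name:
                    names.add(name)
    return sorted(names)


# Hypothetical decoded message that uses the assumed shape.
sample = [{
    "links": [{
        "processes": [
            {"process": {"name": "projects/my-project/locations/us/processes/my-process"}}
        ]
    }]
}]
found = process_names(sample)
print(found)
```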

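Retrieve full process details with a FieldMask

If you need the full structured metadata of a pipeline (such as displayName, system attributes, or execution origin) rather than only the resource name, you must use an API FieldMask:

  1. Provide a non-zero value for limits.maxProcessPerLink.

  2. Append the fields query parameter to the URL path, specifying links.processes.process and the other required fields.

For example: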

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST "https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming?fields=links.processes.process,links.source,links.target,links.depth" \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.target_table"
      }]
    }
  },
  "direction": "UPSTREAM",
  "limits": {
    "maxProcessPerLink": 5
  }
}'
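When you build the URL in code, the fields parameter can be appended with the standard library so that the field paths stay readable. This is a minimal sketch; the helper name with_field_mask is hypothetical, and the field paths are the ones used in the curl example above.

```python
from urllib.parse import urlencode


def with_field_mask(url, field_paths):
    """Append a fields query parameter that lists the given field paths."""
    # Keep commas unescaped so the mask matches the curl example.
    query = urlencode({"fields": ",".join(field_paths)}, safe=",")
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{query}"


base_url = (
    "https://datalineage.googleapis.com/v1/"
    "projects/my-billing-project/locations/us:searchLineageStreaming"
)
masked_url = with_field_mask(
    base_url,
    ["links.processes.process", "links.source", "links.target", "links.depth"],
)
print(masked_url)
```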

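Search table-level and column-level lineage

You can search both table-level (asset-level) and column-level (field-level) lineage in a single request by providing multiple entities in the rootCriteria.entities.entities list:

  • For table-level lineage, omit the field array.

  • For column-level lineage, specify a single column in the field array.

For example: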

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [
             {
               "fullyQualifiedName": "bigquery:my-project.dataset.table_a"
             },
             {
               "fullyQualifiedName": "bigquery:my-project.dataset.table_b",
               "field": ["email"]
             }
           ]
         }
       },
       "direction": "DOWNSTREAM"
     }'
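The same rootCriteria block can be assembled programmatically. In the following sketch, the helper names root_entity and root_criteria are hypothetical; the JSON keys are the ones shown in the curl example above.

```python
def root_entity(fqn, columns=None):
    """Build one root entity; omit the field array for table-level lineage."""
    entity = {"fullyQualifiedName": fqn}
    if columns:
        entity["field"] = list(columns)
    return entity


def root_criteria(*entities):
    """Wrap root entities in the rootCriteria structure of the request body."""
    return {"entities": {"entities": list(entities)}}


criteria = root_criteria(
    root_entity("bigquery:my-project.dataset.table_a"),             # table level
    root_entity("bigquery:my-project.dataset.table_b", ["email"]),  # column level
)
print(criteria)
```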

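Use a wildcard for column-level lineage

To search all available column-level lineage for a table without listing each column individually, use the wildcard * as the only value in the field array.

For example: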

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table",
             "field": ["*"]
           }]
         }
       },
       "direction": "DOWNSTREAM"
     }'

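Filter lineage results

You can narrow lineage search results by using the filters block in the request body.

Filter by dependency type

To restrict results to specific dependency types, such as direct copies (EXACT_COPY) or transformations like filtering and grouping (OTHER), use the dependencyTypes filter.

For example: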

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "dependencyTypes": ["EXACT_COPY"]
       }
     }'

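Table-level lineage only

To make sure that the search returns only table-level lineage and excludes column-level lineage entirely, set the entitySet filter to ENTITIES.

For example: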

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "entitySet": "ENTITIES"
       }
     }'

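Filter by time range

You can restrict lineage search results to a specific time interval.

For example, to search for lineage created after a specific timestamp, use the following request: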

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "timeRange": {
           "startTime": "2026-01-01T00:00:00Z"
         }
       }
     }'
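The three filters described in this section can be produced by one helper. The sketch below assumes that the filters can be combined in a single request, which this document does not state explicitly; the helper name build_filters is hypothetical, and the timestamp is formatted in the same UTC form as the example above.

```python
from datetime import datetime, timezone


def build_filters(dependency_types=None, entity_set=None, start_time=None):
    """Return a filters block that contains only the options that were set."""
    filters = {}
    if dependency_types:
        filters["dependencyTypes"] = list(dependency_types)
    if entity_set:
        filters["entitySet"] = entity_set
    if start_time is not None:
        # Expect a timezone-aware datetime and normalize it to UTC.
        utc = start_time.astimezone(timezone.utc)
        filters["timeRange"] = {"startTime": utc.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return filters


filters = build_filters(
    dependency_types=["EXACT_COPY"],
    entity_set="ENTITIES",
    start_time=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
print(filters)
```

Place the returned dictionary under the filters key of the request body; an empty dictionary means that no filter was requested.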

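Handle unreachable locations (partial results)

Because the streaming API scans a distributed set of projects and locations concurrently, some remote regions might be temporarily down, unreachable, or misconfigured during execution.

To protect data integrity, the searchLineageStreamingResponse stream includes a dedicated diagnostic field named unreachable:

  • Field name: unreachable (a repeated string)

  • Value format: projects/PROJECT_NUMBER/locations/LOCATION (for example, projects/123456789/locations/us-east1)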
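A client can use this field to detect partial results. The following sketch assumes that each streamed message has been decoded into a Python dictionary and that unreachable appears as a list of strings in the format described above; the helper name collect_unreachable is hypothetical.

```python
import re

UNREACHABLE_PATTERN = re.compile(r"^projects/([^/]+)/locations/([^/]+)$")


def collect_unreachable(responses):
    """Return sorted (project_number, location) pairs reported as unreachable."""
    seen = set()
    for message in responses:
        for entry in message.get("unreachable", []):
            match = UNREACHABLE_PATTERN.match(entry)
            if match:
                seen.add((match.group(1), match.group(2)))
    return sorted(seen)


# Hypothetical decoded messages; the second repeats the same location.
messages = [
    {"links": [], "unreachable": ["projects/123456789/locations/us-east1"]},
    {"unreachable": ["projects/123456789/locations/us-east1"]},
]
missing = collect_unreachable(messages)
print(missing)
```

A non-empty result means that the graph is incomplete for those locations, so a caller might retry them later or flag the output as partial.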

What's next