Manage pipelines
This document describes how to manage BigQuery pipelines, including how to schedule and delete pipelines.
This document also describes how to view and manage pipeline metadata in Dataplex Universal Catalog.
Pipelines are powered by Dataform.
Before you begin
- Create a BigQuery pipeline.
- To manage pipeline metadata in Dataplex Universal Catalog, ensure that the Dataplex API is enabled in your Google Cloud project.
Required roles
To get the permissions that you need to manage pipelines, ask your administrator to grant you the following IAM roles:
- To delete pipelines: Dataform Admin (roles/dataform.admin) on the pipeline
- To view and run pipelines: Dataform Viewer (roles/dataform.viewer) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
To manage pipeline metadata in Dataplex Universal Catalog, ensure that you have the required Dataplex Universal Catalog roles.
For more information about Dataform IAM, see Control access with IAM.
View all pipelines
To view a list of all pipelines in your project, do the following:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand the Pipelines folder.
View past manual runs
To view past manual runs of a selected pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Executions.
Optional: To refresh the list of past runs, click Refresh.
Configure alerts for failed pipeline runs
Each pipeline has a corresponding Dataform repository ID, and each BigQuery pipeline run is logged in Cloud Logging under that repository ID. You can use Cloud Monitoring to observe trends in Cloud Logging logs for BigQuery pipeline runs and to notify you when conditions that you specify occur.
To receive alerts when a BigQuery pipeline run fails, you can create a log-based alerting policy for the corresponding Dataform repository ID. For instructions, see Configure alerts for failed workflow invocations.
To find the Dataform repository ID of your pipeline, do the following:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Settings.
The Dataform repository ID of your pipeline is displayed at the bottom of the Settings tab.
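Once you have the repository ID, a log-based alerting policy needs a Cloud Logging filter that matches failed runs for that repository. The following sketch only assembles such a filter string; the resource type and the jsonPayload field name are assumptions about Dataform's workflow invocation completion logs, so confirm them against Configure alerts for failed workflow invocations before creating the policy:

```python
def failed_run_filter(location: str, repository_id: str) -> str:
    """Build a Cloud Logging filter that matches failed workflow
    invocations for one Dataform repository (field names assumed)."""
    return "\n".join([
        'resource.type="dataform.googleapis.com/Repository"',
        f'resource.labels.location="{location}"',
        f'resource.labels.repository_id="{repository_id}"',
        # Assumed payload field holding the invocation's final state.
        'jsonPayload.terminalState="FAILED"',
    ])

print(failed_run_filter("us-central1", "my-pipeline-repo"))
```

You can paste the resulting filter into the Logs Explorer to verify that it matches your failed runs before attaching it to an alerting policy.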
Delete a pipeline
To permanently delete a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder. Find the pipeline that you want to delete.
Click View actions next to the pipeline, and then click Delete.
To confirm the deletion, click Delete.
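Deletion can also be scripted against the pipeline's underlying Dataform repository through the Dataform REST API. This sketch only constructs the request URL for the repositories.delete method; the v1beta1 path and the force query parameter are assumptions to verify against the Dataform API reference before use:

```python
def delete_repository_url(project: str, location: str, repository_id: str,
                          force: bool = False) -> str:
    """Build the Dataform API URL for deleting the repository that
    backs a pipeline (send as an authenticated HTTP DELETE request)."""
    url = (f"https://dataform.googleapis.com/v1beta1/projects/{project}"
           f"/locations/{location}/repositories/{repository_id}")
    # `force` (assumed) also removes child resources such as workspaces.
    return url + "?force=true" if force else url

print(delete_repository_url("my-project", "us-central1", "my-pipeline-repo"))
```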
Manage metadata in Dataplex Universal Catalog
Dataplex Universal Catalog lets you store and manage metadata for pipelines. Pipelines are available in Dataplex Universal Catalog by default, without additional configuration.
You can use Dataplex Universal Catalog to manage pipelines in all pipeline locations. Managing pipelines in Dataplex Universal Catalog is subject to Dataplex Universal Catalog quotas and limits and Dataplex Universal Catalog pricing.
Dataplex Universal Catalog automatically retrieves the following metadata from pipelines:
- Data asset name
- Data asset parent
- Data asset location
- Data asset type
- Corresponding Google Cloud project
Dataplex Universal Catalog logs pipelines as entries with the following entry values:
- System entry group: the system entry group for pipelines is @dataform. To view details of pipeline entries in Dataplex Universal Catalog, you need to view the @dataform system entry group. For instructions about how to view a list of all entries in an entry group, see View details of an entry group in the Dataplex Universal Catalog documentation.
- System entry type: the system entry type for pipelines is dataform-code-asset. To view details of pipelines, you need to view the dataform-code-asset system entry type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to WORKFLOW. Then, select an entry of the selected pipeline. For instructions about how to view details of a selected entry type, see View details of an entry type in the Dataplex Universal Catalog documentation. For instructions about how to view details of a selected entry, see View details of an entry in the Dataplex Universal Catalog documentation.
- System aspect type: the system aspect type for pipelines is dataform-code-asset. To provide additional context to pipelines in Dataplex Universal Catalog by annotating pipeline entries with aspects, view the dataform-code-asset aspect type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to WORKFLOW. For instructions about how to annotate entries with aspects, see Manage aspects and enrich metadata in the Dataplex Universal Catalog documentation.
- Type: the type for pipelines is WORKFLOW. This type lets you filter pipelines in the dataform-code-asset system entry type and the dataform-code-asset aspect type by using the aspect:dataplex-types.global.dataform-code-asset.type=WORKFLOW query in an aspect-based filter.
For instructions about how to search for assets in Dataplex Universal Catalog, see Search for data assets in Dataplex Universal Catalog in the Dataplex Universal Catalog documentation.
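As an illustration, the aspect-based filter query described in this section can be assembled programmatically, for example when driving catalog searches from a script. The helper below is our own naming; it only builds the query string from the aspect type and type value documented above:

```python
def pipeline_search_query(
    aspect_type: str = "dataplex-types.global.dataform-code-asset",
    asset_type: str = "WORKFLOW",
) -> str:
    """Build the Dataplex Universal Catalog aspect-based filter
    that matches BigQuery pipeline entries (type = WORKFLOW)."""
    return f"aspect:{aspect_type}.type={asset_type}"

print(pipeline_search_query())
# aspect:dataplex-types.global.dataform-code-asset.type=WORKFLOW
```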
What's next
- Learn more about BigQuery pipelines.
- Learn how to create pipelines.
- Learn how to schedule pipelines.