BigQuery, Google Cloud's fully managed data platform, is a cost-effective and scalable solution for data integration and analysis.
Overview
ELT (Extract, Load, Transform) is Google Cloud's recommended pattern for data integration. ELT involves extracting data from source systems, loading it into BigQuery, and then transforming it into the desired format for analysis. Unlike ETL (Extract, Transform, Load), which involves transforming data before it is loaded into a data warehouse, the ELT approach enables you to use the full power of BigQuery to perform data transformations and any SQL user to effectively develop data integration pipelines.
The ELT approach accelerates time to market by offering a familiar, data-centric framework that minimizes the learning curve. ELT aslo eliminates the need for separate ETL infrastructure, reducing costs associated with data transformation before loading.
BigQuery's architecture allows for massive scalability and parallel processing, enabling efficient handling of large datasets and complex transformations. ELT facilitates data governance by centralizing data in BigQuery, enabling consistent data quality and security policies that are built into the platform.
BigQuery supports various data sources and formats, including Apache Iceberg, which provides flexibility in integrating diverse data. There is also flexibility in language choice in addition to data choice. A primary benefit of ELT is that SQL fits into a number of SDLC pipelines.
Bring data into BigQuery
Bring data into BigQuery | BigQuery's data integration path with ELT. |
---|---|
Stage | Solution |
Extract and load | Batch load: BigQuery Data Transfer Service (DTS) automates the bulk load of data from supported data sources into BigQuery. Streaming load: Pub/Sub BigQuery subscriptions writes Pub/Sub messages to an existing BigQuery table as they are received. Change data capture (CDC): Datastream enables non-intrusive change data capture (CDC) from databases into BigQuery. Federation to external data sources: BigQuery supports federation to a number of external data sources that don't require data movement. |
Transform | BigQuery's transform capabilities are built with Dataform, a tool for collaboratively building, testing, and documenting SQL data transformation with BigQuery data manipulation language (DML). To make it easier for you to transform data, we've created two powerful experiences that leverage Dataform: AI-powered data transformation: BigQuery data preparation (in Preview) provides intelligent suggestions for cleaning, transforming, and enriching data, which can reduce the time and effort required for data preparation tasks. SQL-based visual data transformation: BigQuery workflows (in Preview) provide visual user experience to interactively build simple workflows. |
Large scale data migration | On top of the patterns covered above, migrations to Google Cloud may involve mass data migrations from existing data warehouses to BigQuery. BigQuery Migration Service is a fully managed service allowing customers to load existing data warehouses to BigQuery for further data analysis. |
Bring data into BigQuery
BigQuery's data integration path with ELT.
Extract and load
Batch load: BigQuery Data Transfer Service (DTS) automates the bulk load of data from supported data sources into BigQuery.
Streaming load: Pub/Sub BigQuery subscriptions writes Pub/Sub messages to an existing BigQuery table as they are received.
Change data capture (CDC): Datastream enables non-intrusive change data capture (CDC) from databases into BigQuery.
Federation to external data sources: BigQuery supports federation to a number of external data sources that don't require data movement.
Transform
BigQuery's transform capabilities are built with Dataform, a tool for collaboratively building, testing, and documenting SQL data transformation with BigQuery data manipulation language (DML). To make it easier for you to transform data, we've created two powerful experiences that leverage Dataform:
AI-powered data transformation: BigQuery data preparation (in Preview) provides intelligent suggestions for cleaning, transforming, and enriching data, which can reduce the time and effort required for data preparation tasks.
SQL-based visual data transformation: BigQuery workflows (in Preview) provide visual user experience to interactively build simple workflows.
Large scale data migration
On top of the patterns covered above, migrations to Google Cloud may involve mass data migrations from existing data warehouses to BigQuery. BigQuery Migration Service is a fully managed service allowing customers to load existing data warehouses to BigQuery for further data analysis.
How It Works
BigQuery offers data engineers and application developers a rich portfolio of products and capabilities to help build, schedule, and manage pipelines. The broad set of offerings gives you the flexibility to choose between ETL and ELT paradigms.
Common Uses
BigQuery is an AI-ready unified data platform that enables you to connect all of your enterprise data with AI. BigQuery ingests data from diverse sources (messaging, databases, and more) via batch, streaming, or change data capture (CDC). BigQuery stores, computes, and manages data with an AI-powered transformation layer and a unified user experience for analysis.
BigQuery is an AI-ready unified data platform that enables you to connect all of your enterprise data with AI. BigQuery ingests data from diverse sources (messaging, databases, and more) via batch, streaming, or change data capture (CDC). BigQuery stores, computes, and manages data with an AI-powered transformation layer and a unified user experience for analysis.
Build a powerful ELT pipeline on Google Cloud:
This setup provides scalability, real-time capabilities, and efficient data transformation for comprehensive analysis.
Build a powerful ELT pipeline on Google Cloud:
This setup provides scalability, real-time capabilities, and efficient data transformation for comprehensive analysis.