Data integration and ELT

BigQuery, Google Cloud's fully managed data platform, is a cost-effective and scalable solution for data integration and analysis.

Overview

Data integration and ELT with BigQuery

ELT (Extract, Load, Transform) is Google Cloud's recommended pattern for data integration. ELT involves extracting data from source systems, loading it into BigQuery, and then transforming it into the desired format for analysis. Unlike ETL (Extract, Transform, Load), which involves transforming data before it is loaded into a data warehouse, the ELT approach enables you to use the full power of BigQuery to perform data transformations and any SQL user to effectively develop data integration pipelines.

Accelerated time to market and cost reduction with ELT

The ELT approach accelerates time to market by offering a familiar, data-centric framework that minimizes the learning curve. ELT aslo eliminates the need for separate ETL infrastructure, reducing costs associated with data transformation before loading.

Scalability and performance with built-in governance

BigQuery's architecture allows for massive scalability and parallel processing, enabling efficient handling of large datasets and complex transformations. ELT facilitates data governance by centralizing data in BigQuery, enabling consistent data quality and security policies that are built into the platform.

Data integration with flexibility and choice

BigQuery supports various data sources and formats, including Apache Iceberg, which provides flexibility in integrating diverse data. There is also flexibility in language choice in addition to data choice. A primary benefit of ELT is that SQL fits into a number of SDLC pipelines.


Bring data into BigQuery

Bring data into BigQueryBigQuery's data integration path with ELT.
StageSolution

Extract and load

Batch loadBigQuery Data Transfer Service (DTS) automates the bulk load of data from supported data sources into BigQuery.

Streaming load: Pub/Sub BigQuery subscriptions writes Pub/Sub messages to an existing BigQuery table as they are received.

Change data capture (CDC)Datastream enables non-intrusive change data capture (CDC) from databases into BigQuery.

Federation to external data sources: BigQuery supports federation to a number of external data sources that don't require data movement.

Transform

BigQuery's transform capabilities are built with Dataform, a tool for collaboratively building, testing, and documenting SQL data transformation with BigQuery data manipulation language (DML). To make it easier for you to transform data, we've created two powerful experiences that leverage Dataform:

AI-powered data transformation: BigQuery data preparation (in Preview) provides intelligent suggestions for cleaning, transforming, and enriching data, which can reduce the time and effort required for data preparation tasks.

SQL-based visual data transformation: BigQuery workflows (in Preview) provide visual user experience to interactively build simple workflows.

Large scale data migration

On top of the patterns covered above, migrations to Google Cloud may involve mass data migrations from existing data warehouses to BigQuery. BigQuery Migration Service is a fully managed service allowing customers to load existing data warehouses to BigQuery for further data analysis.

Bring data into BigQuery

BigQuery's data integration path with ELT.

Extract and load

Solution

Batch loadBigQuery Data Transfer Service (DTS) automates the bulk load of data from supported data sources into BigQuery.

Streaming load: Pub/Sub BigQuery subscriptions writes Pub/Sub messages to an existing BigQuery table as they are received.

Change data capture (CDC)Datastream enables non-intrusive change data capture (CDC) from databases into BigQuery.

Federation to external data sources: BigQuery supports federation to a number of external data sources that don't require data movement.

Transform

Solution

BigQuery's transform capabilities are built with Dataform, a tool for collaboratively building, testing, and documenting SQL data transformation with BigQuery data manipulation language (DML). To make it easier for you to transform data, we've created two powerful experiences that leverage Dataform:

AI-powered data transformation: BigQuery data preparation (in Preview) provides intelligent suggestions for cleaning, transforming, and enriching data, which can reduce the time and effort required for data preparation tasks.

SQL-based visual data transformation: BigQuery workflows (in Preview) provide visual user experience to interactively build simple workflows.

Large scale data migration

Solution

On top of the patterns covered above, migrations to Google Cloud may involve mass data migrations from existing data warehouses to BigQuery. BigQuery Migration Service is a fully managed service allowing customers to load existing data warehouses to BigQuery for further data analysis.

How It Works

BigQuery offers data engineers and application developers a rich portfolio of products and capabilities to help build, schedule, and manage pipelines. The broad set of offerings gives you the flexibility to choose between ETL and ELT paradigms.


ETL vs ELT
Watch this video to learn about the difference between ELT and ETL

Common Uses

End-to-end data integration with BigQuery

Data integration architecture

BigQuery is an AI-ready unified data platform that enables you to connect all of your enterprise data with AI. BigQuery ingests data from diverse sources (messaging, databases, and more) via batch, streaming, or change data capture (CDC). BigQuery stores, computes, and manages data with an AI-powered transformation layer and a unified user experience for analysis.


BigQuery data integration architecture

    Data integration architecture

    BigQuery is an AI-ready unified data platform that enables you to connect all of your enterprise data with AI. BigQuery ingests data from diverse sources (messaging, databases, and more) via batch, streaming, or change data capture (CDC). BigQuery stores, computes, and manages data with an AI-powered transformation layer and a unified user experience for analysis.


    BigQuery data integration architecture

      Build an ELT pipeline

      Build a scalable ELT pipeline with Google Cloud

      Build a powerful ELT pipeline on Google Cloud:

      1. Choose data sources: On-premises, cloud apps, or others.
      2. Ingest data: Use Datastream for real-time or BigQuery Data Transfer Service for full loads.
      3. Transform data: Clean with data preparation, or build pipelines with Dataform.
      4. Visualize with Looker Studio: Create reports and dashboards.

      This setup provides scalability, real-time capabilities, and efficient data transformation for comprehensive analysis.

      Build a ELT pipeline architecture

        Build a scalable ELT pipeline with Google Cloud

        Build a powerful ELT pipeline on Google Cloud:

        1. Choose data sources: On-premises, cloud apps, or others.
        2. Ingest data: Use Datastream for real-time or BigQuery Data Transfer Service for full loads.
        3. Transform data: Clean with data preparation, or build pipelines with Dataform.
        4. Visualize with Looker Studio: Create reports and dashboards.

        This setup provides scalability, real-time capabilities, and efficient data transformation for comprehensive analysis.

        Build a ELT pipeline architecture

          Start your proof of concept

          Get started with BigQuery

          Have a large project?

          Data integration with Google Cloud

          ETL data integration approach

          Explore data integration partners

          Google Cloud