Blueprints

Ingest Google Analytics data into a DuckDB database using dlt

Source

yaml
id: dlt-google-analytics-to-duckdb
namespace: company.team

tasks:
  - id: dlt_pipeline
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: python:3.11
    beforeCommands:
      - pip install dlt[duckdb]
      - dlt --non-interactive init google_analytics duckdb
    warningOnStdErr: false
    env:
      SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__PROJECT_ID: "{{ secret('GOOGLE_ANALYTICS_PROJECT_ID') }}"
      SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__CLIENT_EMAIL: "{{ secret('GOOGLE_ANALYTICS_CLIENT_EMAIL') }}"
      SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__PRIVATE_KEY: "{{ secret('GOOGLE_ANALYTICS_PRIVATE_KEY') }}"
      SOURCES__GOOGLE_ANALYTICS__PROPERTY_ID: "{{ secret('GOOGLE_ANALYTICS_PROPERTY_ID') }}"
    script: |
      import dlt
      from google_analytics import google_analytics

      QUERIES = [
          {
              "resource_name": "sample_analytics_data1",
              "dimensions": ["browser", "city"],
              "metrics": ["totalUsers", "transactions"],
          },
          {
              "resource_name": "sample_analytics_data2",
              "dimensions": ["browser", "city", "dateHour"],
              "metrics": ["totalUsers"],
          },
      ]

      pipeline = dlt.pipeline(
          pipeline_name="google_analytics_pipeline",
          destination="duckdb",
          dataset_name="google_analytics",
      )

      load_info = pipeline.run(google_analytics(queries=QUERIES))

About this blueprint

Ingest GCP

This flow demonstrates how to extract data from Google Analytics and load it into a DuckDB database using dlt. The entire workflow logic is contained in a single Python script that uses the dlt Python library to ingest the data into a DuckDB database. The credentials to access the Google Analytics API are stored using Secret.

Script

Docker

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra