Use BigQuery and a Python script running in Docker to analyze Wikipedia page views

Source

id: wikipedia-top10-python-pandas
namespace: company.team
description: Analyze top 10 Wikipedia pages

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    sql: >
      SELECT DATETIME(datehour) as date, title, views FROM
      `bigquery-public-data.wikipedia.pageviews_2024` WHERE DATE(datehour) =
      current_date() and wiki = 'en' ORDER BY datehour desc, views desc LIMIT 10
    store: true
    projectId: test-project
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"

  - id: write_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query.uri }}"

  - id: pandas
    type: io.kestra.plugin.scripts.python.Script
    warningOnStdErr: false
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    inputFiles:
      data.csv: "{{ outputs.write_csv.uri }}"
    script: |
      import pandas as pd
      from kestra import Kestra

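      df = pd.read_csv("data.csv")
      # Print the rows so they show up in the task logs
      print(df.head(10))
      # Highest view count among the returned rows
      views = df['views'].max()
      # Expose the value as a task output named "views"
      Kestra.outputs({'views': int(views)})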
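
The `query` task above names the table `pageviews_2024` but filters on `current_date()`, so the two only line up while the current date falls in 2024. Below is a minimal sketch of deriving the table from the date instead. It assumes the public dataset keeps one `pageviews_<year>` table per year, which is not confirmed here, and `build_query` is a hypothetical helper, not part of Kestra or of any BigQuery client.

```python
from datetime import date


def build_query(day: date, limit: int = 10) -> str:
    """Render the blueprint's SQL for a given day.

    Assumption: one table per year named pageviews_<year>.
    Hypothetical helper; values are interpolated directly, so pass
    trusted inputs only.
    """
    table = f"bigquery-public-data.wikipedia.pageviews_{day.year}"
    return (
        "SELECT DATETIME(datehour) AS date, title, views "
        f"FROM `{table}` "
        f"WHERE DATE(datehour) = '{day.isoformat()}' AND wiki = 'en' "
        "ORDER BY datehour DESC, views DESC "
        f"LIMIT {limit}"
    )


# Example: the table suffix follows the year of the requested day.
sql = build_query(date(2024, 3, 1))
print(sql)
```

The rendered string keeps the same columns, filter, and ordering as the flow's query; only the table suffix and the date literal vary.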

About this blueprint

Metrics Python BigQuery Outputs

This flow will do the following:

  1. Use the bigquery.Query task to query the top 10 Wikipedia pages for the current day.
  2. Use IonToCsv to store the results in a CSV file.
  3. Use the python.Script task to read the CSV file and use pandas to find the maximum number of views.
  4. Use Kestra outputs to track the maximum number of views over time.

The Python script will run in a Docker container based on the public image ghcr.io/kestra-io/pydata:latest.

The BigQuery task exposes (by default) a variety of metrics such as:

  • total.bytes.billed
  • total.partitions.processed
  • number of rows processed
  • query duration

You can view those metrics on the Execution page in the Metrics tab.
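
Steps 3 and 4 above can be tried outside Kestra with a few sample rows. This is a minimal sketch that uses the standard-library `csv` module in place of pandas so that it runs without extra packages. The sample rows and the `max_views` helper are made up for illustration, and the final `print` only stands in for the `Kestra.outputs` call in the flow.

```python
import csv
import io

# Made-up sample in the same shape as the query result (date, title, views).
SAMPLE_CSV = """date,title,views
2024-03-01T10:00:00,Main_Page,120000
2024-03-01T10:00:00,Special:Search,45000
2024-03-01T10:00:00,Python_(programming_language),3100
"""


def max_views(csv_text: str) -> int:
    """Return the largest value in the 'views' column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(int(row["views"]) for row in reader)


# In the flow this dictionary would be passed to Kestra.outputs(...).
outputs = {"views": max_views(SAMPLE_CSV)}
print(outputs)  # {'views': 120000}
```

Casting to a plain `int`, as the flow's script also does, keeps the output value a simple number rather than a library-specific numeric type.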
