Blueprints

Orchestrate CloudQuery data ingestion syncs to integrate data about your AWS resources into Postgres and and analyze it in SQL

About this blueprint

AWS Ingest S3 SQL

CloudQuery is an open-source data integration platform built for developers. CloudQuery integrates seamlessly with Kestra.

The flow below syncs metadata about various AWS services and loads it to a Postgres database.

You can use the following command to start a Postgres instance using a Docker container:

bash
docker run -d --name mypostgres -v mypostgresdb:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=yourPassword1234 -e POSTGRES_DB=postgres postgres:latest

You can securely manage the Postgres password and AWS credentials in Kestra by using Secrets.

After you execute the sync, we can start querying information about AWS resources in Postgres, e.g. to find all S3 buckets that are permitted to be public.

Note that instead of only querying tables about S3 (on line 17), you can add many more tables such as: tables: ["aws_s3*", "aws_ec2*", "aws_ecs*", "aws_iam*", "aws_glue*", "aws_dynamodb*"]. Check the Cloud Query documentation for a full list of supported tables.

yaml
id: cloudQueryAWS
namespace: blueprint

tasks:
  - id: wdir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    tasks:
      - id: config
        type: io.kestra.core.tasks.storages.LocalFiles
        inputs:
          config.yml: |
            kind: source
            spec:
              name: aws
              path: cloudquery/aws
              version: "v22.4.0"
              tables: ["aws_s3*"]
              destinations: ["postgresql"]
              spec:
            ---
            kind: destination
            spec:
              name: "postgresql"
              version: "v5.0.3"
              path: "cloudquery/postgresql"
              write_mode: "overwrite-delete-stale"
              spec:
                connection_string: ${PG_CONNECTION_STRING}

      - id: cloudQuery
        type: io.kestra.plugin.cloudquery.CloudQueryCLI
        commands:
          - cloudquery sync config.yml --log-console 
        env:
          AWS_ACCESS_KEY_ID: "{{ secret('AWS_ACCESS_KEY_ID') }}"
          AWS_SECRET_ACCESS_KEY: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
          AWS_DEFAULT_REGION: "{{ secret('AWS_DEFAULT_REGION') }}"    
          CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
          PG_CONNECTION_STRING: "postgresql://postgres:{{secret('DB_PASSWORD')}}@host.docker.internal:5432/postgres?sslmode=disable"
      
  - id: query-s3-metadata-in-postgres
    type: io.kestra.plugin.jdbc.postgresql.Query
    url: jdbc:postgresql://host.docker.internal:5432/postgres
    username: postgres
    password: "{{ secret('DB_PASSWORD') }}"
    fetch: true
    sql: |
      SELECT arn, region
      FROM public.aws_s3_buckets
      WHERE block_public_acls IS NOT TRUE
          OR block_public_policy IS NOT TRUE
          OR ignore_public_acls IS NOT TRUE
          OR restrict_public_buckets IS NOT TRUE;

Working Directory

Local Files

Commands

Query

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra