Orchestrate CloudQuery data ingestion syncs to integrate data about your AWS resources into Postgres and and analyze it in SQL
About this blueprint
AWS Ingest S3 SQL
CloudQuery is an open-source data integration platform built for developers. CloudQuery integrates seamlessly with Kestra.
The flow below syncs metadata about various AWS services and loads it to a Postgres database.
You can use the following command to start a Postgres instance using a Docker container:
docker run -d --name mypostgres -v mypostgresdb:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=yourPassword1234 -e POSTGRES_DB=postgres postgres:latest
You can securely manage the Postgres password and AWS credentials in Kestra by using Secrets.
After you execute the sync, we can start querying information about AWS resources in Postgres, e.g. to find all S3 buckets that are permitted to be public.
Note that instead of only querying tables about S3 (on line 17), you can add many more tables such as: tables: ["aws_s3*", "aws_ec2*", "aws_ecs*", "aws_iam*", "aws_glue*", "aws_dynamodb*"]
. Check the Cloud Query documentation for a full list of supported tables.
id: cloudQueryAWS
namespace: blueprint
tasks:
- id: wdir
type: io.kestra.core.tasks.flows.WorkingDirectory
tasks:
- id: config
type: io.kestra.core.tasks.storages.LocalFiles
inputs:
config.yml: |
kind: source
spec:
name: aws
path: cloudquery/aws
version: "v22.4.0"
tables: ["aws_s3*"]
destinations: ["postgresql"]
spec:
---
kind: destination
spec:
name: "postgresql"
version: "v5.0.3"
path: "cloudquery/postgresql"
write_mode: "overwrite-delete-stale"
spec:
connection_string: ${PG_CONNECTION_STRING}
- id: cloudQuery
type: io.kestra.plugin.cloudquery.CloudQueryCLI
commands:
- cloudquery sync config.yml --log-console
env:
AWS_ACCESS_KEY_ID: "{{ secret('AWS_ACCESS_KEY_ID') }}"
AWS_SECRET_ACCESS_KEY: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
AWS_DEFAULT_REGION: "{{ secret('AWS_DEFAULT_REGION') }}"
CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
PG_CONNECTION_STRING: "postgresql://postgres:{{secret('DB_PASSWORD')}}@host.docker.internal:5432/postgres?sslmode=disable"
- id: query-s3-metadata-in-postgres
type: io.kestra.plugin.jdbc.postgresql.Query
url: jdbc:postgresql://host.docker.internal:5432/postgres
username: postgres
password: "{{ secret('DB_PASSWORD') }}"
fetch: true
sql: |
SELECT arn, region
FROM public.aws_s3_buckets
WHERE block_public_acls IS NOT TRUE
OR block_public_policy IS NOT TRUE
OR ignore_public_acls IS NOT TRUE
OR restrict_public_buckets IS NOT TRUE;
More Related Blueprints