Blueprints
Scrape API in a Python task running in a Docker container and load the JSON document to a MongoDB collection
Source
yaml
id: json-from-api-to-mongodb
namespace: company.team
tasks:
- id: generate_json
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
outputFiles:
- output.json
script: |
import requests
import json
from kestra import Kestra
response = requests.get("https://api.github.com")
data = response.json()
with open("output.json", "w") as output_file:
json.dump(data, output_file)
Kestra.outputs({'data': data, 'status': response.status_code})
- id: load_to_mongodb
type: io.kestra.plugin.mongodb.Load
connection:
uri: mongodb://host.docker.internal:27017/
database: local
collection: github
from: "{{ outputs.generate_json.outputFiles['output.json'] }}"
About this blueprint
Ingest Python Outputs
This flow will scrape GitHub API in a Python task running in a Docker container, and will output the result to a JSON file. That JSON API payload is then loaded to a MongoDB collection.