Scrape StackOverflow using AutoScraper in Python

Source

yaml
id: autoscraper
namespace: company.team

tasks:
  - id: scrape
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install autoscraper kestra
    warningOnStdErr: false
    script: |
      from autoscraper import AutoScraper
      from kestra import Kestra

      url = "https://stackoverflow.com/questions/2081586/web-scraping-with-python"

      # You can also put urls here to retrieve urls.
      wanted_list = ["What are metaclasses in Python?"]

      scraper = AutoScraper()
      result = scraper.build(url, wanted_list)

      # get related topics of any stackoverflow page:
      related = scraper.get_result_similar(
          "https://stackoverflow.com/questions/606191/convert-bytes-to-a-string"
      )

      Kestra.outputs({"data": result, "related": related})

  - id: use_output_data
    type: io.kestra.plugin.core.debug.Return
    format: "{{ outputs.scrape.vars.data }}"

  - id: use_output_related
    type: io.kestra.plugin.core.debug.Return
    format: "{{ outputs.scrape.vars.related }}"

About this blueprint

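This flow shows how to scrape a web page using AutoScraper in Python. The scrape task builds a scraper from a StackOverflow question page and a sample of the text to extract, then reuses the learned rules to fetch similar items from a second StackOverflow page. The script uses the Kestra Python SDK to send both results back to Kestra, where the two Return tasks read them as outputs.scrape.vars.data and outputs.scrape.vars.related. This way, you can pass data between Python scripts and other Kestra tasks.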
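
The dictionary handed to Kestra.outputs has to survive the trip from the Python process to Kestra. The sketch below is not part of the blueprint: it assumes the SDK serializes outputs as JSON, and the helper name build_outputs and the sample values are hypothetical. It bundles the two scraper results under the same keys the flow uses and fails early if a value cannot be serialized.

```python
import json


def build_outputs(result, related):
    """Bundle scraper results into a dict and confirm it is JSON-serializable.

    Hypothetical helper for illustration; the keys mirror the flow's
    "data" and "related" outputs.
    """
    payload = {"data": list(result), "related": list(related)}
    # json.dumps raises TypeError for values JSON cannot represent,
    # so a bad value is caught here rather than when outputs are sent.
    json.dumps(payload)
    return payload


if __name__ == "__main__":
    # Sample values standing in for real AutoScraper results (assumed).
    payload = build_outputs(
        ["What are metaclasses in Python?"],
        ["Convert bytes to a string"],
    )
    print(json.dumps(payload))
    # Inside the flow's script, this dict would be passed on with:
    # Kestra.outputs(payload)
```

In the flow itself, the call to Kestra.outputs would stay as it is; a check like this is only worth adding if the scraped values might include something other than plain strings.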
