Download a PDF file and extract text from it using Apache Tika

Source

yaml
id: parse-pdf
namespace: company.team

tasks:
  # Download the PDF; the downloaded file is exposed as outputs.download_pdf.uri
  - id: download_pdf
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/pdf/app_store.pdf

  # Extract plain text from the downloaded file with Apache Tika
  - id: parse_text
    type: io.kestra.plugin.tika.Parse
    from: "{{ outputs.download_pdf.uri }}"
    contentType: TEXT
    store: false

  # Write the extracted text to the execution logs
  - id: log_extracted_text
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.parse_text.result.content }}"

About this blueprint

Ingest Outputs

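This flow downloads a PDF file with the HTTP Download task, extracts its text with the Apache Tika Parse task, and then writes the extracted text to the execution logs with the Log task. With store set to false, the Parse task returns the extracted content in its output, which the Log task reads as outputs.parse_text.result.content.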
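The same three steps can be reused for other documents by turning the PDF location into a flow input. The following is a minimal sketch, not part of the original blueprint: the flow id parse-pdf-from-input and the input name pdf_url are hypothetical, and the default value is the blueprint's own URL.

```yaml
id: parse-pdf-from-input        # hypothetical flow id
namespace: company.team

inputs:
  - id: pdf_url                 # hypothetical input name, chosen for illustration
    type: STRING
    defaults: https://huggingface.co/datasets/kestra/datasets/resolve/main/pdf/app_store.pdf

tasks:
  # Download whichever PDF the caller passes in
  - id: download_pdf
    type: io.kestra.plugin.core.http.Download
    uri: "{{ inputs.pdf_url }}"

  # Extract plain text exactly as in the blueprint
  - id: parse_text
    type: io.kestra.plugin.tika.Parse
    from: "{{ outputs.download_pdf.uri }}"
    contentType: TEXT
    store: false

  # Log the extracted text
  - id: log_extracted_text
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.parse_text.result.content }}"
```

Running this sketch without supplying pdf_url should behave like the blueprint, since the default points at the same file.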
