Source
yaml
id: parse-pdf
namespace: company.team

tasks:
  - id: download_pdf
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/pdf/app_store.pdf

  - id: parse_text
    type: io.kestra.plugin.tika.Parse
    from: "{{ outputs.download_pdf.uri }}"
    contentType: TEXT
    store: false

  - id: log_extracted_text
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.parse_text.result.content }}"
About this blueprint
Ingest Outputs
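This flow downloads a PDF file with the HTTP Download task, then extracts its text with the Apache Tika Parse task. Because store is set to false, the parsed content is returned directly in the task output rather than written to internal storage. Finally, the Log task reads that output and logs the extracted text.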
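Logging the full text can be unwieldy for large PDFs. As a rough sketch of one alternative, the variant below sets store to true so that the parse result is written to Kestra's internal storage, and logs only the location of that result. The flow id parse-pdf-stored and the task id log_result_location are hypothetical names chosen for illustration, and the sketch assumes the Parse task exposes the stored file as outputs.parse_text.uri when store is enabled.

```yaml
id: parse-pdf-stored   # hypothetical flow id for this variant
namespace: company.team

tasks:
  - id: download_pdf
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/pdf/app_store.pdf

  - id: parse_text
    type: io.kestra.plugin.tika.Parse
    from: "{{ outputs.download_pdf.uri }}"
    contentType: TEXT
    # Assumption: with store enabled, the result is written to internal
    # storage and referenced by a URI instead of being returned inline.
    store: true

  - id: log_result_location   # hypothetical task id
    type: io.kestra.plugin.core.log.Log
    message: "Parsed result stored at {{ outputs.parse_text.uri }}"
```

A downstream task could then read the stored file from that URI instead of receiving the whole text through a template expression.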