Data Storage in Kestra
Understand where different data components (inputs, outputs, logs, etc.) are stored in Kestra's architecture.
Overview
Kestra processes and stores various data components, including flow definitions, workflow inputs, outputs, logs, execution metadata, and more. Understanding where these components are stored is beneficial for optimizing performance, configuring persistence, and integrating with external storage solutions.
Kestra data is stored either in a database such as PostgreSQL or in internal storage, which defaults to your local storage but can be configured to use an S3 bucket or MinIO. You can read more about Kestra's architecture and internal storage in their dedicated documentation.
Data Storage Components
The table below lists the main Kestra data components, where each one is stored, and what it contains.
Data Component | Storage Location | Description |
---|---|---|
Flows & Definitions | Database | Flows, tasks, and their configurations are stored in the database. |
Namespace | Database | Namespaces are used to organize workflows and manage access to secrets, plugin defaults, and variables. |
Namespace Files | Internal Storage | Namespace Files store code and configuration files directly in Kestra's internal storage backend. |
Executions & Metadata | Database | Each execution, including status, timestamps, and execution metadata, is stored in the database. |
Inputs | Internal Storage | Inputs provided to a flow execution are kept in internal storage. |
Input Files | Internal Storage | Additional files to pass to any script or CLI task. |
Outputs | Internal Storage | Outputs from tasks are stored in Kestra’s internal storage system, separate from the database. |
Output Files | Internal Storage | Generated files available for download and usable in downstream tasks. |
Key-Value Pairs | Internal Storage & Database (Metadata only) | The KV Store holds data in key-value format. You can create pairs directly from the UI, via dedicated tasks, with Terraform, or through the API. |
Logs & Audit Logs (Enterprise) | Database | All logs generated by tasks are stored in the database. |
Task State & Variables | Database | Dynamic variables and task states within an execution are stored and retrieved as needed. |
Secrets | Database or External Secret Manager | Secrets can be managed through Kestra’s internal database or external secret managers like AWS Secrets Manager, HashiCorp Vault, or Google Secret Manager. |
Queues | Database | Internal communication between Kestra server components. |
Triggers | Database | Triggers are event-based mechanisms to automate the execution of your workflows. |
User Administration | Database | This includes RBAC and user management information such as invitations, groups, and roles. |
Kestra Internal Storage
Kestra uses Internal Storage to handle incoming and outgoing files in a scalable way. It stores files generated during a flow execution and is used to pass data between tasks. Execution outputs and artifacts such as output files are stored separately from the database, which allows efficient retrieval of task results while keeping the database small.
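As a rough illustration of how a file travels through internal storage, the flow sketched below downloads a file and hands its location to a second task. The flow id, namespace, and URL are hypothetical placeholders, and the task types are assumed to be the core Download and Log plugins.

```yaml
id: pass_file_between_tasks            # hypothetical flow id
namespace: company.team                # hypothetical namespace

tasks:
  # Downloads a file; the result is written to internal storage.
  - id: download
    type: io.kestra.plugin.core.http.Download
    uri: https://example.com/data.csv  # placeholder URL

  # Receives a reference (an internal storage URI), not the file contents.
  - id: log_uri
    type: io.kestra.plugin.core.log.Log
    message: "Stored at {{ outputs.download.uri }}"
```

The second task only sees a URI pointing into internal storage, which is what keeps large files out of the database.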
- Used for: Task inputs and outputs, temporary execution data, and artifacts such as Input, Output, and Namespace files.
- KV Store: Key-Value pairs are kept in internal storage because they may contain sensitive information. That storage can be your local storage or a private cloud bucket. The database holds only metadata about each pair, such as the key, the file URI, the TTL, the creation date, and the last-updated timestamp.
- Storage Backend: By default, Kestra’s internal storage is your local storage, but it can be configured to use cloud storage options for production such as:
  - Amazon S3
  - Google Cloud Storage
  - Azure Blob Storage
  - Any S3-compatible storage
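To make the KV Store item more concrete, here is a minimal sketch of a flow that writes a pair and reads it back. The flow id, namespace, key, and value are hypothetical; the `Set` task type and the `kv()` function are assumed to match the current core plugin.

```yaml
id: kv_example                # hypothetical flow id
namespace: company.team       # hypothetical namespace

tasks:
  # Writes the pair; the value goes to internal storage.
  - id: set_kv
    type: io.kestra.plugin.core.kv.Set
    key: my_key
    value: "hello"

  # Reads the value back by key within the same namespace.
  - id: read_kv
    type: io.kestra.plugin.core.log.Log
    message: "{{ kv('my_key') }}"
```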
Configuring Internal Storage
You can configure Kestra’s Internal Storage backend in the docker-compose.yaml file. For example, with S3:
kestra:
  storage:
    type: s3
    # Backend-specific settings are nested under a key named after the type.
    s3:
      bucket: "kestra-internal-storage"
      region: "us-east-1"
Check out the Configuration documentation for more on internal storage configuration.
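For comparison with the S3 example above, the default local backend might be declared as in the sketch below. The `basePath` value is an assumption taken from a typical Docker setup and should be adjusted to your own volume layout.

```yaml
kestra:
  storage:
    type: local
    local:
      # Directory (inside the container) where internal storage files are written.
      basePath: "/app/storage"
```

With a local backend, files survive restarts only if this path is on a persistent volume, which is one reason object storage is usually preferred in production.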
Data Storage Additional Information
Flows & Execution Metadata
- Stored in PostgreSQL, MySQL, or H2 (not recommended for distributed components) as structured data.
- Includes:
  - Flow definitions
  - Execution details
  - Execution Queues
  - Historical metadata
- Accessible via the Kestra API and UI.
Storing flows and execution data in a database makes them persistent and keeps historical runs available for querying.
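A minimal sketch of what pointing Kestra at a PostgreSQL database could look like is shown below. The host name, database name, and credentials are placeholders, and the key names are assumed to follow the standard Docker Compose example.

```yaml
datasources:
  postgres:
    url: jdbc:postgresql://postgres:5432/kestra   # placeholder host and database
    driverClassName: org.postgresql.Driver
    username: kestra                              # placeholder
    password: change-me                           # placeholder

kestra:
  repository:
    type: postgres   # flows, executions, logs, and metadata
  queue:
    type: postgres   # internal messaging between server components
```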
Logs
- Kestra Open Source: Stored in the database.
- Kestra Enterprise: Can use the same architecture as Kestra Open Source but also supports an Elasticsearch backend for storing logs.
- In the Enterprise Edition, Audit Logs are also stored in the database.
- Logs can be accessed via the UI, API, or through external logging systems when integrated (e.g., Log Shipper).
Queues
- Kestra Open Source: Stored in the database.
- Kestra Enterprise: Can use the database, as in Kestra Open Source, but also supports Kafka as a replacement for the database for messaging between server components.
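As an illustration only, switching the queue to Kafka in the Enterprise Edition might look roughly like the following. The broker address is a placeholder, and the exact key names are an assumption that should be checked against the Enterprise configuration reference.

```yaml
kestra:
  queue:
    type: kafka
  kafka:
    client:
      properties:
        bootstrap.servers: "localhost:9092"   # placeholder broker address
```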
Secrets Management
- Secrets can be stored in:
  - Kestra’s database (default).
  - External secret managers, including AWS Secrets Manager, Google Secret Manager, and HashiCorp Vault.
- Secrets are encrypted and never exposed in logs.
You select the secret manager for your Kestra instance in your configuration file. For example, to use AWS Secrets Manager, add the following:
kestra:
  secret:
    type: aws-secret-manager
    awsSecretManager:
      accessKeyId: mysuperaccesskey
      secretKeyId: mysupersecretkey
      sessionToken: mysupersessiontoken
      region: us-east-1
For more configurations, check out the Secret Managers documentation.
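Whichever backend holds the secret, a flow would typically read it with the `secret()` function. The sketch below assumes a secret named `API_TOKEN` already exists; the flow id, namespace, secret name, and URL are all hypothetical.

```yaml
id: use_secret                # hypothetical flow id
namespace: company.team       # hypothetical namespace

tasks:
  - id: call_api
    type: io.kestra.plugin.core.http.Request
    uri: https://example.com/api    # placeholder URL
    headers:
      # Resolved at runtime from the configured secret backend.
      Authorization: "Bearer {{ secret('API_TOKEN') }}"
```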
Database Maintenance
Because the database accumulates execution data and logs over time, use Purge tasks to remove data that is no longer needed. This helps both performance and storage capacity.
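A scheduled purge flow could be sketched as follows. The one-month retention window, the daily schedule, and the flow id are illustrative choices, and the task types are assumed to be the core `PurgeExecutions` and `PurgeLogs` plugins.

```yaml
id: purge                     # hypothetical flow id
namespace: system             # hypothetical namespace

tasks:
  # Removes executions older than one month; logs are handled by the next task.
  - id: purge_executions
    type: io.kestra.plugin.core.execution.PurgeExecutions
    endDate: "{{ now() | dateAdd(-1, 'MONTHS') }}"
    purgeLog: false

  # Removes logs older than one month.
  - id: purge_logs
    type: io.kestra.plugin.core.log.PurgeLogs
    endDate: "{{ now() | dateAdd(-1, 'MONTHS') }}"

triggers:
  # Runs the purge once a day.
  - id: daily
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "@daily"
```

Choose the retention window according to how long you need execution history for auditing or debugging.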
Conclusion
Kestra’s storage architecture ensures efficient separation of execution data, logs, and artifacts. While the database handles structured execution metadata, internal storage is used for inputs, outputs, and task-generated files, preventing database overload. For large-scale deployments, cloud-based storage solutions can be used to optimize performance.
If the data components listed above need to be separated, for example by business unit, check out the Governance section of the Enterprise Edition to learn more about tenants.
For more details about storage architecture, refer to: