How We Keep Upgrading Kestra Before 1.0

September 3, 2025
Loïc Mathieu

Lead Developer

Once again, we boosted performance with faster scheduling, improved JDBC queues, and nearly 2x execution throughput.

In 0.23, the engineering team focused on performance in multiple areas. You can find the details in this blog post: Optimizing Performance in Kestra in Version 0.23.

Today, we deliver even more substantial improvements in speed, efficiency, and system responsiveness.

Scheduler improvements with the help of Xiaomi

Wei Bo from Xiaomi contributed significant improvements to the Scheduler startup that apply to both the JDBC and Kafka runners.

When starting, the Scheduler loads all flows and triggers, then tries to find the corresponding trigger for each flow and updates its state if needed. We previously iterated over the full list of triggers for each flow, which was inefficient.

On a benchmark with 100,000 flows and triggers, Xiaomi found that this stage of the Scheduler startup took 20 minutes, delaying the moment when the Scheduler could start processing triggers. After discussing possible optimizations with our engineering team, a solution was found: cache the triggers locally in a map so each one can be retrieved by its identifier.
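To illustrate the idea, here is a minimal sketch (with simplified stand-in types, not Kestra's actual classes): the nested scan is replaced by a single pass that builds a map keyed by the trigger identifier, so each flow's trigger lookup becomes a constant-time operation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: Flow and Trigger are simplified stand-ins for Kestra's real types.
class SchedulerStartupSketch {
    record Trigger(String uid) {}
    record Flow(String triggerUid) {}

    static void reconcile(List<Flow> flows, List<Trigger> triggers) {
        // Before: scan the full trigger list for every flow -> O(flows x triggers).
        // After: build a local cache once, then look triggers up by identifier -> O(flows + triggers).
        Map<String, Trigger> triggerByUid = new HashMap<>();
        for (Trigger trigger : triggers) {
            triggerByUid.put(trigger.uid(), trigger);
        }

        for (Flow flow : flows) {
            Trigger trigger = triggerByUid.get(flow.triggerUid());
            if (trigger != null) {
                // update the trigger state if needed (omitted)
            }
        }
    }
}
```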

This change brings the Scheduler's startup time with 100,000 flows and triggers down from 20 minutes to 8 seconds! For more details, refer to PR #10424.

Our engineering team then applied the same kind of optimization to a step of the Scheduler's evaluation loop, reducing the time taken by this step with 1,000 flows and triggers (we don't usually benchmark at Xiaomi's scale) from 120 milliseconds to 30 milliseconds. It is only a small part of the evaluation loop, but another small step toward optimizing it. Check out PR #10457 for more details.

Improve JDBC queues

Our JDBC queue has a polling mechanism that periodically checks the queues table for messages to process. When the poll query returns results, it processes them and then waits for a configured period (the minimum poll period) before polling again.

We now re-run the poll query immediately whenever it returns results, which improves processing capacity when the executor is under high load and reduces execution latency.
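As a rough sketch of that behavior (simplified, not the actual JDBC queue code; poll() and process() are placeholders), the loop now only sleeps for the minimum poll period when the previous poll came back empty.

```java
import java.util.List;

// Simplified polling loop: poll() and process() stand in for the real JDBC queue logic.
class QueuePollingSketch {
    static final long MIN_POLL_PERIOD_MS = 100; // illustrative value

    void run() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            List<String> messages = poll();
            if (!messages.isEmpty()) {
                process(messages);
                continue; // results were found: re-poll immediately, more messages may be waiting
            }
            Thread.sleep(MIN_POLL_PERIOD_MS); // nothing to do: wait before polling again
        }
    }

    List<String> poll() { return List.of(); } // placeholder for the SELECT on the queues table
    void process(List<String> messages) {}    // placeholder for message handling
}
```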

For more information, refer to PR #9332.

Increase the number of threads used by the Kestra executor

In version 0.22, the Kestra executor parallelized JDBC backend queue processing, delivering significant performance improvements. However, this came with increased memory usage and a higher load on the database.

In version 0.23, the approach to parallelizing JDBC backend queue processing was changed. Instead of running multiple parallel database queries, Kestra now makes a single query and processes the results in parallel. See PR #8648.
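Conceptually, the new approach looks something like the sketch below (illustrative only; the names are placeholders rather than Kestra's API): a single query fetches a batch of rows, and the processing of those rows fans out onto a worker pool.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: one poll query, parallel processing of its results.
class ParallelQueueProcessingSketch {
    private final ExecutorService workers =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    void pollOnce() {
        List<String> batch = fetchBatchWithSingleQuery(); // single database round trip
        for (String message : batch) {
            workers.submit(() -> handle(message));        // rows are processed in parallel
        }
    }

    List<String> fetchBatchWithSingleQuery() { return List.of(); } // placeholder for the poll query
    void handle(String message) {}                                  // placeholder for message handling
}
```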

The default configuration for the number of threads used by the JDBC scheduler was also updated—from half the available CPU count to the full CPU count—allowing the executor to make better use of available resources.
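In other words, the default is now derived from the full CPU count (hypothetical snippet for illustration; the actual value remains adjustable through Kestra's configuration).

```java
// Hypothetical illustration of the default change, not Kestra's actual code.
class ExecutorThreadsDefault {
    static int defaultThreads() {
        int cpus = Runtime.getRuntime().availableProcessors();
        // 0.23 default: half the CPU count, i.e. Math.max(1, cpus / 2).
        // 0.24 default: the full CPU count.
        return cpus;
    }
}
```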

Combined with the earlier improvement that re-polls immediately when a poll query returns results, these changes deliver an overall performance gain of nearly 2x.

In a controlled benchmark, performance improved as follows:

Throughput (exec/s)    Latency in 0.23    Latency in 0.24    Improvement
10                     300ms              150ms              50% faster
25                     550ms              300ms              45% faster
50                     17s                6s                 65% faster

Conclusion

This release delivers major performance and scalability improvements, largely thanks to the Xiaomi Engineering team. Their production-scale usage, in-depth diagnostics, and targeted contributions—such as the scheduler enhancements—continue to shape the evolution of Kestra.

Stay tuned—there’s more to come as the focus on performance, resiliency, and scalability continues.
