DBOS vs. AWS Step Functions Performance Benchmark

Peter Kraft
September 9, 2024

Increasingly, developers are using reliable workflows to help build applications. Reliable workflows are programs that always run to completion: if they’re interrupted, they automatically resume from where they left off. They make it much easier to write complex stateful programs, such as for payment processing, user onboarding, or exactly-once event processing.

Popular reliable workflow platforms, like AWS Step Functions, work via external orchestration. The idea is that you split your program into several steps, then write a workflow defining the control flow between steps. A central orchestrator executes the workflow, calling out to other services (e.g., AWS Lambda functions) to execute each step.

The problem with external orchestration is that it can be slow and costly. External orchestration leaves your program at the mercy of the central orchestrator, waiting for it to dispatch the next step in your program. The overhead from this queueing delay and additional communication can make programs that should run in tens of milliseconds take seconds instead, increasing your cloud costs in the process.

In this blog post, we’ll show you there’s a better way. We’ll present benchmarks comparing the DBOS Transact open source durable execution library to AWS Step Functions, showing how it makes your reliable workflows run up to 25x faster. Then, we’ll explain how DBOS Transact works: it automatically instruments your program to record the output of key steps in the database so it can automatically recover using those records if interrupted.

All benchmark code is open-source on GitHub.

Benchmark Experiments

To compare Step Functions and DBOS Transact, we constructed a benchmark. We put together a workflow with a varying number of sequential steps, each of which performs a database transaction reading then updating a single record (we use the same RDS database in all experiments on both systems). We implemented it in a Step Functions standard workflow (where each step is a Lambda) and in DBOS Transact running on DBOS Cloud (where each step is a transaction function). We then ran the workflow, varying the number of steps and measuring its end-to-end duration averaged over a thousand executions (shorter columns=faster performance):

[Figure: AWS Step Functions Performance vs. DBOS Performance]

DBOS Transact is roughly 25x faster in every test. For example, a 5-step workflow that takes ~40 ms in DBOS takes over a second in Step Functions.
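As a rough sketch of the measurement described above (not the actual benchmark code, which lives in the GitHub repository), a harness might build a workflow out of a chosen number of sequential steps and average its duration over many runs. The names makeWorkflow and meanDurationMs are hypothetical, and the steps are synchronous callbacks only to keep the sketch short; in the real benchmark each step is a database transaction.

```typescript
// Hypothetical harness: names and structure are illustrative assumptions,
// not taken from the benchmark repository.
type Step = () => void;

// Build a workflow that runs `numSteps` sequential steps.
function makeWorkflow(numSteps: number, step: Step): () => void {
  return () => {
    for (let i = 0; i < numSteps; i++) {
      step();
    }
  };
}

// Run `workflow` `runs` times and return its mean end-to-end duration in ms.
// The clock is injectable so the harness can be checked with a fake clock.
function meanDurationMs(
  workflow: () => void,
  runs: number,
  now: () => number = Date.now
): number {
  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = now();
    workflow();
    totalMs += now() - start;
  }
  return totalMs / runs;
}
```

Varying numSteps while keeping the step body fixed gives one data point per workflow length, which is how the chart above is laid out.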

Now, AWS is aware of this problem and has released a higher-performance offering, express workflows. Express workflows trade reliability for performance. Compared to standard Step Functions workflows or to DBOS Transact, express workflows (documentation):

  • Don’t persist execution state internally, so if a workflow is interrupted, it restarts from the beginning instead of resuming from where it left off, and some of your steps may execute multiple times.
  • Don’t manage workflow idempotency, so submitting a workflow multiple times may cause concurrent executions and data corruption.
  • Don’t track execution history, so you have to do all your logging yourself.

While express workflows are substantially faster than standard workflows, they’re still ~3x slower than DBOS Transact workflows, which have none of these downsides (shorter columns=faster performance):

[Figure: AWS Express Step Functions Performance vs. DBOS Performance]

This overhead makes external orchestration tools like Step Functions extremely costly to use. We have previously shown that hosted DBOS Cloud is 15x cheaper than express workflows. Standard workflows are even more expensive, at $0.025 per thousand state transitions (workflow steps). Following the calculations from our previous post, a workload invoking a four-step workflow 100 times per second would cost about $25K/month on standard workflows, compared to $40/month on DBOS Cloud.
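The standard-workflow figure can be reproduced with a few lines of arithmetic. This sketch assumes a 30-day month and one billed state transition per workflow step; the function name is ours, and only the per-transition price and the workload come from the text above.

```typescript
// Price quoted above: $0.025 per thousand state transitions.
const PRICE_PER_TRANSITION_USD = 0.025 / 1000;

// Estimated monthly cost of running a workflow on standard workflows.
// Assumptions: one billed transition per step, 30-day month by default.
function monthlyStandardWorkflowCostUsd(
  stepsPerWorkflow: number,
  workflowsPerSecond: number,
  daysPerMonth: number = 30
): number {
  const secondsPerMonth = daysPerMonth * 24 * 60 * 60;
  const transitionsPerMonth =
    stepsPerWorkflow * workflowsPerSecond * secondsPerMonth;
  return transitionsPerMonth * PRICE_PER_TRANSITION_USD;
}

// Four-step workflow invoked 100 times per second.
const estimatedCostUsd = monthlyStandardWorkflowCostUsd(4, 100);
console.log(estimatedCostUsd.toFixed(0));
```

Under those assumptions the workload makes about 1.04 billion state transitions per month, which works out to roughly $25,920, in line with the ~$25K/month figure.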

How to Make Reliable Workflows Fast

Under the hood, Step Functions is implemented as a state machine where each step of your program is a state. To execute a step, Step Functions:

  1. Durably records its intent to execute the step
  2. Schedules it for execution
  3. Waits for a worker to submit it for execution to AWS Lambda
  4. Waits for the execution to complete
  5. Durably records its output

These many rounds of communication and queuing between the Step Functions executor, its persistent store, its task submission workers, and AWS Lambda add overhead to each step execution. In this snippet of execution history, a single step takes 201 ms to execute end-to-end, even though the Lambda function (when measured separately) executes in ~20 ms.

[Figure: AWS Step Functions performance overhead]
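To put those two numbers in perspective, here is a small back-of-the-envelope calculation. The 201 ms and ~20 ms figures come from the text above; the helper names are ours, and extrapolating to a whole workflow assumes every sequential step pays the same end-to-end latency, which is a simplification.

```typescript
// Figures quoted above for a single Step Functions step.
const END_TO_END_MS = 201; // step duration seen in the execution history
const LAMBDA_MS = 20;      // approximate duration of the Lambda on its own

interface Overhead {
  overheadMs: number;       // time not spent running the Lambda
  overheadFraction: number; // share of the step's duration that is overhead
}

function stepOverhead(endToEndMs: number, computeMs: number): Overhead {
  const overheadMs = endToEndMs - computeMs;
  return { overheadMs, overheadFraction: overheadMs / endToEndMs };
}

// Assumption: each step in a sequential workflow has the same latency.
function workflowDurationMs(numSteps: number, perStepMs: number): number {
  return numSteps * perStepMs;
}

const overhead = stepOverhead(END_TO_END_MS, LAMBDA_MS);
console.log(overhead.overheadMs, overhead.overheadFraction.toFixed(2));
console.log(workflowDurationMs(5, END_TO_END_MS));
```

That is 181 ms of overhead, about 90% of the step's duration. At 201 ms per step, a 5-step workflow would take about 1,005 ms, consistent with the "over a second" measured in the benchmark.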

By contrast, DBOS Transact does away with external orchestration entirely and implements durable execution directly in your application in native TypeScript. You simply write your workflow as a TypeScript function which calls your steps, implemented in other TypeScript functions. The library automatically instruments each step to record its output in the database after executing. This adds overhead of only a single database write per step, which takes less than a millisecond, but records all the information an interrupted workflow needs to automatically resume from where it left off.
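To make the idea concrete, here is a minimal sketch of step-output recording. It is not the DBOS Transact API or implementation: DurableContext, StepLog, and checkout are hypothetical names, a plain in-memory object stands in for the database table, and steps are synchronous and must return JSON-serializable values to keep the example short.

```typescript
// Stand-in for the database table of step outputs, keyed by
// "<workflowId>:<stepIndex>". A real system would use durable storage.
type StepLog = Record<string, string>;

class DurableContext {
  private workflowId: string;
  private log: StepLog;
  private nextStep = 0;

  constructor(workflowId: string, log: StepLog) {
    this.workflowId = workflowId;
    this.log = log;
  }

  // Run `fn` as a step. If an earlier attempt already recorded this step's
  // output, return the record instead of executing the step again.
  step<T>(fn: () => T): T {
    const key = this.workflowId + ":" + this.nextStep++;
    if (Object.prototype.hasOwnProperty.call(this.log, key)) {
      return JSON.parse(this.log[key]) as T;
    }
    const output = fn();
    this.log[key] = JSON.stringify(output); // the one write per step
    return output;
  }
}

// Hypothetical two-step workflow: an ordinary function that calls its steps.
function checkout(
  ctx: DurableContext,
  charge: () => number,
  ship: (amount: number) => string
): string {
  const amount = ctx.step(charge);
  return ctx.step(() => ship(amount));
}
```

If checkout is interrupted after charge has run, calling it again with the same workflow ID and the same log replays the recorded amount and continues with ship, so charge is not executed a second time. The sketch relies on steps being called in the same order on every attempt, since records are matched by step index.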

Get Started with DBOS

If you’re interested in building reliable workflows with DBOS Transact, please check out our quickstart or look at our workflow documentation. We’d also love it if you joined our community.

© DBOS, Inc. 2024