Long-running agents on a stateless runtime

Devon Park · Apr 7, 2026

An agent that closes the books for the month is not a chat. It runs for two days, talks to seven systems, pauses three times to wait for a human, and resumes after lunch. It has to survive deploys, network failures, model timeouts, and the occasional approver going on vacation.

Building that on a stateless cloud runtime is harder than it looks. Here is how we do it.

Every running workflow is a tree of step records in our database. A worker picks up the next step that is ready, executes it, writes the result, and exits. The same worker, or a different one, will pick up the next step seconds or hours later. There is no in-memory state. The workflow is the database.
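To make the pick-up, execute, write, exit loop concrete, here is a minimal sketch in Python using SQLite. It is an illustration under assumptions, not our production schema: the `steps` table, its column names, the status values, and the functions `claim_next_step` and `run_one_step` are all hypothetical names chosen for this example.

```python
import json
import sqlite3

# Hypothetical schema: one row per step, linked into a tree by parent_id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS steps (
    id          INTEGER PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    parent_id   INTEGER REFERENCES steps(id),  -- NULL for the root step
    name        TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending',  -- pending | ready | running | done
    result      TEXT                              -- JSON written by the worker
);
"""


def claim_next_step(conn):
    """Claim one ready step, or return None if nothing is ready."""
    row = conn.execute(
        "SELECT id, name, parent_id FROM steps "
        "WHERE status = 'ready' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    with conn:  # commits on success, rolls back on error
        # The guarded UPDATE only succeeds if the step is still 'ready',
        # so two workers cannot both claim the same step.
        updated = conn.execute(
            "UPDATE steps SET status = 'running' "
            "WHERE id = ? AND status = 'ready'",
            (row[0],),
        )
    return row if updated.rowcount == 1 else None


def run_one_step(conn, handlers):
    """Do what one short-lived worker does: claim, execute, persist, return."""
    claimed = claim_next_step(conn)
    if claimed is None:
        return None
    step_id, name, parent_id = claimed

    # All input comes from the database; nothing is carried in memory.
    parent_result = None
    if parent_id is not None:
        raw = conn.execute(
            "SELECT result FROM steps WHERE id = ?", (parent_id,)
        ).fetchone()[0]
        parent_result = json.loads(raw) if raw is not None else None

    result = handlers[name](parent_result)

    with conn:
        conn.execute(
            "UPDATE steps SET status = 'done', result = ? WHERE id = ?",
            (json.dumps(result), step_id),
        )
        # Finishing a step makes its pending children ready for any worker.
        conn.execute(
            "UPDATE steps SET status = 'ready' "
            "WHERE parent_id = ? AND status = 'pending'",
            (step_id,),
        )
    return name
```

Each call to `run_one_step` stands in for one worker invocation. Because the step's input is read from the parent's stored result and its output is written back before the function returns, the next call can happen in a different process, seconds or hours later, and still find everything it needs.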

This is the same pattern Temporal and Inngest use. We built ours because we wanted to integrate it tightly with the model layer, the eval bus, and the audit log, and because we wanted the steps to be human-readable.
