Optimization & provability
The deepest structural payoff of xFlow isn't statecharts, isn't multi-writer, isn't browser participation. It's that a flow definition is typed IR — analyzable, transformable, signable, and compilable — instead of opaque host-language code. That single distinction unlocks an entire research direction LangGraph, WDK, and Temporal cannot reach: whole-flow optimization (DSPy-of-flows), target-substrate compilation (the ASIC metaphor), and verifiable execution. OpenServ's BRAID is the strongest external validation that plan-as-data is structurally important.
Thesis
Flows are typed IR. Everything follows.
When the flow shape is data — symbolic action references, structural states, expression-based guards — the entire toolchain that exists for compilers exists for flows: optimization passes, target codegen, static verification, formal proof generation. When the flow shape is host-language code, none of that toolchain applies.
// xFlow flow definition — typed IR, not arbitrary code.
defineFlow({
  id: "agent.search-loop",
  version: "1.0.0",
  initial: "agent",
  states: {
    agent: {
      invoke: {
        src: "action:llm.chat@^2", // ← symbolic, registry-resolved
        input: "$.messages",
      },
      on: {
        TOOL_CALLS: "tools",
        DONE: { target: "done", guard: "messageBudgetOk" },
      },
    },
    tools: {
      type: "parallel", // ← structural, analyzable
      states: {
        search: { invoke: { src: "action:search@^1" } },
        fetch: { invoke: { src: "action:fetch@^1" } },
      },
      onDone: "agent",
    },
    done: { type: "final" },
  },
})
// Why this is IR and not "JSON wrapping JS":
// • States, transitions, parallel regions, history are STRUCTURAL.
// • Actions are SYMBOLIC references (id@version), not inline code.
// • Guards and inputs are EXPRESSIONS over the run state, not closures.
// • The whole def is canonical JSON (RFC 8785) + SCXML — analyzable
// by any tool, in any language, without executing a body.
Structural states
States, transitions, parallel regions, history are first-class structural objects. A pass can read, transform, and re-emit them.
Symbolic actions
Actions are id@version references resolved at runtime. The optimizer can substitute one for another with a compatible interface without touching the flow shape.
Expression guards
Guards are expressions over run state, not closures. Their thresholds and predicates can be learned from labeled bench data.
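To make the expression-guard card concrete, here is a minimal sketch of how a data-shaped guard could be evaluated over run state. The `Guard` shape, `resolve`, and `evalGuard` names are illustrative assumptions, not xFlow's actual API; the point is that the threshold sits in plain data where a tool can read and rewrite it.

```typescript
// Hypothetical guard shape: a JSON predicate over run state, e.g.
// { path: "$.messages.length", op: "lte", value: 20 }. Because the
// threshold is data, a bench-driven pass can read and rewrite it;
// a closure would hide it.
type Guard = { path: string; op: "lte" | "gte" | "eq"; value: number }

// Resolve a "$.a.b" path against the run state object.
function resolve(state: unknown, path: string): unknown {
  return path
    .replace(/^\$\.?/, "")
    .split(".")
    .filter(Boolean)
    .reduce((acc: any, key) => (acc == null ? acc : acc[key]), state)
}

function evalGuard(guard: Guard, state: unknown): boolean {
  const lhs = resolve(state, guard.path) as number
  switch (guard.op) {
    case "lte": return lhs <= guard.value
    case "gte": return lhs >= guard.value
    case "eq":  return lhs === guard.value
  }
}
```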
Optimization
DSPy-of-flows — xflow.optimize.
DSPy compiles prompt-and-module programs against a metric and dataset. xFlow does the same for entire flows — prompt search, branch reordering, action substitution, guard refinement, parallel-region inference, dead-state elimination — because the flow shape is data the optimizer can manipulate.
// xflow.optimize — DSPy-of-flows.
// Inputs: flow def + metric + dataset + pass set.
// Output: an equivalent-shaped flow def with rewritten internals.
const optimized = await xflow.optimize(def, {
  metric: "agent.task-success", // measured against bench
  bench: "datasets/triage/v3", // labelled cases
  budget: { tokensPerCase: 8_000, dollars: 50 },
  passes: [
    "prompt-search", // search prompts attached to actions
    "branch-reorder", // reorder transitions by expected value
    "action-substitution", // swap action:llm.chat@^2 → @^3
    "guard-refinement", // tighten guard predicates from data
    "parallel-region-inference", // hoist independent branches
    "dead-state-elimination",
  ],
})
// optimized is itself a flow def — content-addressable, signable,
// publishable. The optimization is reproducible from (def, metric, bench).
Pass set
Each pass is a deterministic transformation on the IR with a metric improvement criterion. The optimizer is a search over pass sequences against a budget.
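As a sketch of that search, assuming a hypothetical `Pass` interface and a bench-derived `metric` where higher is better: a greedy loop that keeps a pass's output only when the metric improves. Real pass-sequence search would explore orderings under a budget; this shows only the improvement criterion.

```typescript
// Hypothetical pass interface: a deterministic IR → IR transform.
// The optimizer accepts a pass's output only when the metric improves.
type FlowDef = Record<string, unknown>
type Pass = { name: string; run: (def: FlowDef) => FlowDef }
type Metric = (def: FlowDef) => number // e.g. bench score; higher is better

function greedyOptimize(def: FlowDef, passes: Pass[], metric: Metric): FlowDef {
  let best = def
  let bestScore = metric(best)
  for (const pass of passes) {
    const candidate = pass.run(best)
    const score = metric(candidate)
    if (score > bestScore) {
      // Keep the transform only if it measurably helps.
      best = candidate
      bestScore = score
    }
  }
  return best
}
```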
Prompt search
What. Search the prompt strings attached to LLM actions against a metric and dataset (DSPy-style), with budget caps.
Why. Prompts are the highest-leverage knob in agent flows. They're typically hand-tuned; they shouldn't be.
Branch reorder
What. Reorder transitions out of a state by expected value (probability × payoff − cost) using bench data.
Why. Cheap-and-likely-correct branches should be tried first. Most flows have this implicitly hand-tuned and stale.
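A minimal sketch of the reordering itself, assuming per-event bench statistics in a hypothetical `BranchStats` shape; the expected-value formula is the one stated above, p(success) × payoff − cost.

```typescript
// Sort a state's outgoing transitions by expected value computed
// from bench-derived stats. Transitions without stats sink to the end.
type Transition = { event: string; target: string }
type BranchStats = { pSuccess: number; payoff: number; cost: number }

function reorderBranches(
  transitions: Transition[],
  stats: Record<string, BranchStats>,
): Transition[] {
  const ev = (t: Transition) => {
    const s = stats[t.event]
    return s ? s.pSuccess * s.payoff - s.cost : -Infinity
  }
  return [...transitions].sort((a, b) => ev(b) - ev(a))
}
```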
Action substitution
What. Swap one registered action for another at a satisfied interface — e.g. `action:llm.chat@^2` → `^3`, or `action:search@web` → `action:search@vector` based on input shape.
Why. Actions are addressable by id@version. The optimizer can pick a cheaper / faster / more accurate one when one exists.
Guard refinement
What. Tighten or relax guard predicates by learning thresholds from labeled bench cases.
Why. Guards encode 'when do we transition?' — usually as hardcoded numbers. Learn them.
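A toy version of that learning step, under assumed names: given labeled bench cases of (observed value, should the transition have fired), pick the threshold for a `value <= t` guard that maximizes accuracy. Real guard refinement would handle richer predicates; this shows the single-threshold case.

```typescript
// Each labeled case pairs an observed metric value with whether the
// transition should have fired on that bench run.
type LabeledCase = { value: number; shouldFire: boolean }

// Exhaustively score each observed value as a candidate threshold for
// a "value <= t" guard and return the most accurate one.
function learnThreshold(cases: LabeledCase[]): number {
  const candidates = [...new Set(cases.map((c) => c.value))].sort((a, b) => a - b)
  let best = candidates[0]
  let bestAcc = -1
  for (const t of candidates) {
    const correct = cases.filter((c) => (c.value <= t) === c.shouldFire).length
    const acc = correct / cases.length
    if (acc > bestAcc) {
      best = t
      bestAcc = acc
    }
  }
  return best
}
```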
Parallel region inference
What. Detect independent branches that can be hoisted into a `type: parallel` region with `onDone` join.
Why. Statechart parallel regions are free latency wins. Hand-authored flows almost always under-use them.
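The core of such a pass is a data-independence check. A toy sketch under assumed names: two invoke states can share a parallel region when neither reads a state path the other writes, and their writes don't collide.

```typescript
// Hypothetical per-state dependency summary: which run-state paths a
// step reads and writes (derivable from IR inputs/outputs, since they
// are expressions, not closures).
type Step = { name: string; reads: string[]; writes: string[] }

function independent(a: Step, b: Step): boolean {
  const overlaps = (xs: string[], ys: string[]) => xs.some((x) => ys.includes(x))
  return (
    !overlaps(a.reads, b.writes) && // a doesn't consume b's output
    !overlaps(b.reads, a.writes) && // b doesn't consume a's output
    !overlaps(a.writes, b.writes)   // no write-write conflict
  )
}
```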
Dead-state elimination
What. Remove states that no transition reaches and inputs/outputs that no consumer reads.
Why. Authoring drift accumulates. The IR should be cleaned up by passes, not by humans.
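The reachability half of this pass is a plain graph traversal over the IR. A simplified sketch, with transition targets flattened to plain strings (real transitions can be objects carrying guards):

```typescript
// BFS from the initial state over declared transitions; any state no
// transition reaches is dropped from the def.
type StateDef = { on?: Record<string, string>; onDone?: string }
type Flow = { initial: string; states: Record<string, StateDef> }

function eliminateDeadStates(flow: Flow): Flow {
  const reachable = new Set<string>([flow.initial])
  const queue = [flow.initial]
  while (queue.length > 0) {
    const state = flow.states[queue.pop()!]
    if (!state) continue
    const targets = [
      ...Object.values(state.on ?? {}),
      ...(state.onDone ? [state.onDone] : []),
    ]
    for (const target of targets) {
      if (!reachable.has(target)) {
        reachable.add(target)
        queue.push(target)
      }
    }
  }
  const states = Object.fromEntries(
    Object.entries(flow.states).filter(([name]) => reachable.has(name)),
  )
  return { ...flow, states }
}
```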
Compilation
ASIC metaphor — xflow.compile.
One flow IR, many substrate targets. The same definition compiles to a memory-substrate runner, a Postgres-backed worker, a WASM component, a flat-DAG WDK workflow, or a future hardware-accelerated executor — without re-authoring. Authors stop choosing the runtime at author time.
// xflow.compile — target-substrate codegen ("ASICs for flows").
// One IR, many backends. Pick at deploy time, not author time.
await xflow.compile(def, { target: "memory" })         // dev / tests
await xflow.compile(def, { target: "sqlite" })         // local CLI
await xflow.compile(def, { target: "postgres" })       // server prod
await xflow.compile(def, { target: "xsync-s3worm" })   // multi-writer + signed
await xflow.compile(def, { target: "wasm-component" }) // sandbox / portable
await xflow.compile(def, { target: "wdk-flat" })       // for Vercel WDK runtimes
await xflow.compile(def, { target: "halo2-circuit" })  // future: ZK execution
// Each backend reads the same IR. xFlow runtime + its substrate adapters
// are reference codegen targets. Authors don't re-shape flows per target.
Targets
Each target is a backend that consumes the IR and emits an executable artifact for that runtime. Memory / sqlite / postgres / xSync ship today. WASM-component and halo2-circuit are forward direction.
| Target | Runtime | Strength |
|---|---|---|
| memory | in-process map | tests, dev, deterministic snapshots |
| sqlite | embedded SQLite | local CLI, single-process apps, offline-first |
| postgres | managed Postgres | single-tenant or multi-tenant servers, hot data |
| xsync-s3worm | xSync over S3WORM-on-Storj | multi-writer, signed audit, browser participation, cold storage |
| wasm-component | WASM Component Model | sandboxed execution, source-language portability, future hardware acceleration |
| wdk-flat | open WDK runtime (Vercel or self-hosted) | interop with WDK-only runtimes; xFlow-rich features lower to flat steps |
| halo2-circuit | halo2 ZK proving system | future: zero-knowledge execution; structural trace + action attestations |
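The contract behind the table can be sketched as a backend registry: every target consumes the same IR and emits an executable artifact, so picking a target is a deploy-time lookup. All names here (`Backend`, `Artifact`, the entrypoint strings) are illustrative assumptions, not xFlow's real adapter API.

```typescript
// One IR, many backends: a target is just a function from flow def
// to runnable artifact, registered under its target name.
type FlowDef = Record<string, unknown>
type Artifact = { target: string; entrypoint: string }
type Backend = (def: FlowDef) => Artifact

const backends: Record<string, Backend> = {
  memory: (_def) => ({ target: "memory", entrypoint: "runInProcess" }),
  postgres: (_def) => ({ target: "postgres", entrypoint: "runWorker" }),
}

function compile(def: FlowDef, target: string): Artifact {
  const backend = backends[target]
  if (!backend) throw new Error(`unknown target: ${target}`)
  return backend(def) // same def, per-target codegen
}
```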
Provability
Verifiable execution — xflow.prove.
Because flow structure is signed JSON and the action layer is the only place untrusted code runs, verifiable execution becomes a tractable engineering target instead of a research dead-end. Prove the trace through the IR; sandbox or attest actions individually. The proof is portable: anyone with the proof, the def, and the state commit can verify.
// xflow.prove — verifiable execution over a flow trace.
// Given (def, runId), produce a witness/proof that the recorded
// trace is a valid execution of the IR.
const proof = await xflow.prove(def, runId, {
  backend: "halo2", // or "stark", "groth16", ...
  scope: {
    structure: true, // trace follows the IR
    actions: "attestation", // signed attestations, or in-circuit for pure
    state: "merkle", // run state in a Merkle commit
  },
})
await proof.verify() // anyone with the proof + def + commit
Structural trace
Every event in the run log is a transition in the IR. A circuit can verify that the trace stays inside the IR without re-executing actions.
Action attestation
For non-pure actions, signed attestations from the executing peer become the proof's leaves. Pure actions can be verified in-circuit when feasible.
State commitment
Run state is a Merkle commitment over the event log. The proof binds (def, log root) to (final state).
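A minimal sketch of that commitment, assuming leaves are hashes of serialized events; a real implementation would hash canonical JSON (RFC 8785), and `JSON.stringify` stands in here. Any edit to any event changes the root, so (def, log root) binds the run.

```typescript
import { createHash } from "node:crypto"

// Hex SHA-256 over UTF-8 text.
function sha256(data: string): string {
  return createHash("sha256").update(data).digest("hex")
}

// Merkle root over an event log: leaves hash each serialized event;
// an odd node at any level is carried up unchanged.
function merkleRoot(events: unknown[]): string {
  if (events.length === 0) return sha256("")
  let level = events.map((e) => sha256(JSON.stringify(e)))
  while (level.length > 1) {
    const next: string[] = []
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length ? sha256(level[i] + level[i + 1]) : level[i])
    }
    level = next
  }
  return level[0]
}
```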
External validation
OpenServ BRAID — the closest existing reference.
OpenServ Labs' BRAID (Bounded Reasoning for Autonomous Inference and Decisions) is the strongest public evidence that plan-as-data is structurally important — not just a frontend convenience. BRAID validates the bet from the reasoning side; xFlow generalizes it across all workflows.
// BRAID (OpenServ Labs) — the closest external validation.
//
// Stage 1: agent generates a Guided Reasoning Diagram (GRD), a
// machine-readable flowchart in Mermaid syntax encoding
// the solution logic.
// Stage 2: agent EXECUTES the GRD deterministically.
//
// Reported results (per OpenServ + Coyotiv research):
// GSM-Hard: 99% accuracy, 74× lower cost (GPT-5)
// SCALE Multichallenge: 2.7× accuracy, 30.3× perf-per-$ (GPT-4o)
//
// The shared bet:
// Separate the PLAN from the EXECUTION.
// The plan-as-data is the audit and optimization surface.
//
// What xFlow generalizes (BRAID → xFlow):
// • Plan format:        Mermaid flowchart (DAG-ish) → xState statechart
//                       (parallel + hierarchical + history)
// • Plan distribution:  per-problem, in-memory → content-addressed
//                       registry, id@version, signed
// • Scope:              reasoning workflows → any workflow (incl. reasoning)
// • Auditability:       inspect GRD before commit → signed flow +
//                       signed event log
//
// xFlow is BRAID's structural pattern — generalized, persisted,
// statechart-shaped, and applied beyond reasoning to any workflow.
What BRAID validates
Separating plan from execution and persisting the plan as a machine-readable artifact yields large measured gains: 99% accuracy on GSM-Hard at 74× lower cost with GPT-5; 2.7× accuracy + 30.3× perf-per-dollar on SCALE Multichallenge with GPT-4o.
Where xFlow generalizes
BRAID generates a Mermaid flowchart per problem; xFlow flows are authored once, content-addressed, signed, and registry-resolved. BRAID targets reasoning workflows; xFlow targets any workflow including reasoning. BRAID's flowchart is DAG-ish; xFlow uses xState statecharts (parallel + hierarchical + history).
Structural moat
Why competitors structurally cannot reach this layer.
The optimization, compilation, and provability story isn't a feature LangGraph or WDK could ship next sprint. It depends on the input language. Once nodes are arbitrary host-language functions, the toolchain that needs an IR has nothing to grab onto.
// Why competitors structurally can't reach this layer.
LangGraph
Nodes are arbitrary host-language functions.
→ no IR, no static analysis, no whole-flow optimization,
no portable compilation target, no path to ZK/verifiable execution.
Vercel WDK
Workflow body is a JS function with step() boundaries.
→ step boundaries are introspectable; the BODIES between them are not.
Definition lives in deploy artifact, not as data.
Temporal
Workflow body is sandboxed deterministic JS / Java / Go / Python.
→ world-class durability + replay; no notion of flow-as-data,
no optimizer, no portable IR.
Inngest / Trigger.dev / Mastra / Cloudflare Workflows
Step memoization or snapshot/resume over host-language bodies.
→ durability story; no optimizer, no portable IR.
xFlow
Flow body IS the data. Actions are symbolic refs.
→ optimizer, target codegen, and proof generator are tractable
because there's something to manipulate.
The shared limit across LangGraph, WDK, Temporal, Inngest, Trigger.dev, Mastra, and Cloudflare Workflows is identical: the workflow body is host-language code. That choice trades away the optimizer, the codegen, and the proof system. xFlow is the inversion — flow body is data; only the action implementations are code, and those are addressable, signable, and replaceable by id@version.
Stakes
Why this matters in practice.
The optimization-and-provability story isn't an aesthetic preference. Each consequence below is a measurable axis production teams already care about; the IR-first design is what makes the consequence reachable instead of theoretical.
Cost
OpenServ's BRAID research reports up to 74× cost reduction at higher accuracy on math benchmarks by separating plan from execution. Whole-flow optimization passes find similar savings across any agent flow — prompts, branch ordering, action substitution, guard refinement.
Reliability
Optimized flows are more reproducible. The same IR produces the same trace given the same actions; substitutions that pass bench remain in the def. Drift becomes a measured quantity, not an unknown.
Auditability
A flow def is signed JSON. A run is a signed event log. Together they're a tamper-evident record of 'what was supposed to happen' and 'what actually happened.' Compliance becomes inspection, not reconstruction.
Portability
One IR, many substrate targets. Authors stop choosing the runtime at author time. Operations chooses at deploy time, and the choice can change without re-authoring.
Verifiability
ZK proofs over structured traces are tractable when the structure is data. xFlow's flow IR + signed action attestations are a natural input to proof systems. Verifiable AI compute moves from research demo to deployable.
Honest accounting
What this story does not yet deliver.
The IR makes optimization, compilation, and provability tractable. Tractable is not shipped. Where xFlow is today vs where the IR allows it to go, said plainly.
Optimization needs benches
DSPy-of-flows isn't magic. It needs a metric, a dataset, and a budget. If you don't have labeled cases for what 'better' means, the optimizer can't help. The cost of building a bench is the cost of admission.
Compile targets aren't free
Each substrate codegen target is a real engineering surface. xFlow ships memory, sqlite, postgres, and xSync today. WASM-component, halo2-circuit, and similar are forward direction — listed here because they're tractable from the IR, not because they're shipping today.
Provability is forward direction
Verifiable execution over flow IR is structurally tractable; it is not a current xFlow product. The optimization story is closer-in (months); the provability story is research-direction (years).
Action bodies stay opaque
The optimizer can substitute one action for another with a compatible interface, but it can't optimize inside an action body. Action authors still write code. The IR is the flow shape, not the action implementation.
Recap
One sentence.
xFlow's deepest payoff is that flows are typed IR. That single design choice — symbolic action references, structural states, expression-based guards, canonical JSON — turns the optimizer, the codegen, and the proof system into tractable engineering targets. BRAID validates the pattern from the reasoning side; xFlow generalizes it across workflows.
Statecharts, registry, multi-substrate, and signed history are the visible features. The optimizer, the compiler, and the prover are what they make possible.