A decade and a half ago, this problem went by a different name. Companies wired their systems together through an enterprise service bus, and dedicated teams spent their days coordinating calls between applications. With the rise of REST and microservices, that world was considered overcome. Every service was supposed to stand on its own, reachable through a cleanly cut API.
That very decomposition brings the old topic back in a new form. When one business task needs several of those small services, something has to put their calls back in order. An order checks inventory, authorizes a payment, and triggers shipping, and those three steps belong to one transaction. Whoever holds that transaction together is doing API orchestration, whether or not anyone on the team uses the word.
API orchestration is the central coordination of multiple API calls into one coherent business flow. A controlling instance calls the participating services in a defined order, passes data from one step to the next, and decides how to react to failures. Orchestration differs from choreography, where each service reacts to events on its own and no central instance steers the flow.
What API orchestration means
At its core, orchestration answers a simple question. Who is responsible for several service calls running in the right order with the right data? With a single call, the question never comes up. As soon as several calls belong together, something needs to know the overall flow.
The order example makes that concrete. Inventory gets checked, then the payment authorized, and finally the shipment triggered. Each of those steps is a separate service with a separate API. None of them knows the full flow. The inventory service knows nothing about the payment, the payment service nothing about shipping. Only an orchestrating layer ties the three into one transaction that either completes fully or fails in a controlled way.
That layer takes on three jobs. It fixes the order of the calls, it passes results along (the reservation ID to shipping, say), and it decides what happens on failure. If the payment fails, shipping must not start. This logic belongs neither in the inventory service nor in the payment service, because each only sees its own part. Orchestration is the answer to a gap that the decomposition into microservices creates in the first place.
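The three jobs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the three service functions, the `PaymentDeclined` exception, and the `OrderResult` type are hypothetical stand-ins for real API calls.

```python
from dataclasses import dataclass
from typing import Optional


class PaymentDeclined(Exception):
    """Raised by the hypothetical payment stub when authorization fails."""


# Hypothetical stand-ins for the three services; real ones would be HTTP calls.
def reserve_inventory(order_id: str) -> dict:
    return {"id": f"res-{order_id}"}


def authorize_payment(order_id: str, amount: float) -> dict:
    if amount <= 0:
        raise PaymentDeclined(f"cannot authorize {amount} for {order_id}")
    return {"id": f"pay-{order_id}"}


def create_shipment(reservation_id: str) -> dict:
    return {"id": f"ship-{reservation_id}"}


@dataclass
class OrderResult:
    status: str
    shipment_id: Optional[str] = None


def place_order(order_id: str, amount: float) -> OrderResult:
    # Job 1: the sequence is fixed here, not inside any single service.
    reservation = reserve_inventory(order_id)
    try:
        authorize_payment(order_id, amount)
    except PaymentDeclined:
        # Job 3: on a failed payment, shipping never starts.
        return OrderResult(status="payment_failed")
    # Job 2: the reservation ID from step one is handed to step three.
    shipment = create_shipment(reservation["id"])
    return OrderResult(status="completed", shipment_id=shipment["id"])
```

The sketch deliberately leaves the reservation in place when the payment fails. Releasing it is a question of compensation, which the section on failures takes up.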
Where this control lives is the important question. It can sit inside one of the participating services, which then calls the others, or in a layer of its own that knows nothing except the flow. Both variants are orchestration. The difference is visibility. When the control sits inside a business service, coordination blends into that service’s actual job. A dedicated layer separates the two and makes the flow recognizable as a thing of its own.
Why the topic is back
Three developments bring the topic back. The first is the sheer number of services. Where a monolith used to handle a task internally, five or ten services are involved today more often than not. The second is reuse. The same order flow gets triggered from a web application, a mobile app, and a partner interface, and the sequence of steps has to hold in every one of them. The third development is AI agents, which are supposed to execute business processes on their own and need a dependable description of the flow to do it.
Reuse is the strongest driver of the three. As long as a flow exists in one place only, the missing orchestration barely registers. Once the same transaction gets triggered from several channels, the risk multiplies. Every copy of the sequence can drift apart unnoticed, and a change to the flow has to land in several places at once. A shared orchestration keeps the same logic from running in slightly different variants side by side.
In most systems, orchestration has long existed anyway, just unspoken. It hides in whichever service happens to get called first and triggers the others from there. That service then carries a responsibility it was never designed for. Suddenly, it knows the order, the error handling, and the data of everyone else. A cleanly cut service quietly turns into a small monolith that carries its neighbors' flows around.
A look back at the enterprise service bus is worth it here. The ESB fell out of favor as central, heavyweight middleware that concentrated too much logic in one place and slowed teams down. Today’s answer looks different. Orchestration is understood as a thin, clearly bounded layer that knows nothing except the flow and leaves the business work to the individual services. The goal is a deliberate separation between the work and its coordination rather than a return to the ESB.
Orchestration, choreography, and the alternatives
Orchestration is one of several approaches to connecting multiple services. The terms get mixed in practice, yet they mean different things. A short overview sorts them out.
| Approach | Core idea | When it fits |
|---|---|---|
| Orchestration | A central instance controls the order of calls | Flows with a fixed order and strict failure logic |
| Choreography | Each service reacts to events on its own, with no central control | Loosely coupled, event-driven systems |
| API gateway | A single entry point bundles routing, authentication, and rate limiting | Cross-cutting concerns at the edge of the architecture |
| Composition | One call assembles the results of several services | Read access that gathers data from multiple sources |
The differences are large enough that each approach deserves its own look. When the two core models face off is covered in Orchestration vs. choreography. How orchestration differs from a gateway is the subject of API orchestration vs. API gateway. And where the three frequently blended terms part ways is laid out in Composition. For a first pass, one rule of thumb does the job. Orchestration actively drives a flow, while the other approaches solve other problems.
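To make the contrast with choreography tangible, here is a small in-process sketch. It assumes a hypothetical `EventBus` class and made-up event names. A real system would use a message broker, and each handler would live in its own service.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-memory publish/subscribe bus, for illustration only."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
activity = []


def on_order_placed(order):  # would live in the inventory service
    activity.append("stock reserved")
    bus.publish("StockReserved", order)


def on_stock_reserved(order):  # would live in the payment service
    activity.append("payment authorized")
    bus.publish("PaymentAuthorized", order)


def on_payment_authorized(order):  # would live in the shipping service
    activity.append("shipment created")


bus.subscribe("OrderPlaced", on_order_placed)
bus.subscribe("StockReserved", on_stock_reserved)
bus.subscribe("PaymentAuthorized", on_payment_authorized)

# No function above contains the full sequence; it emerges from the subscriptions.
bus.publish("OrderPlaced", {"orderId": "42"})
```

The order of steps is the same as in an orchestrated flow, yet it is written down nowhere. That is the trade: loose coupling in exchange for a sequence that can only be reconstructed by reading every subscriber.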
The building blocks of an orchestration
Regardless of tooling, an orchestrated flow consists of the same building blocks. It starts with the steps, each calling one service. Between the steps, data flows, because a later call often needs the result of an earlier one. Add the error handling that defines what happens when a step fails. And finally there is state, the intermediate results the flow carries until it ends.
Among these building blocks, the data flow is the least conspicuous and the most error-prone. Every step delivers results a later step depends on, and those handoffs have to be spelled out. In the order example, shipping needs the reservation ID from the first step. Leave that connection implicit and the flow works only by accident, for as long as the order happens to hold, and breaks the moment someone rearranges it. A good orchestration makes every data handoff visible, so the dependencies between steps are named outright.
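One way to keep handoffs explicit is to declare them as data and check them before anything runs. The following sketch assumes a hypothetical `Step` type and `missing_handoffs` helper. It mirrors the idea of named step outputs and is not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str
    outputs: set[str] = field(default_factory=set)
    # Each input names the step and the output it depends on,
    # e.g. ("reserveStock", "reservationId").
    inputs: list[tuple[str, str]] = field(default_factory=list)


def missing_handoffs(steps: list[Step]) -> list[str]:
    """Return one message per input that no earlier step provides."""
    available: dict[str, set[str]] = {}
    problems = []
    for step in steps:
        for source, name in step.inputs:
            if name not in available.get(source, set()):
                problems.append(
                    f"{step.step_id} needs {source}.{name}, "
                    "which is not produced before it"
                )
        available[step.step_id] = step.outputs
    return problems


flow = [
    Step("reserveStock", outputs={"reservationId"}),
    Step("authorizePayment", outputs={"paymentId"}),
    Step("createShipment", inputs=[("reserveStock", "reservationId")]),
]
```

With the steps in this order, `missing_handoffs(flow)` returns an empty list. Move `createShipment` to the front and the check reports the broken dependency before a single call goes out.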
These building blocks can be implemented in two ways. The first is code. A service holds the sequence as program logic, calls the others one after the next, and handles failures with the means of the programming language. The second way is a declarative description that captures the flow as data instead of program code. Since 2024, the OpenAPI Initiative has maintained a dedicated specification for exactly that, called Arazzo. What Arazzo is and how it describes workflows is covered in What is Arazzo.
```yaml
workflows:
  - workflowId: placeOrder
    steps:
      # Step 1: reserve stock and expose the reservation ID as a named output
      - stepId: reserveStock
        operationId: reserveInventory
        outputs:
          reservationId: $response.body#/id
      # Step 2: authorize the payment
      - stepId: authorizePayment
        operationId: authorizePayment
        outputs:
          paymentId: $response.body#/id
      # Step 3: create the shipment, using the reservation ID from step 1
      - stepId: createShipment
        operationId: createShipment
        parameters:
          - name: reservationId
            in: query
            value: $steps.reserveStock.outputs.reservationId
```
The example shows the order flow as a sequence of steps. Shipping receives the reservation ID from the first step, and the order is fixed. Whether a team implements such a flow as code or as a declarative spec depends on the situation. Code stays more flexible, while a spec is easier to verify, to document, and to feed into tools such as test runners or AI agents.
A fifth building block tends to matter only once the flow is in production, namely observability. An orchestrated flow consists of several calls, and when something fails, it must be possible to trace the step at which a transaction got stuck. A correlation ID that travels with every call of a transaction makes the path through the participating services visible. Without that common thread, a transaction falls apart into seemingly unrelated calls scattered across log files. Precisely because orchestration touches several systems, monitoring belongs in the design from day one rather than in the later debugging session.
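A correlation ID needs very little machinery. The sketch below assumes a header named `X-Correlation-ID`, which is a common convention rather than a standard, and uses a hypothetical `call_service` stub in place of a real HTTP client.

```python
import uuid


def new_correlation_id() -> str:
    return str(uuid.uuid4())


def call_service(name: str, headers: dict, trace: list) -> None:
    # Stand-in for an HTTP call. A real client would send `headers` along,
    # and the receiving service would write the ID into its own logs.
    trace.append((headers["X-Correlation-ID"], name))


def run_transaction(step_names: list, trace: list) -> str:
    # One ID per transaction, attached to every call that belongs to it.
    correlation_id = new_correlation_id()
    headers = {"X-Correlation-ID": correlation_id}
    for name in step_names:
        call_service(name, headers, trace)
    return correlation_id
```

Filtering the collected trace by a single ID then yields exactly the calls of one transaction, in order, no matter how many other transactions ran in between.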
Failures, retries, and compensation
The hardest part of an orchestration is rarely the happy path. When every step succeeds, the sequence is quickly programmed. Things get interesting once a step fails halfway through. In the order example, inventory is already reserved and the payment authorized, yet the shipment cannot be created. The transaction must neither simply abort nor carry on as if nothing happened.
Two mechanisms come into play here. The first is the retry. Many failures are transient, such as a service that is briefly unreachable. An orchestration retries such a step after a short wait before treating it as having failed for good. For a retry to be safe, the call has to be idempotent. A step executed twice must lead neither to a double reservation nor to a double payment.
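A retry and its idempotency requirement fit into one short sketch. Everything named here is an assumption made for illustration: the `TransientError` marker, the `with_retries` helper, and the in-memory `PaymentService` that deduplicates by an idempotency key.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a timeout."""


def with_retries(call, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Run `call` up to `attempts` times, doubling the wait after each transient failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: the step now counts as failed for good
            sleep(base_delay * 2 ** (attempt - 1))


class PaymentService:
    """In-memory stand-in that deduplicates requests by idempotency key."""

    def __init__(self, failures_before_success: int = 0):
        self._results = {}
        self._failures_left = failures_before_success
        self.charges = 0

    def authorize(self, idempotency_key: str, amount: float) -> str:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # repeat call: no second effect
        self.charges += 1
        payment_id = f"pay-{self.charges}"
        self._results[idempotency_key] = payment_id
        if self._failures_left > 0:
            # Simulate the nasty case: the charge went through, the response was lost.
            self._failures_left -= 1
            raise TransientError("response lost")
        return payment_id
```

The interesting case is the lost response. The first call has already taken effect when the error surfaces, so only the key prevents the retry from authorizing the payment a second time.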
The second mechanism is compensation. When a step cannot be executed after all, the steps already completed have to be undone. The reserved stock gets released, the authorized payment voided. This pattern, where every step knows a matching counter-action, is known as the saga. An orchestration knows the order of the steps and, with it, the order in which they get rolled back on failure.
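The core of an orchestrated saga is small enough to sketch directly. The `run_saga` helper and the step names below are hypothetical. The sketch also assumes that the compensations themselves succeed, which a production system cannot take for granted.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, the compensations of all completed steps run
    in reverse order and the saga reports failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def fail_shipment():
    raise RuntimeError("shipment could not be created")


order_saga = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("authorize payment"), lambda: log.append("void payment")),
    (fail_shipment, lambda: log.append("cancel shipment")),
]

succeeded = run_saga(order_saga)
```

After this run, `log` shows the payment voided before the stock is released, the reverse of the order in which the two steps completed. The failed shipment step never registers its own compensation, because there is nothing to undo.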
Compensation has limits, though. Some actions cannot be cleanly taken back. A notification already sent reaches its recipient, a payout already triggered has landed. Two answers are common for steps like that. Either the irreversible action moves as late in the flow as possible, once all the risky steps have succeeded, or a business-level counter-action gets defined, a refund that cancels out the original effect, say. Which way fits is a business decision, and a purely technical view will miss it.
An orchestration without thought-through error handling leaves a half-finished state behind whenever something fails. Reserved stock, authorized payments, or created records then linger without their follow-up step. Retries and compensation are core parts of any production orchestration, never optional extras.
Short-lived and long-running flows
Orchestrations do not all finish in milliseconds. It helps to distinguish between two kinds of flows, because they put different demands on the controlling layer.
Short-lived flows complete within a single call. The order checks inventory, authorizes the payment, and triggers shipping, all in a few seconds. The orchestration keeps the state in memory for the duration of that one transaction. If something fails, the whole flow is still present and rolls back cleanly.
Long-running flows stretch over minutes, hours, or days. An approval process waits for a human sign-off, a provisioning job waits for an external system to report back. Memory alone no longer covers that. The orchestration has to persist its state so a flow survives a restart of the controlling application. This is exactly where it shows why orchestration deserves a layer of its own. Holding the state of a days-long transaction on the side, inside a business service, overloads that service fast.
Short-lived orchestrations complete within a single call and keep their state in memory only. Long-running orchestrations stretch from minutes to days and have to persist their state, so a transaction survives a restart of the controlling application.
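What persisting state means in practice can be shown with a checkpoint file. The sketch below is one simple way to do it, assuming a local JSON file and hypothetical `load_state` and `run_flow` helpers. A production system would more likely use a database or a workflow engine.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the last checkpoint, or start with an empty state."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "outputs": {}}


def run_flow(steps, path: Path) -> dict:
    """Run named steps in order, checkpointing after each so a restart can resume.

    `steps` is a list of (name, function) pairs. Each function receives the
    outputs collected so far and returns a JSON-serializable result.
    """
    state = load_state(path)
    for name, func in steps:
        if name in state["completed"]:
            continue  # finished before the restart; do not run it again
        state["outputs"][name] = func(state["outputs"])
        state["completed"].append(name)
        path.write_text(json.dumps(state))  # checkpoint before moving on
    return state
```

A crash between a step finishing and its checkpoint being written would still cause that step to run again after the restart. That is one more reason the steps themselves should be idempotent.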
With duration, time limits grow in importance. A step waiting on an external response needs an upper bound after which the transaction counts as unsuccessful and moves into a defined state. Without such a bound, transactions pile up that quietly wait forever and tie down resources.
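Such an upper bound can be as simple as a deadline around the waiting step. The sketch assumes a hypothetical `wait_for_response` helper that polls. An event-driven implementation would look different but would need the same bound.

```python
import time


def wait_for_response(poll, timeout_s: float, interval_s: float = 1.0,
                      clock=time.monotonic, sleep=time.sleep) -> str:
    """Poll until `poll()` returns True or the deadline passes.

    Returns "completed" or "timed_out", so the transaction always
    ends in a defined state instead of waiting forever.
    """
    deadline = clock() + timeout_s
    while True:
        if poll():
            return "completed"
        if clock() >= deadline:
            return "timed_out"
        sleep(interval_s)
```

The `clock` and `sleep` parameters exist so the behavior can be tested without real waiting. What the flow does with a `"timed_out"` result, whether it compensates or escalates to a human, remains a business decision.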
The new AsyncAPI support in Arazzo targets exactly these flows, because it describes steps that wait on an event. What version 1.1.0 brings along is covered in What’s new in Arazzo 1.1.
Orchestration as code or as a specification
That leaves the question of where the flow itself gets captured, in program code or in a declarative specification. It tends to get treated like a matter of faith, when it is really a case-by-case trade-off.
Code plays to its strength when a flow contains a lot of conditional logic that resists being pressed into a description. Branches, loops, and heavy computation read more naturally in a programming language. The price is that the sequence lies scattered through the code and reveals itself only by reading the program.
A declarative specification flips that relationship. The whole flow sits in one file and reads without programming knowledge. A tool can validate the description, derive tests from it, or hand it to an AI agent as a blueprint. Those advantages pay off most for flows that are stable and have to be understood across several teams. For highly dynamic logic with many special cases, code remains the better choice.
| Flow as code | Flow as specification |
|---|---|
| Strong with heavy conditional logic and many special cases | Strong for stable, cross-team flows |
| Sequence scattered across the program | Sequence readable in one place |
| Verification requires programming knowledge | Tools can validate, test, and generate from it |
In practice, the line often runs straight through a single transaction. The rough order of steps gets described declaratively, while individual steps keep their own code logic inside. How such a flow takes shape as a spec, step by step, is what Arazzo in practice walks through.
When orchestration pays off, and when it does not
Far from every flow needs a dedicated orchestration layer. Three questions help with the call.
- Do the steps genuinely depend on each other? When a later call needs the result of an earlier one, that argues for orchestration. If the calls are independent, a parallel fan-out without central control usually does the job.
- Does the order have to hold reliably? In a payment flow, the order is a business requirement. For loosely coupled notifications it may vary, and then choreography is the better fit.
- Is the same flow triggered from several places? When web, app, and partner interface kick off the same transaction, a central description keeps the logic from being maintained three times over.
A flow that consists of a single call, or of steps that are independent of each other, argues against a dedicated layer. In those cases, an extra layer creates more work than it saves. Orchestration pays off where the coordination already happens anyway, just invisibly, inside some service.
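For the case of independent calls, a plain parallel fan-out is often all it takes. The sketch uses `asyncio.gather` from the Python standard library. The two fetch functions and the product page they serve are hypothetical examples.

```python
import asyncio


async def fetch_stock_level(product_id: str) -> int:
    await asyncio.sleep(0)  # stands in for network I/O
    return 7


async def fetch_price(product_id: str) -> float:
    await asyncio.sleep(0)  # stands in for network I/O
    return 19.99


async def load_product_page(product_id: str) -> dict:
    # Neither call needs the other's result, so both start at once.
    stock, price = await asyncio.gather(
        fetch_stock_level(product_id),
        fetch_price(product_id),
    )
    return {"productId": product_id, "stock": stock, "price": price}
```

There is no sequence to fix and no handoff between the calls, which is why nothing here needs a coordinating layer.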
A simple marker is the number three. A transaction of two dependent calls can usually still live cleanly in the calling code. Once a flow has three or more steps with data handoffs and failure logic of their own, an explicit description gains value fast. The number is not a hard boundary. It marks the point where the conversation about a dedicated orchestration becomes worth having.
In a project with a logistics company, the sequence of a multi-step shipment process lived inside a single service that had grown to call all the others over the years. Every change to the flow meant touching that service, even though the actual work happened in the neighboring systems. After the flow moved into its own declarative description, the order could change without touching the original service. The business services stayed untouched, and the flow became visible in one place for the first time.
Five common pitfalls
From architecture reviews of orchestrated flows, we know a set of typical trouble spots. The five most frequent deserve to be named.
- The hidden orchestration. A business service calls more and more of the others over time, until it has quietly become the central control. The logic exists, yet nobody has named it as orchestration or owns it.
- The overloaded layer. The orchestration absorbs business tasks that belong in the individual services. The thin control layer turns into a new monolith and repeats the original problem.
- Missing idempotency. Steps cannot be retried safely because a second call triggers a second effect. The first network failure that forces a retry produces double bookings or double reservations.
- The blind spot on failure. The happy path is built carefully, the failure path only roughly. A flow that aborts midway then leaves a state behind that nobody can resolve with confidence.
- Missing monitoring. The flow runs, yet on failure the continuous trace is missing. Without a shared correlation ID, there is hardly a way to reconstruct at which step a transaction failed.
These patterns share one trait. They rarely stem from a wrong decision. Usually, no decision was made at all. Orchestration grows by creep, which is exactly why naming it early pays.
Orchestration and AI agents
AI agents are a new driver for the topic. An agent that is supposed to handle an order faces exactly the question a classic orchestrator answers. In which order does it call the participating services, and how does it react when a step fails?
Two approaches meet here. An agentic workflow lets the language model decide at runtime which step comes next. That is strong when the flow is genuinely variable. For a payment flow with a fixed order, that freedom is a risk. A declarative orchestration hands the agent a firm blueprint to execute, with no need to invent the sequence itself. The work on AI-ready APIs and on orchestration descriptions pulls toward the same goal, capturing flows so that machine consumers execute them dependably.
In practice, many teams combine both directions. An agent takes over the parts of a transaction that demand real judgment, assessing an edge case, say, and leans on a declarative description for the fixed core of the flow. The orchestration sets the dependable frame inside which the agent decides. The order stays fixed where it has to be fixed, and flexible where room to move adds value.
What this means for teams
Once orchestration becomes visible as a layer of its own, an organizational question follows. Who owns the flow? The individual services have clear owners. The transaction above them often has none. As long as the orchestration hides inside a business service, that service’s team carries the responsibility along without it ever being named.
An explicit orchestration description makes that responsibility visible. The flow gets a place, an owner, and its own versioning. That is more work up front, and it pays off wherever a transaction crosses team boundaries. For such a flow to stay findable, it belongs in the same catalog as the participating APIs, a topic covered in API catalogs in large organizations.
It helps to treat the flow like an API of its own. It gets a name, a description, and a version, and changes to it are decided deliberately instead of slipping in through one team’s code. An implicit dependency between several services becomes a named artifact that teams can discuss and evolve together.
Look for the one service in your architecture that calls conspicuously many others. Odds are an orchestration is hiding there that nobody has named as such, and that exact service is the first candidate for making the flow explicit.
How to put this into practice
Getting started begins with an inventory, and a tool comes second. Take one business transaction that runs across several services today, draw the full sequence of calls once, and check which service currently holds the control without anyone having said so. That one sketch usually answers whether your architecture needs an explicit orchestration and where it has to start.
If you want an experienced outside view for that inventory, our Professional API Services pick up exactly there. We analyze grown API landscapes, surface hidden orchestrations, and work out a sustainable split between business work and flow control together with your team.