Picking sides. It’s only natural.
Us IT people, we invest a lot of time in certain languages, frameworks and even patterns. It’s completely normal that we look for reassurance that it was time well spent. When we finally get the hang of it, we’ll gladly spend an equal amount of time criticizing its competitors and arguing intensely with our peers about which one’s better. Java vs. C#, anyone?
Oh yeah, we can be a stubborn bunch too. Once we’ve picked a hill to climb, we’ll gladly die on it.
Not to mention that you’ll find an echo chamber for just about every opinion on the internet, and us ITers, we’re good at finding things on the internet, aren’t we?
I’m no exception. As soon as I heard the words ‘choreography’ and ‘orchestration’, I spent a few minutes furiously googling both terms and promptly decided that choreography just seemed more elegant. There. Decision made. Orchestration is clearly inferior.
The truth, as it turns out, is always more nuanced though, isn’t it? Let’s find out:
Our debate on choreography vs. orchestration isn’t one to be had at concert halls or ballet recitals. It’s one architects and developers started having in front of whiteboards, when distributed event-driven systems first started appearing.
So, what are we talking about here?
To maintain data consistency in a business process that flows through a distributed system, the saga design pattern can be used.
The saga is orchestrated when a single component, aptly called ‘the orchestrator’, calls all the shots. It tells a service what to do, waits for its response and initiates the subsequent action(s) until the saga has been fully completed. A great plus for orchestration is observability. One component always has a clear overview of the status of the transaction. This is a major advantage when certain actions have the successful completion of another as a prerequisite. If something goes wrong, the orchestrator can easily halt the transaction, then adjust or retry as necessary with compensating transactions until everything can carry on as before. Ideally, we don’t want a customer to be invoiced when it turns out later on that there’s no stock. Perhaps less ideal in orchestrations is the tight coupling between the orchestrator and every other service.
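To make that a bit more tangible, here is a minimal sketch of an orchestrator that runs steps in sequence and compensates in reverse order when one fails. Everything in it is made up for illustration: the `Orchestrator` class, the `StepFailed` error and the invoice and stock functions are assumptions, not part of any real framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class StepFailed(Exception):
    """Hypothetical error a step raises when it cannot complete."""


@dataclass
class Orchestrator:
    # Each step pairs an action with the compensation that undoes it.
    steps: List[Tuple[str, Callable[[dict], None], Callable[[dict], None]]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add_step(self, name, action, compensation):
        self.steps.append((name, action, compensation))

    def run(self, order: dict) -> bool:
        completed = []
        for name, action, compensation in self.steps:
            try:
                action(order)
                self.log.append(f"{name}: done")
                completed.append((name, compensation))
            except StepFailed as exc:
                self.log.append(f"{name}: failed ({exc})")
                # Undo every completed step, most recent first.
                for done_name, undo in reversed(completed):
                    undo(order)
                    self.log.append(f"{done_name}: compensated")
                return False
        return True


def send_invoice(order):
    order["invoiced"] = True


def cancel_invoice(order):
    order["invoiced"] = False


def reserve_stock(order):
    if order["quantity"] > order["in_stock"]:
        raise StepFailed("no stock")
    order["reserved"] = True


def release_stock(order):
    order["reserved"] = False


saga = Orchestrator()
saga.add_step("invoice", send_invoice, cancel_invoice)
saga.add_step("stock", reserve_stock, release_stock)

order = {"quantity": 3, "in_stock": 1}
ok = saga.run(order)  # the stock step fails, so the invoice is cancelled again
```

The `log` list is the observability argument in miniature: one component can tell you exactly how far the transaction got and what was undone.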
A choreographed saga has no central component. Only agreements and rules stand between order and total chaos. In theory a service needs to know just two things about the system in which it resides. First, it needs to know which events are interesting to consume and where to find them. Then, it needs to produce messages which other services might find interesting. That’s it. The rest is none of its concern. When it works, it’s elegant and beautiful. It’s art. There is no central point to be a bottleneck in performance, availability or development. It enables microservices architecture from its most idealistic viewpoint. Complete independence.
But… Observability quickly becomes its Achilles’ heel. I’ve seen it happen. The number of services grows fast. Documentation is neglected in favor of a faster time to market. There’s logging and monitoring, sure, but each day it’s growing wildly with millions of rows of unstandardized and unstructured information. We’ll fix that later… Later becomes never. Then, something breaks.
The concepts are fairly simple; getting your architecture right often isn’t. Making the right choice between choreography and orchestration is certainly a part of that.
Metaphorically speaking, choreography and orchestration can be compared with a jazz band and a classical music orchestra. While an orchestra is always tightly controlled by a conductor, a jazz band often only has a certain baseline on which each player will build and improvise.
Full disclosure, I stole this great metaphor from a Medium blog by Chen Chen. Initially I wanted to expand on his vending machine example as well. It works great for the happy flow explanations of the saga pattern, but as it turns out, it doesn’t leave much room to talk about cost of change, compensating transactions and decision trees. A lack of imagination leaves me no choice but to return to the good old webshop product order flow!
A fictional webshop
Say our company were to choose a choreographed approach for the order saga. This is what the happy flow might look like.
It might seem weird to start our order saga at payment, but everything that’s happened so far can be described as a series of synchronous transactions. A customer has browsed through our products, put a few items in his cart and ultimately, hopefully, heads for checkout. While these transactions and maybe a few others are certainly a prerequisite for the start of our saga, they’re not necessarily a part of it.
(1) After payment, everything can be handled asynchronously and that means kick-off for our saga. The payment service produces a ‘paymentCompleted’ (2) event to signal a successful payment.
The order service consumes the ‘paymentCompleted’ (3) event and creates an order with status ‘ordered’. To simplify things, our order service holds all cart and product catalog information as well. In a full-fledged system these would probably be separate services. The order service caches stock information by listening to all stock-related events from the warehouse microservice. More on that later. Finally, it sends out an ‘orderCreated’ (4) event.
The warehouse knows to listen for it (5), can do the order picking and send the order out for delivery with an ‘orderPrepared’ (6) event.
Our company is a truly vertically integrated one and handles even the deliveries. For this purpose, a separate delivery service exists. It takes all prepared orders (7) from the warehouse and produces an ‘orderDelivered’ (8) event once an order has been successfully delivered to your doorstep.
The order service, finally, can now complete the order (9), changing its status to ‘delivered’.
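For those who prefer code over pictures, the happy flow above could be sketched roughly like this. The event names come from the flow itself; the in-memory `EventBus` and the handler functions are assumptions standing in for a real broker and real services.

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-memory stand-in for a message broker (an assumption)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Deliver queued events one by one until nothing is left.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()
orders = {}  # order id -> status, owned by the order service


def order_on_payment_completed(event):
    orders[event["order_id"]] = "ordered"
    bus.publish("orderCreated", event)


def warehouse_on_order_created(event):
    # Order picking would happen here.
    bus.publish("orderPrepared", event)


def delivery_on_order_prepared(event):
    # The actual delivery would happen here.
    bus.publish("orderDelivered", event)


def order_on_order_delivered(event):
    orders[event["order_id"]] = "delivered"


# Each service only knows which events it consumes and which it produces.
bus.subscribe("paymentCompleted", order_on_payment_completed)
bus.subscribe("orderCreated", warehouse_on_order_created)
bus.subscribe("orderPrepared", delivery_on_order_prepared)
bus.subscribe("orderDelivered", order_on_order_delivered)

# The payment service kicks off the saga.
bus.publish("paymentCompleted", {"order_id": "A-1"})
bus.run()
```

Note that nothing in this sketch describes the flow as a whole. The sequence only exists implicitly, spread over the subscriptions.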
An elegant choreographed solution, like a beehive or ant colony, can be a fully independent but coherent organism with minimal communication.
What if things go wrong? What if things change? What if the next transaction in the sequence depends on the outcome of multiple previous transactions?
We’ll look at all of these, just after the following orchestrated implementation.
I’ll not bore you with another run through the entire saga. Just notice that the orchestrator has nestled itself in between each service by ingesting each regular event and producing command events to weave everything together.
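A rough sketch of that idea, under some loud assumptions: the command names (‘createOrder’, ‘prepareOrder’ and so on) are invented for this example, and the services are faked by a lookup table that maps each command to the event a service would answer with.

```python
from collections import deque

# The orchestrator's routing table: which command follows which event.
# Command names are hypothetical; event names follow the order saga.
NEXT_COMMAND = {
    "paymentCompleted": "createOrder",
    "orderCreated": "prepareOrder",
    "orderPrepared": "deliverOrder",
    "orderDelivered": "completeOrder",
}

# Fake services: the event each one replies with after handling a command.
REPLY_EVENT = {
    "createOrder": "orderCreated",
    "prepareOrder": "orderPrepared",
    "deliverOrder": "orderDelivered",
}


class OrderOrchestrator:
    def __init__(self):
        self.status = {}  # order id -> last event seen
        self.sent = []    # commands issued so far

    def on_event(self, event_type, order_id):
        self.status[order_id] = event_type
        command = NEXT_COMMAND.get(event_type)
        if command is not None:
            self.sent.append(command)
        return command


def run(orchestrator, first_event, order_id):
    inbox = deque([first_event])
    while inbox:
        command = orchestrator.on_event(inbox.popleft(), order_id)
        reply = REPLY_EVENT.get(command)  # None once the saga is finished
        if reply is not None:
            inbox.append(reply)


orchestrator = OrderOrchestrator()
run(orchestrator, "paymentCompleted", "A-1")
```

The whole flow now lives in one table, and `status` always knows where an order is. The price is that every event passes through this one component.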
Happy flows are great, but systems should be designed for failure. Failure is inevitable. As mentioned before, the order service caches stock information by listening to all stock-related events from the warehouse. The caching is very important. Besides orders, our order service holds all cart and product information too. This makes it quite tightly coupled to the front-end of our webshop. When a customer is browsing, adding products to his cart or creating orders, he expects immediate feedback on product availability. You could ask why the order service doesn’t just synchronously ask the warehouse service for stock information every time it’s needed. It could, but it’s a slippery slope when synchronous calls start creeping into the business process again. That was just the thing you were trying to replace.
Caching stock information in our order service provides a neat solution to this problem. The only caveat is that this information is second-hand data and only eventually consistent. Especially in systems with high traffic, as webshops tend to be, it’s only a matter of time before the system is caught in an inconsistency. It might just be that the cache didn’t synchronize fast enough. But let’s not forget the possibility of human error. A stock count in the warehouse gone wrong, a product broken, … To set things right, compensating transactions are needed.
In our choreographed saga, the mistake is first noticed in the warehouse. A ‘stockDepleted’ (6) event is produced instead of an ‘orderPrepared’ event. Both the payment and the order service need to be aware (7) of this compensating transaction and act accordingly. The payment service initiates a refund (8), the order service updates its stock cache and cancels the standing order (9).
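In code, this choreographed compensation might look something like the sketch below. The `publish` and `subscribe` helpers are a bare-bones stand-in for a broker, and the `available` field on the ‘stockDepleted’ event is my own assumption about what such an event could carry.

```python
from collections import defaultdict

subscribers = defaultdict(list)


def subscribe(event_type, handler):
    subscribers[event_type].append(handler)


def publish(event_type, payload):
    # Synchronous dispatch, good enough for a sketch.
    for handler in subscribers[event_type]:
        handler(payload)


payments = {"A-1": "paid"}        # payment service state
orders = {"A-1": "ordered"}       # order service state
stock_cache = {"widget": 5}       # order service's stale copy
warehouse_stock = {"widget": 0}   # the real count in the warehouse


def warehouse_on_order_created(event):
    available = warehouse_stock[event["product"]]
    if available < event["quantity"]:
        # The mistake is noticed here, so no 'orderPrepared' this time.
        publish("stockDepleted", {**event, "available": available})
    else:
        warehouse_stock[event["product"]] = available - event["quantity"]
        publish("orderPrepared", event)


def payment_on_stock_depleted(event):
    payments[event["order_id"]] = "refunded"


def order_on_stock_depleted(event):
    stock_cache[event["product"]] = event["available"]
    orders[event["order_id"]] = "cancelled"


subscribe("orderCreated", warehouse_on_order_created)
subscribe("stockDepleted", payment_on_stock_depleted)
subscribe("stockDepleted", order_on_stock_depleted)

publish("orderCreated", {"order_id": "A-1", "product": "widget", "quantity": 2})
```

Two services had to learn about ‘stockDepleted’ for this to work. Every new kind of failure means another event that somebody has to remember to subscribe to.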
I know what it looks like. Just a few paragraphs ago I said that compensating transactions were easier to manage in an orchestrated saga and now you’re seeing a lot more green on the image above. I suppose that claim only holds for mistakes that can’t be, or haven’t yet been, ‘automated away’. If we had no such event as ‘stockDepleted’, the warehouse could still turn to the orchestrator to set things right or at least see which other services might be impacted. In a choreographed approach, there would be no such central entity.
Besides functional errors, there are technical failures too. An event schema validation error, a broken database connection, a time-out on a rare but still necessary synchronous call. It’s a lot easier to have a built-in retry mechanism in an orchestrator than it is to have each service catch and compensate for failed transactions.
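Such a retry mechanism doesn’t have to be fancy. Below is a minimal sketch with exponential backoff; `call_with_retry` and `TransientError` are names I made up, and a real orchestrator would likely lean on whatever its platform already offers.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (time-outs and the like)."""


def call_with_retry(action, attempts=3, base_delay=0.0, sleep=time.sleep):
    """Run `action`, retrying on TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries, let the orchestrator compensate
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_command():
    # Fails twice with a time-out, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("time-out")
    return "orderPrepared"


result = call_with_retry(flaky_command, attempts=3)
```

Because every command already passes through the orchestrator, this wrapper only has to exist in one place. In a choreographed saga each service would need its own version.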
Additionally, our example isn’t ‘the real thing’. It’s overly simplified because it’s patterns we want to talk about, not webshops. Subtle differences are magnified tenfold in more complex, realistic distributed systems. Something to keep in mind for our next ‘cost of change’ example as well.
Cost of change
To look at the differences in cost of change, we’ll shift our fictional company away from regular customers to more of a B2B approach. Aiming our shop at other businesses, it would make more sense to send invoices and not demand immediate payment.
A change in our order saga is needed. The invoice should only be sent after the creation of the order.
In our choreographed example, just the payment service and the order service are impacted. The warehouse isn’t, as billing and order picking can easily be done in parallel.
A little more shuffling around is needed in the orchestrated example. Just like in compensating transactions, this subtle difference will probably be amplified in a more complex, realistic environment. From a technical standpoint, an orchestration will always have a higher cost of change, as there is an extra service and its command events to deal with. I do believe, however, that the same isn’t true from a functional standpoint. It’s a lot more obvious in an orchestrated saga where changes are needed. A component in a choreographed saga is very decoupled. So decoupled in fact, that it might not even know which services consume its events. This leaves a lot of room for mistakes when making changes, which might just cancel out the purely technical advantage.
Decisions, decisions. Luckily you don’t have to make one between choreography and orchestration when decision trees come into play. It’s simply not possible to choreograph your way out of this. Take a look.
Time to leave our webshop behind for something a bit more abstract. Service A and service B can produce either a negative or a positive result for the business logic they implement. They communicate the outcome to service C with an event. The services can produce these outcomes at any time (service A sends an event at abstract time 0 (T:0) and service B sends an event at abstract time 16 (T:16)). Service C needs to be able to correlate these events and, based on their contents, produce a decision. It has no choice but to consume both events and keep state, which sort of makes it an orchestrator by default.
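A small sketch shows why service C can’t avoid keeping state. The `DecisionService` class, the correlation id and the rule itself (approve only when both outcomes are positive) are assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DecisionService:
    """Service C: remembers outcomes per correlation id until both have arrived."""

    pending: Dict[str, Dict[str, bool]] = field(default_factory=dict)

    def on_outcome(self, correlation_id: str, source: str, positive: bool) -> Optional[str]:
        outcomes = self.pending.setdefault(correlation_id, {})
        outcomes[source] = positive
        if outcomes.keys() >= {"A", "B"}:
            # Both events are in: decide and forget the stored state.
            del self.pending[correlation_id]
            return "approved" if all(outcomes.values()) else "rejected"
        return None  # still waiting for the other service


service_c = DecisionService()
first = service_c.on_outcome("tx-1", "A", True)    # arrives at T:0
second = service_c.on_outcome("tx-1", "B", False)  # arrives at T:16
```

Between T:0 and T:16 the outcome of service A has to live somewhere, and that somewhere is service C.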
While the list above can serve as a guide, I wouldn’t overanalyze the possible advantages or disadvantages of a specific implementation. Some of these differences aren’t as extreme as the list might make them seem and most of them can be somewhat mitigated.
Often these conclusions tend to end in the same two words: it depends. It does, it always does. However, I feel that from the experiences I’ve had, I owe you a more decisive proposition.
In the real world, a hybrid of the two will usually be the right answer. Mix and match where you see fit! I still prefer the low coupling of choreography in truly large systems, but when complex flows, perhaps with decision trees, come into play, some smaller orchestrations are your best bet.