
Federation is not a saga orchestrator


Apollo Federation is a powerful system for orchestrating queries across a distributed system.

Federation applies these same orchestration capabilities to mutations (and their selection sets), but managing state changes across distributed systems typically involves additional requirements. Federation does not provide common functionality for these requirements, such as compensating transactions or data propagation between sibling mutations.

You need to implement your own orchestration logic within a single resolver to meet these requirements. This is due to the design of GraphQL itself.

Sequencing mutations in GraphQL

The only difference between the root Query and Mutation types is that mutation fields are executed serially instead of in parallel. At first glance, this seems like a way to sequence a set of related mutations.

mutation DoMultipleThings {
  createAccount {
    success
    account { id }
  }
  validateAccount { success }
  setAccountStatus { success }
}

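Because root mutation fields run serially, validateAccount and setAccountStatus do not start until createAccount finishes. If createAccount raises an error and its return type is non-nullable, the error propagates to the root and the executor may skip the remaining fields; if the field is nullable, the remaining fields still execute. At first glance, this seems like a way to transactionally execute a set of mutations, but it has some significant downsides: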

  • We don't have a way to roll back the changes made by createAccount if one of validateAccount or setAccountStatus fails.
  • We don't have a way to propagate data between mutations. For example, if Mutation.createAccount returns an Account with an id, we can't use that id in validateAccount or setAccountStatus.
  • Clients have to contend with a number of failure scenarios by inspecting the success of each mutation and determining the appropriate course of action.
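
To make the last point concrete, here is a minimal sketch of the decision logic a client would need for the three-field mutation above. It assumes each field returns a `success` flag; the function name and the action labels are hypothetical and only illustrate how many distinct outcomes the client has to distinguish.

```javascript
// Hypothetical client-side helper. `data` is the `data` object returned for
// the DoMultipleThings operation; every name and label here is illustrative.
function nextClientAction(data) {
  const created = data?.createAccount?.success === true;
  const validated = data?.validateAccount?.success === true;
  const statusSet = data?.setAccountStatus?.success === true;

  if (!created) return 'RETRY_ALL'; // nothing was written
  if (!validated) return 'REQUEST_CLEANUP'; // an unvalidated account exists
  if (!statusSet) return 'RETRY_STATUS_ONLY'; // account is valid but not active
  return 'DONE';
}
```

Every client that sends this operation would have to carry equivalent logic, and the cleanup branch has no obvious mutation to call.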

Even if all three of these mutations are implemented in the same service, it's difficult to implement the appropriate transactional semantics when they run within the GraphQL execution engine. We should instead implement this as a single mutation so that we can handle failures and rollbacks in one function.

In this JavaScript example, we're implementing a simplistic "saga" that orchestrates state changes across multiple systems.

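const resolvers = {
  Mutation: {
    async createAndValidateAccount() {
      // The four account helpers are assumed to be async functions defined elsewhere.
      const account = await createAccount();
      try {
        await validateAccount(account);
        await setAccountStatus(account, 'ACTIVE');
      } catch (e) {
        // Roll back: deleteAccount is a hypothetical compensating call.
        await deleteAccount(account);
        return { success: false };
      }
      return { success: true, account };
    },
  },
};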

By representing this in a single mutation, we can properly handle failure, rollback, and data propagation in one function. We also remove complex error management from our client applications, and our schema more clearly expresses the client intent.
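
When a mutation grows beyond two or three steps, the rollback logic can be factored into a small helper. The following is a minimal sketch, not a production saga library: `runSaga`, `accountSteps`, and the `services` object (including `deleteAccount`) are hypothetical names introduced for illustration. Each step can read and extend a shared context object, which is how data such as the new account propagates between steps.

```javascript
// Runs steps in order. If a step throws, the compensations of the steps that
// already completed are run in reverse order. All names are illustrative.
async function runSaga(steps, ctx = {}) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.run(ctx); // a step may store results on ctx for later steps
      completed.push(step);
    } catch (error) {
      for (const done of completed.reverse()) {
        if (done.compensate) await done.compensate(ctx);
      }
      return { success: false, failedStep: step.name, error: String(error?.message ?? error) };
    }
  }
  return { success: true, ctx };
}

// The account flow from this article expressed as saga steps. `services` is
// an assumed object exposing async functions for each state change.
function accountSteps(services) {
  return [
    {
      name: 'createAccount',
      run: async (ctx) => { ctx.account = await services.createAccount(); },
      compensate: async (ctx) => { await services.deleteAccount(ctx.account); },
    },
    {
      name: 'validateAccount',
      run: async (ctx) => { await services.validateAccount(ctx.account); },
    },
    {
      name: 'setAccountStatus',
      run: async (ctx) => { await services.setAccountStatus(ctx.account, 'ACTIVE'); },
    },
  ];
}
```

A resolver could then return the result of `runSaga(accountSteps(services))`. Note that this sketch does not handle a compensation that itself fails; a real system would need retries or a record of the incomplete rollback.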

Distributed orchestration

But what if the state changes occur in different services? Because GraphQL and Federation do not provide semantics for distributed transactions, we still need to implement orchestration logic within a single resolver.

This requirement leads to a few challenging questions:

  • Which team or domain owns this mutation?
  • In which service should we implement the resolver?
  • How does the resolver communicate with the data services?

There are no universally correct answers to these questions. Your answers will depend on your organization, your product requirements, and the capabilities of your system.

One potential strategy is to create one or more subgraphs specifically for orchestrating mutations. These subgraphs can communicate with data services using REST or RPC. Experience or product teams are often the most appropriate owners of these subgraphs.
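
If the orchestrating subgraph talks to data services over REST, one detail matters for the rollback logic: the Fetch API resolves normally on HTTP error statuses, so a failed call will not reach a `catch` block unless it is converted into an exception. The sketch below shows one way to do that. `createServiceClient` and the base URL are hypothetical, and the fetch function is injectable so the client can be tested without a network.

```javascript
// Returns a function that POSTs JSON to a data service and throws on any
// non-2xx response, so orchestration code can rely on try/catch.
// `fetchImpl` defaults to the global fetch (available in Node.js 18+).
function createServiceClient(baseUrl, fetchImpl = globalThis.fetch) {
  return async function post(path, body) {
    const response = await fetchImpl(`${baseUrl}${path}`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body),
    });
    if (!response.ok) {
      throw new Error(`POST ${path} failed with status ${response.status}`);
    }
    return response.json();
  };
}

// Example wiring; the host name is an assumption for illustration only.
const postToAccounts = createServiceClient('http://accounts.internal');
```

The orchestrating resolver can then call `postToAccounts('/accounts', input)` inside its `try` block and trigger compensation when the call throws.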

It's tempting to have the orchestrating subgraphs call back to the Apollo Router to execute domain-specific mutations. This approach is feasible, but it requires careful attention to a number of details:

  • All calls should go through the Apollo Router, not directly to subgraphs, because it's responsible for tracking GraphQL usage (and subgraphs should never accept traffic directly).
  • The calls to domain-specific mutations no longer originate in client applications, so you must propagate client identity and other request metadata.
  • GraphOS will view these calls as separate operations.
  • If using cloud-native telemetry, you must ensure that the Apollo Router receives the trace context to correlate spans.
  • Be wary of circular dependencies. Consider using contracts to create variants that expose only the domain-specific mutations and eliminate the potential for loops in an operation.
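
Several of the points above come down to which headers the orchestrating subgraph sends when it calls the router. The sketch below is one possible approach, under these assumptions: the router is configured to pass the listed headers through to the subgraph, incoming header names are lowercased (as in Node.js), and forwarding `traceparent` and `tracestate` verbatim is acceptable (a tracing SDK would normally inject the current span context instead). `buildRouterHeaders` and the `x-orchestration-depth` header are hypothetical; the depth counter is a simple runtime guard against loops that complements, rather than replaces, a contract.

```javascript
// Headers copied from the incoming subgraph request to the outgoing router call.
const FORWARDED_HEADERS = [
  'authorization', // client identity
  'apollographql-client-name', // Apollo client awareness headers
  'apollographql-client-version',
  'traceparent', // W3C Trace Context
  'tracestate',
];
const DEPTH_HEADER = 'x-orchestration-depth'; // hypothetical loop guard
const MAX_DEPTH = 1;

function buildRouterHeaders(incoming) {
  // Treat a missing or malformed depth header as depth 0.
  const depth = Number.parseInt(incoming[DEPTH_HEADER] ?? '0', 10) || 0;
  if (depth >= MAX_DEPTH) {
    throw new Error('Orchestration loop detected: refusing to call the router again');
  }
  const outgoing = {
    'content-type': 'application/json',
    [DEPTH_HEADER]: String(depth + 1),
  };
  for (const name of FORWARDED_HEADERS) {
    if (incoming[name] !== undefined) outgoing[name] = incoming[name];
  }
  return outgoing;
}
```

A resolver would pass the headers of the request it received, for example `buildRouterHeaders(context.headers)` if the subgraph server exposes them on a `context` object, and use the result for its HTTP call to the router.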