Deployment Best Practices
Best practices and workflows for deploying federated supergraphs
This article covers deployment best practices, including:

- Updating and removing subgraphs
- Advanced deployment workflows, including blue-green and canary deployments
Before diving into those topics, it's important to understand the `rover subgraph publish` lifecycle that happens whenever you publish a subgraph's schema changes to GraphOS.

The `rover subgraph publish` lifecycle

Whenever you run `rover subgraph publish` for a particular subgraph, it updates the subgraph's schema and the router's configuration.

Because your graph is dynamically changing and multiple subgraphs might be updated simultaneously, it's possible for changes to cause composition errors, even if `rover subgraph check` was successful. For this reason, updating a subgraph re-triggers composition in GraphOS, ensuring that all subgraphs still compose to form a complete supergraph before updating the supergraph configuration. The workflow behind the scenes can be summed up as follows:
1. The subgraph schema is uploaded to GraphOS and indexed.
2. The subgraph is updated in the registry to use its new schema.
3. All subgraphs are composed in GraphOS to produce a new supergraph schema.
   - If composition fails, the command exits and emits errors.
   - If composition succeeds, Apollo Uplink begins serving the updated supergraph schema.
The router sits on the other side of the equation. The router regularly polls Apollo Uplink for changes to its configuration. The lifecycle of dynamic configuration updates is as follows:
1. The router polls Uplink for updates to its configuration.
2. On update, the router downloads the updated configuration, including the new supergraph schema.
3. The router uses the new supergraph schema to update its query planning logic. The router also prewarms the query plan cache with known queries in a separate thread.
4. The router continues resolving in-flight requests with the previous configuration, and it uses the updated configuration for all new requests once prewarming has completed.
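The update lifecycle above can be pictured as a minimal polling loop. The sketch below is illustrative pseudologic in shell, not the router's actual implementation; `fetch_config` is a hypothetical stand-in for an Uplink poll:

```shell
# Illustrative pseudologic for the router's poll-and-swap loop (not real router code).
current=""
fetch_config() {              # hypothetical stand-in for polling Apollo Uplink
  echo "supergraph-v$1"
}
for tick in 1 1 2; do         # simulated poll ticks; the config changes on the last tick
  latest=$(fetch_config "$tick")
  if [ "$latest" != "$current" ]; then
    # The real router prewarms the query plan cache before swapping and lets
    # in-flight requests finish on the old configuration.
    current="$latest"
    echo "activated $current"
  fi
done
```

The key property the loop illustrates: an unchanged poll result is a no-op, and a changed result swaps the active configuration only for new requests.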
Alternatively, instead of getting its configuration from Apollo Uplink, the router can load a supergraph schema from a local file at deployment time. This static configuration is useful when you want the router to use a schema different from the latest validated schema from Uplink, or when you don't have connectivity to Apollo Uplink. For an example of this workflow, see the example of configuring the router for blue-green deployment.
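As a sketch of this static configuration, the router can be pinned to a local schema file like so. The image tag and file paths are illustrative placeholders (the canonical `docker run` example is in the Specifying the supergraph docs), and the command is echoed so the sketch runs without Docker; remove the `echo` to execute it:

```shell
# Sketch: run the router with a local supergraph schema instead of Uplink.
# The image tag and mount paths are assumptions — substitute your own.
IMAGE="ghcr.io/apollographql/router:<version>"   # substitute a real version tag
echo docker run -p 4000:4000 \
  --mount "type=bind,source=$(pwd)/supergraph.graphql,target=/dist/schema/local.graphql" \
  "$IMAGE" \
  --supergraph /dist/schema/local.graphql
```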
Updating subgraphs safely
When rolling out changes to a subgraph, use the following workflow:

1. Confirm the backward compatibility of each schema change by running `rover subgraph check` in your CI pipeline. Refer to the section on backward incompatible changes in schema change management for more details.
2. Merge backward compatible changes that successfully pass schema checks.
3. Deploy changes to the subgraph in your infrastructure.
4. Wait until all replicas finish deploying.
5. Only publish schema changes to GraphOS after all replicas of that subgraph are deployed. You publish a subgraph's schema by running `rover subgraph publish`:

   ```bash
   rover subgraph publish my-supergraph@my-variant \
     --schema ./accounts/schema.graphql \
     --name accounts \
     --routing-url https://my-running-subgraph.com/api
   ```

Waiting to publish the updated schema until after these steps ensures that:

- Resolvers are in place for all operations that are executable against your graph.
- Operations can't attempt to access fields that don't yet exist.
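The workflow above can be sketched as a CI script. The graph ref, subgraph name, schema path, and routing URL are placeholders, and the Rover commands are echoed so the sketch runs without Rover installed; drop the `echo` prefixes to execute them:

```shell
# Hedged CI sketch of the subgraph rollout workflow. All values are placeholders.
GRAPH_REF="my-supergraph@my-variant"
NAME="accounts"
SCHEMA="./accounts/schema.graphql"

# 1. Gate the merge on a successful schema check.
echo rover subgraph check "$GRAPH_REF" --schema "$SCHEMA" --name "$NAME"

# 2-4. Merge, deploy the subgraph, and wait for every replica to finish rolling out.

# 5. Publish only after all replicas are serving the new code.
echo rover subgraph publish "$GRAPH_REF" --schema "$SCHEMA" --name "$NAME" \
  --routing-url https://my-running-subgraph.com/api
```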
Changes affecting query planner performance
Certain changes to your subgraph schemas can pass `rover subgraph check`, meaning the updated subgraph schema successfully composes into the supergraph, yet still harm the query planner's performance.

Examples of changes that can impact query planning include:

- Modifying `@key`, `@requires`, `@provides`, or `@shareable` directive usage
- Adding or removing a type implementation from an interface
- Using `@interfaceObject` and adding new fields to an interface
Approach subgraph field and type migrations as you would database migrations. The example scenarios below provide guidance on handling these types of changes.
Example change to @requires
Consider the following example of a Products subgraph and a Reviews subgraph:
```graphql
# Products subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String!
}
```

```graphql
# Reviews subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @external
  reviews: [Review]! @requires(fields: "nameLowercase")
}
```

Suppose you want to deprecate the `nameLowercase` field and replace it with the `name` field, like so:

```graphql
# Products subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @deprecated
  name: String!
}
```

```graphql
# Reviews subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @external
  name: String! @external
  reviews: [Review]! @requires(fields: "name")
}
```

To perform this migration in place:
1. Modify the Products subgraph to add the new field, using `rover subgraph publish` to push the new subgraph schema.
2. Deploy a new version of the Reviews subgraph with a resolver that accepts either `nameLowercase` or `name` in the source object.
3. Modify the Reviews subgraph's schema in the registry so that it `@requires(fields: "name")`.
4. Deploy a new version of the Reviews subgraph with a resolver that only accepts `name` in its source object.
Alternatively, you can perform this operation with an atomic migration at the subgraph level by modifying the subgraph's URL:

1. Modify the Products subgraph to add the `name` field (as usual, first deploy all replicas, then use `rover subgraph publish` to push the new subgraph schema).
2. Deploy a new set of Reviews replicas to a new URL that reads from `name`.
3. Register the Reviews subgraph with the new URL and the schema changes above.
With this atomic strategy, the query planner resolves all outstanding requests to the old subgraph URL that relied on `nameLowercase` with the old query-planning configuration, which `@requires` the `nameLowercase` field. All new requests are made to the new subgraph URL using the new query-planning configuration, which `@requires` the `name` field.
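Step 3 of the atomic strategy amounts to a publish that changes the subgraph's routing URL. A hedged sketch, where the graph ref, schema path, and new URL are placeholders, echoed so it runs without Rover (drop the `echo` to execute it):

```shell
# Sketch: re-register the Reviews subgraph at its new URL. All values are placeholders.
NEW_URL="https://reviews-v2.example.com/api"   # hypothetical URL of the new replica set
echo rover subgraph publish my-supergraph@my-variant \
  --schema ./reviews/schema.graphql \
  --name reviews \
  --routing-url "$NEW_URL"
```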
Example interface type implementation removal
Suppose you define a Channel interface in one subgraph and other types that implement Channel in two other subgraphs:
```graphql
interface Channel @key(fields: "id") {
  id: ID!
}

type WebChannel implements Channel @key(fields: "id") {
  id: ID!
  webHook: String!
}

type EmailChannel implements Channel @key(fields: "id") {
  id: ID!
  emailAddress: String!
}
```

To safely remove the `EmailChannel` type from your supergraph schema:

1. Perform a `rover subgraph publish` of the `email` subgraph that removes the `EmailChannel` type from its schema.
2. Deploy a new version of the subgraph that removes the `EmailChannel` type.
The first step causes the query planner to stop sending fragments like `... on EmailChannel`, which would fail validation if sent to a subgraph that isn't aware of the type.

If you want to keep the `EmailChannel` type but remove it from the `Channel` interface, the process is similar. Instead of removing the `EmailChannel` type altogether, remove only `implements Channel` from the type definition. This is necessary because the query planner expands queries against interfaces or unions into fragments on their implementing types.
For example, a query like this:

```graphql
query FindChannel($id: ID!) {
  channel(id: $id) {
    id
  }
}
```

generates two queries, one to each subgraph, like so:

```graphql
query {
  _entities(...) {
    ... on EmailChannel {
      id
    }
  }
}
```

```graphql
query {
  _entities(...) {
    ... on WebChannel {
      id
    }
  }
}
```

Currently, the router expands all interfaces into implementing types.
Removing a subgraph
To "de-register" a subgraph with Apollo, run `rover subgraph delete`:

```bash
rover subgraph delete my-supergraph@my-variant --name accounts
```

The next time it starts up or polls Uplink, your router obtains an updated configuration that reflects the removed subgraph.
Advanced deployment workflows
With managed federation, you can control which version of your schema your router fleet uses. In most cases, rolling over all of your router instances to a new schema version is safe, assuming you've used schema checks to confirm that your changes are backward compatible. Your deployment model, however, may require an advanced workflow to deploy a specific schema version.
There are two types of advanced deployment workflows:

- Blue-green deployment workflow. For deployments that require progressive rollout, such as blue-green deployments, you can configure your environments to refer to a single graph variant by pinning each environment's supergraph schema to your routers at deployment time. Using a single variant across different production environments enables GraphOS Studio to gather usage reports, analyze the combined production traffic of all environments, and provide a consistent changelog of your schema over time.
- Graph variant workflow. Changes at the router level might involve various updates, such as migrating entities from one subgraph to another. If your infrastructure requires a more advanced deployment process to handle different router updates, you can use graph variants to manage router fleets running with different configurations. A common use of graph variants is contracts, for example, creating separate contract variants for the public and private APIs of a supergraph schema.
Example blue-green deployment
A blue-green deployment strategy uses two environments: one environment (blue) serves the schema variant for live traffic, and the other environment (green) uses a variant for a new release that's under development. When the new release is ready, traffic is migrated from the blue to the green environment. This cycle repeats with each new release.
As an example, follow these steps to deploy a new release to its (green) environment with a pinned supergraph schema; the example uses the GraphOS Platform API to perform custom GraphOS actions:
1. Publish all the release's subgraphs at once using the Platform API `publishSubgraphs` mutation.

   ```graphql
   # Publish multiple subgraphs together in a batch
   # and retrieve the associated launch, along with any downstream launches, synchronously.
   mutation PublishSubgraphsMutation(
     $graphId: ID!
     $graphVariant: String!
     $revision: String!
     $subgraphInputs: [PublishSubgraphsSubgraphInput!]!
   ) {
     graph(id: $graphId) {
       publishSubgraphs(
         graphVariant: $graphVariant
         revision: $revision
         subgraphInputs: $subgraphInputs
         downstreamLaunchInitiation: "SYNC"
       ) {
         launch {
           id
           downstreamLaunches {
             id
             graphVariant
             status
           }
         }
       }
     }
   }
   ```

   This initiates a launch, as well as any downstream launches necessary for contracts. Because `downstreamLaunchInitiation: "SYNC"` is set, the mutation returns the launch ID along with the IDs of any downstream launches. When querying for a specific launch in the following steps, be sure to pass the variant associated with that launch, which you can get from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

2. Poll for the completed launch and any downstream launches.

   ```graphql
   # Poll for the status of any individual launch by ID
   query PollLaunchStatusQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           status
         }
       }
     }
   }
   ```

   Note: When polling for a contract, the `$graphVariant` argument of this query must refer to the contract variant rather than the base variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

3. After the launch and downstream launches have completed, retrieve the supergraph schema of the launch.

   ```graphql
   # Fetch the supergraph SDL by launch ID.
   query FetchSupergraphSDLQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           build {
             result {
               ... on BuildSuccess {
                 coreSchema {
                   coreDocument
                 }
               }
             }
           }
         }
       }
     }
   }
   ```

   Note: When retrieving the schema for a contract, the `$graphVariant` argument of this query must refer to a contract variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

4. Deploy your routers with the `-s` or `--supergraph` option to specify the supergraph schema. Specifying the `-s` or `--supergraph` option disables polling for the schema from Uplink. For an example of using the option in a `docker run` command, see Specifying the supergraph.
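Steps 3 and 4 can be wired together roughly as follows. This is a hedged sketch: the endpoint and `X-API-Key` header are the documented Platform API defaults, but the file names, the `jq` extraction, and the router invocation are assumptions, and the commands are echoed so the sketch runs without network access (remove the `echo`s to execute them):

```shell
# Illustrative sketch: persist the fetched supergraph SDL, then start the
# router pinned to it. File names and the router invocation are assumptions.
ENDPOINT="https://api.apollographql.com/api/graphql"

# Send the FetchSupergraphSDLQuery from step 3 (request body prepared in a JSON file).
echo curl -X POST "$ENDPOINT" \
  -H "Content-Type: application/json" -H "X-API-Key: \$APOLLO_KEY" \
  -d @fetch-supergraph-sdl-request.json -o response.json

# Extract coreDocument (the path mirrors the step 3 query shape) into a schema file.
echo "jq -r '.data.graph.variant.launch.build.result.coreSchema.coreDocument' response.json > supergraph.graphql"

# Step 4: deploy the router with the pinned schema; this disables Uplink polling.
echo ./router --supergraph ./supergraph.graphql
```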
If you need to roll back to a previous blue-green deployment, ensure the previous deployment is available and shift traffic back to it. Keep in mind that:

- A router image must use an embedded supergraph schema via the `--supergraph` flag.
- A deployment should include both the router and subgraphs to ensure resolvers and schemas are compatible.

If a previous deployment can't be redeployed, repeat steps 3 and 4 with the launch ID you want to roll back to. Ensure the deployed subgraphs are compatible with the supergraph schema, then redeploy the router with a newly fetched supergraph schema for your target launch ID. Before considering rolling back only the supergraph schema, see its caveats.
Example canary deployment
A canary deployment applies graph updates in an environment separate from a live production environment and validates its updates starting with a small subset of production traffic. As updates are validated in the canary deployment, more production traffic is routed to it gradually until it handles all traffic.
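The progressive rollout can be sketched as a weight-shifting loop. The percentages and the notion of a single traffic-shift step are assumptions; substitute whatever traffic-splitting mechanism your load balancer or service mesh provides:

```shell
# Illustrative only: gradually shift production traffic to the canary fleet.
for WEIGHT in 5 25 50 100; do
  echo "shift ${WEIGHT}% of production traffic to the canary routers"
  # ...verify the canary's metrics in GraphOS Studio before increasing the weight...
done
```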
To configure a canary deployment, fetch the supergraph schema for a launch ID for the canary deployment, then have that canary deployment report metrics to the prod variant. As in the blue-green deployment example, your canary deployment is pinned to the same graph variant as your live deployment, so metrics from both deployments are reported to the same graph variant. As your canary deployment is scaled up, it eventually becomes the stable deployment serving all production traffic, so it should report to the live prod variant.
To configure a canary deployment for the prod graph variant:
1. Publish all the canary deployment's subgraphs at once using the Platform API `publishSubgraphs` mutation.

   ```graphql
   # Publish multiple subgraphs together in a batch
   # and retrieve the associated launch, along with any downstream launches, synchronously.
   mutation PublishSubgraphsMutation(
     $graphId: ID!
     $revision: String!
     $subgraphInputs: [PublishSubgraphsSubgraphInput!]!
   ) {
     graph(id: $graphId) {
       publishSubgraphs(
         graphVariant: "prod" # name of production variant
         revision: $revision
         subgraphInputs: $subgraphInputs
         downstreamLaunchInitiation: "SYNC"
       ) {
         launch {
           id
           downstreamLaunches {
             id
             graphVariant
             status
           }
         }
       }
     }
   }
   ```

   This initiates a launch, as well as any downstream launches necessary for contracts. Because `downstreamLaunchInitiation: "SYNC"` is set, the mutation returns the launch ID along with the IDs of any downstream launches. When querying for a specific launch in the following steps, be sure to pass the variant associated with that launch, which you can get from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

2. Poll for the completed launch and any downstream launches.

   ```graphql
   # Poll for the status of any individual launch by ID
   query PollLaunchStatusQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           status
         }
       }
     }
   }
   ```

   Note: When polling for a contract, the `$graphVariant` argument of this query must refer to the contract variant rather than the base variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

3. After the launch and downstream launches have completed, retrieve the supergraph schema of the launch.

   ```graphql
   # Fetch the supergraph SDL by launch ID.
   query FetchSupergraphSDLQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           build {
             result {
               ... on BuildSuccess {
                 coreSchema {
                   coreDocument
                 }
               }
             }
           }
         }
       }
     }
   }
   ```

   Note: When retrieving the schema for a contract, the `$graphVariant` argument of this query must refer to a contract variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

4. Deploy your routers with the `-s` or `--supergraph` option to specify the supergraph schema. Specifying the `-s` or `--supergraph` option disables polling for the schema from Uplink. For an example of using the option in a `docker run` command, see Specifying the supergraph.
If you need to roll back, ensure the previous deployment is available and shift traffic back to the live deployment. Keep in mind that:

- A router image must use an embedded supergraph schema via the `--supergraph` flag.
- A deployment should include both the router and subgraphs to ensure resolvers and schemas are compatible.

If a previous deployment can't be redeployed, repeat steps 3 and 4 with the launch ID you want to roll back to. Ensure the deployed subgraphs are compatible with the supergraph schema, then redeploy the router with a newly fetched supergraph schema for your target launch ID. Before considering rolling back only the supergraph schema, see its caveats.
With your canary deployment reporting metrics to GraphOS, you can use GraphOS Studio to verify a canary's performance before rolling out changes to the rest of the graph.