Deployment best practices

Best practices and workflows for deploying with managed federation


When rolling out changes to a subgraph, we recommend the following workflow:

  1. Confirm the backward compatibility of each change by running rover subgraph check in your CI pipeline.
  2. Merge backward compatible changes that successfully pass schema checks.
  3. Deploy changes to the subgraph in your infrastructure.
  4. Wait until all replicas finish deploying.
  5. Run rover subgraph publish to update your managed federation configuration:
rover subgraph publish my-supergraph@my-variant \
  --schema ./accounts/schema.graphql \
  --name accounts \
  --routing-url https://my-running-subgraph.com/api
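
The ordering of steps 4 and 5 can be expressed as a small guard. The following is a minimal Python sketch, not an official Apollo tool. The function publish_when_ready, its parameters, and the idea of comparing per-replica version strings are assumptions made for illustration. Only the rover subgraph publish command and its flags come from the workflow above.

```python
import subprocess


def publish_when_ready(replica_versions, target_version, graph_ref,
                       name, schema_path, routing_url, run=subprocess.run):
    """Publish the subgraph schema only once every replica runs target_version.

    Hypothetical helper: how you obtain replica_versions depends on your
    infrastructure (assumed here to be a list of version strings).
    Returns True if the publish command was run, False otherwise.
    """
    if not replica_versions or any(v != target_version for v in replica_versions):
        # At least one replica is still on an older version: do not publish yet.
        return False
    run(
        ["rover", "subgraph", "publish", graph_ref,
         "--schema", schema_path,
         "--name", name,
         "--routing-url", routing_url],
        check=True,  # raise if rover exits with a non-zero status
    )
    return True
```

Passing the command runner as a parameter keeps the guard testable without invoking rover.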

Pushing configuration updates safely

Whenever possible, you should update your configuration in a way that is backward compatible to avoid downtime. As suggested above, the best way to do this is to run rover subgraph check before updating. You should also generally seek to minimize the number of breaking changes you make to your schemas.

Additionally, call rover subgraph publish for a subgraph only after all replicas of that subgraph are deployed. This ensures that resolvers are in place for all operations that are executable against your graph, and operations can't attempt to access fields that do not yet exist.

In the rare case where a configuration change is not backward compatible with your router's query planner, you should update your registered subgraph schemas before you deploy your updated code.

You should also perform configuration updates that affect query planning prior to (and separately from) other changes. This helps avoid a scenario where the query planner generates queries that fail validation in downstream services or violate the expectations of your resolvers.

Examples of this include:

  • Modifying @key, @requires, or @provides directives
  • Removing a type implementation from an interface

In general, always exercise caution when pushing configuration changes that affect your router's query planner, and consider how those changes will affect your other subgraphs.

Example scenario

Let's say we define a Channel interface in one subgraph, and we define types that implement Channel in two other subgraphs:

# channel subgraph
interface Channel @key(fields: "id") {
  id: ID!
}
# web subgraph
type WebChannel implements Channel @key(fields: "id") {
  id: ID!
  webHook: String!
}
# email subgraph
type EmailChannel implements Channel @key(fields: "id") {
  id: ID!
  emailAddress: String!
}

To safely remove the EmailChannel type from your supergraph schema:

  1. Perform a rover subgraph publish of the email subgraph that removes the EmailChannel type from its schema.
  2. Deploy a new version of the subgraph that removes the EmailChannel type.

The first step causes the query planner to stop sending ...on EmailChannel fragments, which would fail validation if sent to a subgraph that isn't aware of the type.

If you want to keep the EmailChannel type but remove it from the Channel interface, the process is similar. Instead of removing the EmailChannel type altogether, remove only the implements Channel clause from the type definition. This is because the query planner expands queries to interfaces or unions into fragments on their implementing types.

For example, a query such as...

query FindChannel($id: ID!) {
  channel(id: $id) {
    id
  }
}

...generates two queries, one to each subgraph, like so:

# Generated by the query planner
# To email subgraph
query {
  _entities(...) {
    ...on EmailChannel {
      id
    }
  }
}
# To web subgraph
query {
  _entities(...) {
    ...on WebChannel {
      id
    }
  }
}

Currently, the router expands all interfaces into implementing types.
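
To see why the publish must come first, here is a deliberately simplified Python model of that expansion. It is a sketch for illustration only and not the router's actual query planner. The function expand_interface and the dictionary that maps subgraph names to implementing types are hypothetical.

```python
def expand_interface(implementations, fields):
    """Build one _entities query per subgraph, with a fragment per implementing type.

    implementations: hypothetical map of subgraph name -> implementing type names,
    standing in for what the registered schemas tell the query planner.
    """
    selection = " ".join(fields)
    queries = {}
    for subgraph, type_names in implementations.items():
        if not type_names:
            # No registered implementing types: nothing is sent to this subgraph.
            continue
        fragments = " ".join(f"...on {t} {{ {selection} }}" for t in type_names)
        queries[subgraph] = f"query {{ _entities(...) {{ {fragments} }} }}"
    return queries


# Before the publish, both implementing types are registered.
before = expand_interface({"email": ["EmailChannel"], "web": ["WebChannel"]}, ["id"])

# After publishing the email subgraph without EmailChannel, no
# ...on EmailChannel fragment is generated any more.
after = expand_interface({"email": [], "web": ["WebChannel"]}, ["id"])
```

In this model, the generated queries depend only on the registered schemas. That is why the registry must stop listing EmailChannel before any subgraph replica stops recognizing it.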

Removing a subgraph

Caution: This action cannot be reversed!

To "de-register" a subgraph with Apollo, call rover subgraph delete:

rover subgraph delete my-supergraph@my-variant --name accounts

The next time it starts up or polls, your router obtains an updated configuration that reflects the removed subgraph.

Advanced deployment workflows

With managed federation, you can control which version of your schema your router fleet uses. In most cases, rolling over all of your router instances to a new schema version is safe, assuming you've used schema checks to confirm that your changes are backward compatible. Your deployment model, however, may require an advanced workflow to deploy a specific version of a schema.

There are two types of advanced deployment workflows:

  • Blue-green deployment workflow. For deployments that require progressive rollout, such as blue-green deployments, you can configure your environments to refer to a single graph variant by pinning each environment's supergraph schema to your routers at deployment time. Using a single graph variant between different production environments enables GraphOS Studio to get usage reports and analyze the combined production traffic of all environments, as well as providing a consistent changelog of your schema over time.

  • Graph variant workflow. Changes at the subgraph level might involve a variety of different updates, such as migrating entities from one subgraph to another. If your infrastructure requires a more advanced deployment process to handle the different updates, you can use graph variants to manage router fleets running with different supergraph configurations.

    A common use for graph variants is contracts, for example, to create separate contract variants for the public and private APIs of a supergraph schema.

Example blue-green deployment

This feature is in preview. Your questions and feedback are highly valued. Don't hesitate to get in touch with your Apollo contact or on the official Apollo GraphQL Discord.

A blue-green deployment strategy uses two environments: one environment (blue) serves the schema for live traffic, and the other environment (green) uses a variant for a new release that's under development. When the new release is ready, traffic is migrated from the blue to the green environment. This cycle repeats with each new release.

As an example, follow these steps to deploy a new release (green) environment with its supergraph schema. The example uses the GraphOS Platform API to perform custom actions:

  1. Publish all the release's subgraphs at once using the Platform API publishSubgraphs mutation.

    ## Publish multiple subgraphs together in a batch
    ## and retrieve the associated launch, along with any downstream launches synchronously.
    mutation PublishSubgraphsMutation(
      $graphId: ID!
      $graphVariant: String!
      $revision: String!
      $subgraphInputs: [PublishSubgraphsSubgraphInput!]!
    ) {
      graph(id: $graphId) {
        publishSubgraphs(
          graphVariant: $graphVariant
          revision: $revision
          subgraphInputs: $subgraphInputs
          downstreamLaunchInitiation: "SYNC"
        ) {
          launch {
            id
            downstreamLaunches {
              id
              graphVariant
              status
            }
          }
        }
      }
    }

    This initiates a launch, as well as any downstream launches necessary for contracts. It returns the launch IDs, with downstream launch IDs configured to return synchronously (downstreamLaunchInitiation: "SYNC") with the mutation.

    NOTE

    For contracts, you can also request that any downstream launches return the variants associated with each launch, for example, downstreamLaunches { graphVariant }. When querying for a specific launch, be sure to pass the variant associated with that launch in the following steps.

  2. Poll for the completed launch and any downstream launches.

    ## Poll for the status of any individual launch by ID
    query PollLaunchStatusQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
      graph(id: $graphId) {
        variant(name: $graphVariant) {
          launch(id: $launchId) {
            status
          }
        }
      }
    }

    NOTE

    When polling for a contract, the $graphVariant argument of this query must refer to the contract variant rather than the base variant. You can get it from the mutation in step 1, from Launch.graphVariant / downstreamLaunches { graphVariant }.

  3. After the launch and downstream launches have completed, retrieve the supergraph schema of the launch.

    ## Fetch the supergraph SDL by launch ID.
    query FetchSupergraphSDLQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
      graph(id: $graphId) {
        variant(name: $graphVariant) {
          launch(id: $launchId) {
            build {
              result {
                ... on BuildSuccess {
                  coreSchema {
                    coreDocument
                  }
                }
              }
            }
          }
        }
      }
    }

    NOTE

    When retrieving the supergraph schema for a contract, the $graphVariant argument of this query must refer to a contract variant. You can get it from the mutation in step 1, from Launch.graphVariant / downstreamLaunches { graphVariant }.

  4. Deploy your routers with the -s or --supergraph option to specify the supergraph schema.

    • Specifying the -s or --supergraph option disables polling for the schema from Uplink.

    • For an example using the option in a docker run command, see Specifying the supergraph.

  5. If you need to roll back to a previous blue-green deployment, ensure the previous deployment is available and shift traffic back to the previous deployment.

    • A router image must use an embedded supergraph schema via the --supergraph flag.

    • A deployment should include both router and subgraphs to ensure resolvers and schemas are compatible.

    • If a previous deployment can't be redeployed, repeat steps 3 and 4 with the launchID you want to roll back to. Ensure the deployed subgraphs are compatible with the supergraph schema, then redeploy the router with a newly fetched supergraph schema for your target launchID. Before considering only rolling back the supergraph schema, see its caveats.
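
The polling in step 2 can be sketched as a small loop. This is a minimal Python illustration under stated assumptions, not part of the Platform API. The function wait_for_launch is hypothetical. The fetch_status callable stands in for running PollLaunchStatusQuery and reading launch.status. The status names LAUNCH_COMPLETED and LAUNCH_FAILED are assumed values that you should check against the Platform API schema.

```python
import time


def wait_for_launch(fetch_status, done=("LAUNCH_COMPLETED",),
                    failed=("LAUNCH_FAILED",), interval_s=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Poll a launch until it reaches a terminal status.

    fetch_status: hypothetical zero-argument callable returning the current
    launch status string (for example, by running PollLaunchStatusQuery).
    done / failed: assumed terminal status names; verify them for your graph.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"launch ended with status {status}")
        sleep(interval_s)  # wait before asking again
    raise TimeoutError("launch did not reach a terminal status in time")
```

Run the same loop once for the main launch and once per downstream launch, passing each launch's own graph variant to the query, before you fetch the supergraph schema in step 3.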

Example canary deployment

A canary deployment applies updates in an environment separate from a live production environment and validates its updates starting with a small subset of production traffic. As updates are validated in the canary deployment, more production traffic is routed to it gradually until it handles all traffic.

To configure your canary deployment, you can fetch the supergraph schema for a launch ID for the canary deployment, then have that canary deployment report metrics to a prod variant. Similar to the blue-green deployment example, your canary deployment is pinned to the same graph variant as your other, live deployment, so metrics from both deployments are reported to the same graph variant. As your canary deployment is scaled up, it will eventually become the stable deployment serving all production traffic, so we want that deployment reporting to the live prod variant.

To configure a canary deployment for the prod graph variant:

  1. Publish all the canary deployment's subgraphs at once using the Platform API publishSubgraphs mutation.

    ## Publish multiple subgraphs together in a batch
    ## and retrieve the associated launch, along with any downstream launches synchronously.
    ## The variant name is hard-coded below, so no $graphVariant variable is declared
    ## (GraphQL validation rejects unused variables).
    mutation PublishSubgraphsMutation(
      $graphId: ID!
      $revision: String!
      $subgraphInputs: [PublishSubgraphsSubgraphInput!]!
    ) {
      graph(id: $graphId) {
        publishSubgraphs(
          graphVariant: "prod" ## name of production variant
          revision: $revision
          subgraphInputs: $subgraphInputs
          downstreamLaunchInitiation: "SYNC"
        ) {
          launch {
            id
            downstreamLaunches {
              id
              graphVariant
              status
            }
          }
        }
      }
    }

    This initiates a launch, as well as any downstream launches necessary for contracts. It returns the launch IDs, with downstream launch IDs configured to return synchronously (downstreamLaunchInitiation: "SYNC") with the mutation.

    NOTE

    For contracts, you can also request that any downstream launches return the variants associated with each launch, for example, downstreamLaunches { graphVariant }. When querying for a specific launch, be sure to pass the variant associated with that launch in the following steps.

  2. Poll for the completed launch and any downstream launches.

    ## Poll for the status of any individual launch by ID
    query PollLaunchStatusQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
      graph(id: $graphId) {
        variant(name: $graphVariant) {
          launch(id: $launchId) {
            status
          }
        }
      }
    }

    NOTE

    When polling for a contract, the $graphVariant argument of this query must refer to the contract variant rather than the base variant. You can get it from the mutation in step 1, from Launch.graphVariant / downstreamLaunches { graphVariant }.

  3. After the launch and downstream launches have completed, retrieve the supergraph schema of the launch.

    ## Fetch the supergraph SDL by launch ID.
    query FetchSupergraphSDLQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
      graph(id: $graphId) {
        variant(name: $graphVariant) {
          launch(id: $launchId) {
            build {
              result {
                ... on BuildSuccess {
                  coreSchema {
                    coreDocument
                  }
                }
              }
            }
          }
        }
      }
    }

    NOTE

    When retrieving the supergraph schema for a contract, the $graphVariant argument of this query must refer to a contract variant. You can get it from the mutation in step 1, from Launch.graphVariant / downstreamLaunches { graphVariant }.

  4. Deploy your routers with the -s or --supergraph option to specify the supergraph schema.

    • Specifying the -s or --supergraph option disables polling for the schema from Uplink.

    • For an example using the option in a docker run command, see Specifying the supergraph.

  5. If you need to roll back, ensure the previous deployment is available and shift traffic back to the live deployment.

    • A router image must use an embedded supergraph schema via the --supergraph flag.

    • A deployment should include both router and subgraphs to ensure resolvers and schemas are compatible.

    • If a previous deployment can't be redeployed, repeat steps 3 and 4 with the launchID you want to roll back to. Ensure the deployed subgraphs are compatible with the supergraph schema, then redeploy the router with a newly fetched supergraph schema for your target launchID. Before considering only rolling back the supergraph schema, see its caveats.

With your canary deployment reporting metrics to GraphOS, you can use GraphOS Studio to verify a canary's performance before rolling out changes to the rest of the graph.

Modifying query-planning logic

Treat migrations of your query-planning logic similarly to how you treat database migrations. Carefully consider the effects on downstream services as the query planner changes, and plan for "double reading" as appropriate.

Consider the following example of a Products subgraph and a Reviews subgraph:

# Products subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String!
}
# Reviews subgraph
type Product @key(fields: "upc") {
  upc: ID!
  reviews: [Review]! @requires(fields: "nameLowercase")
  nameLowercase: String! @external
}

Let's say we want to deprecate the nameLowercase field and replace it with the name field, like so:

# Products subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @deprecated
  name: String!
}
# Reviews subgraph
type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @external
  name: String! @external
  reviews: [Review]! @requires(fields: "name")
}

To perform this migration in-place:

  1. Modify the Products subgraph to add the new field. (As usual, first deploy all replicas, then use rover subgraph publish to push the new subgraph schema.)
  2. Deploy a new version of the Reviews subgraph with a resolver that accepts either nameLowercase or name in the source object.
  3. Modify the Reviews subgraph's schema in the registry so that it @requires(fields: "name").
  4. Deploy a new version of the Reviews subgraph with a resolver that only accepts the name field in its source object.
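
The "double reading" resolver from step 2 might look like the following minimal Python sketch. It is not tied to any particular GraphQL server library. The function resolve_reviews, the dictionary shape of the product representation, and the load_reviews callback are all hypothetical names introduced for illustration.

```python
def resolve_reviews(product, load_reviews):
    """Transitional resolver: accept either the new or the deprecated field.

    product: the entity representation sent by the router (assumed to be a dict).
    load_reviews: hypothetical callback that looks up reviews by upc and name.
    """
    name = product.get("name")
    if name is None:
        # Old query plans still send nameLowercase until step 3 takes effect.
        name = product.get("nameLowercase")
    if name is None:
        raise ValueError("representation has neither name nor nameLowercase")
    return load_reviews(product["upc"], name)
```

Once the registry requires only name (step 3) and no in-flight requests use the old plan, the fallback branch can be removed, which corresponds to step 4.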

Alternatively, you can perform this with an atomic migration at the subgraph level, by modifying the subgraph's URL:

  1. Modify the Products subgraph to add the name field (as usual, first deploy all replicas, then use rover subgraph publish to push the new subgraph schema).
  2. Deploy a new set of Reviews replicas to a new URL that reads from name.
  3. Register the Reviews subgraph with the new URL and the schema changes above.

With this atomic strategy, the router resolves all outstanding requests to the old URL that relied on nameLowercase with the old query-planning configuration, which @requires the nameLowercase field. All new requests are made to the new URL using the new query-planning configuration, which @requires the name field.

Reliability and security

Your router fetches its configuration by polling Apollo Uplink, an Apollo-hosted endpoint specifically for serving supergraph configs. In the event that your updated config is inaccessible due to an outage in Uplink, your router continues to serve its most recently fetched configuration.

If you restart a router instance or spin up a new instance during an Uplink outage, that instance can't fetch its configuration until Apollo resolves the outage.
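
The two cases above can be modeled in a few lines. This is an illustrative Python sketch of the described behavior, not the router's implementation. The function next_config and the fetch_from_uplink callable are hypothetical, and an outage is assumed to surface as a ConnectionError.

```python
def next_config(fetch_from_uplink, last_config=None):
    """Return the configuration to serve after one poll attempt.

    fetch_from_uplink: hypothetical callable that returns the latest config
    or raises ConnectionError during an outage (an assumption of this sketch).
    last_config: the most recently fetched config, or None for a fresh instance.
    """
    try:
        return fetch_from_uplink()
    except ConnectionError:
        if last_config is None:
            # Fresh or restarted instance: there is nothing to fall back to.
            raise
        # Running instance: keep serving the most recently fetched config.
        return last_config
```

An instance that has to start during an outage can avoid the failing path by using a supergraph schema supplied with the --supergraph option, which disables polling Uplink for the schema.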

The subgraph publish lifecycle

Whenever you call rover subgraph publish for a particular subgraph, it both updates that subgraph's registered schema and updates the router's managed configuration.

Because your graph is dynamically changing and multiple subgraphs might be updated simultaneously, it's possible for changes to cause composition errors, even if rover subgraph check was successful. For this reason, updating a subgraph re-triggers composition in the cloud, ensuring that all subgraphs still compose to form a complete supergraph schema before updating the configuration. The workflow behind the scenes can be summed up as follows:

  1. The subgraph schema is uploaded to Apollo and indexed.
  2. The subgraph is updated in the registry to use its new schema.
  3. All subgraphs are composed in the cloud to produce a new supergraph schema.
  4. If composition fails, the command exits and emits errors.
  5. If composition succeeds, Apollo Uplink begins serving the updated supergraph schema.

On the other side of the equation sits the router. The router can regularly poll Apollo Uplink for changes to its configuration. The lifecycle of dynamic configuration updates is as follows:

  1. The router polls for updates to its configuration.
  2. On update, the router downloads the updated configuration, including the new supergraph schema.
  3. The router uses the new supergraph schema to update its query planning logic.
  4. The router continues to resolve in-flight requests with the previous configuration, while using the updated configuration for all new requests.

Alternatively, instead of getting its configuration from Apollo Uplink, the router can specify a path to a supergraph schema upon its deployment. This static configuration is useful when you want the router to use a schema different from the latest validated schema from Uplink, or when you don't have connectivity to Apollo Uplink. For an example of this workflow, see Example blue-green deployment.

Rolling back a deployment

When rolling back a deployment, you must ensure the supergraph schema and router version are compatible with the deployed subgraphs and subgraph schemas in the target environment, so all possible GraphQL operations can be successfully executed.

Roll forward to revert

Rollbacks are typically implemented by rolling forward to a new version that reverts the changes in the subgraph code repository, then performing the full release process (publishing the subgraph schema and rolling out the new code together) as outlined in the change management tech note. This ensures the supergraph schema exposed by the router matches the underlying subgraphs. It's the safest approach when using the standard schema delivery pipeline where Apollo Uplink provides the supergraph schema to the router for continuous deployment of new launches.

Roll back entire deployment

For blue-green deployment scenarios, where the router and subgraphs in a deployment have versioned Docker container images, you may be able to roll back the entire deployment (assuming no underlying database schema changes). Doing so ensures that the supergraph schema embedded in the router image is compatible with the underlying subgraphs in the target environment. This kind of rollback is typically what happens when a blue-green deployment is aborted because post-promotion analysis fails.

Roll back supergraph schema only

In rare circumstances where a backward compatible schema-only change is made (for example, setting a progressive @override percentage), it may be possible to roll back only the supergraph schema by pinning the router fleet to the supergraph schema for a specific launchID using the --supergraph flag.

This approach is only suitable for short-term fixes for a limited set of schema-only changes. It requires the router to pin to a specific launchID, as republishing the underlying subgraphs will result in a new supergraph schema being generated.

Given the issues with this approach, in general we recommend implementing rollbacks by rolling forward to a new version.

Rollback guidelines

A summary of rollback guidelines:

  • Any rollback must ensure the router's supergraph schema is compatible with the underlying subgraphs deployed in the target environment.

  • GraphOS's standard CI/CD schema delivery pipeline is the best choice for most environments seeking continuous deployment and empowerment of subgraph teams to ship both independently and with the safety of schema checks to prevent breaking changes. For details, see the change management tech note.

  • In environments with existing blue-green or canary deployments that rely on an immutable infrastructure approach (where no in-place updates, patches, or configuration changes can be made on production workloads), the router image can use an embedded supergraph schema. The supergraph schema is set for the router with the --supergraph flag for a specific launchID that's generated by publishing the subgraph schemas for the specific subgraph image versions used in a blue-green or canary deployment. In this way, a blue-green or canary deployment can be made immutable as a whole, so rolling back to a previous deployment ensures the router's supergraph schema is compatible with the underlying subgraphs deployed in the target environment.

  • In general, we don't recommend rolling back only the supergraph schema on the router in isolation. Subgraph compatibility must also be taken into account. Subsequent publishing of subgraphs generates a new supergraph schema that may lose rolled back changes, so in general it's better to fix the problem at the source of truth in the subgraph repository.


© 2024 Apollo Graph Inc.
