Try Apollo Studio

Improving Apollo Gateway performance


If your supergraph gateway uses Apollo Server with the @apollo/gateway library, this article provides recommendations for improving your gateway's performance.

Consider moving to the Apollo Router

The Apollo Router is a high-throughput, low-latency Apollo Federation GraphQL gateway written in Rust. Its performance and consistency significantly exceed any Node.js-based gateway.

Migrating to Apollo Router from @apollo/gateway

Disable or reduce ftv1 inline tracing

Inline tracing (also known as federated tracing or ftv1) provides helpful field-level latency information in Apollo Studio, however it comes at a cost: trace information can be expensive to calculate, and it's attached to subgraph responses, increasing payload size.

In Apollo Server 3.6 and later, you can set the fieldLevelInstrumentation option to disable inline tracing entirely, or limit tracing to a representative sample of all requests.

  • During load testing, we recommend disabling tracing entirely.
  • For normal production traffic, we recommend setting a low sample rate such as 0.01 (one percent of requests).

Usage Reporting plugin docs for fieldLevelInstrumentation

Change the default maxSockets setting

The maxSockets setting controls the maximum number of connections that Apollo Gateway's fetch implementation can create to each subgraph host and port. With higher values, fewer subgraph requests are queued in the gateway, but the event loop might become overloaded serializing responses from subgraphs.

In Apollo Gateway versions prior to v0.51.0, the default maxSockets setting is 15, which is too low for many systems. In later versions, the default is Infinity, which lends itself to the overload scenario.

The following ApolloGateway constructor sets maxSockets to 50:

import {ApolloGateway, RemoteGraphQLDataSource} from '@apollo/gateway';
import fetcher from 'make-fetch-happen';
const gateway = new ApolloGateway({
buildService({name, url}) {
return new RemoteGraphQLDataSource({
fetcher: fetcher.defaults({maxSockets: 50})

Configuring the subgraph fetcher

Instrument Gateway middleware and plugins

If you've installed middleware (such as Express middleware) or Apollo Server plugins, it's worth using a code profiling tool such as clinicjs to investigate potential issues, such as synchronous code blocking the main thread or expensive plugin functions.

Use OpenTelemetry tracing

OpenTelemetry is the CNCF standard for instrumenting distributed systems. By tracing Apollo Gateway, subgraph services, data sources, and any other systems in the request path, you might discover that the cause of performance issues occurs outside of Apollo Gateway.

OpenTelemetry in Apollo Gateway

Investigate subgraph latency

Try running load tests against the subgraphs directly. You can simulate requests from the Gateway by querying the _entities field:

query MyQuery__subgraphA__0($representations: [_Any!]!) {
_entities(representations: $representations) {
... on EntityA {
... on EntityB {

By testing _entities queries, you might discover that you need to implement a DataLoader in your reference resolvers.

The slower your subgraphs, the more time the Gateway spends keeping track of concurrent in-flight requests. The Gateway's maximum requests per second drops linearly as subgraph latencies rise.

Tweak CPU and memory limits

You might need to set higher CPU and memory limits on your Apollo Gateway pods. Determining the best limits for your system depends on a number of factors:

  • Size of operations
  • Cardinality of operations (number of unique shapes)
  • Complexity of query plans
  • Subgraph latency
  • Use of shared caches

In general, look at increasing memory limits as Node.js is single-threaded and cannot use more than one CPU. That being said, we recommend keeping CPU utilization below 50%. At higher CPU utilization percentages, you may start to see event loop delays increase relative. Monitoring both memory usage and CPU upsage metrics can help provide a clearer image of bottlenecks beyond just code profiling alone.

If CPU throttling becomes a problem, consider removing the limits on your gateway pods as discussed in this blog post on Kubernetes resource limits.

Check query plans

Query plans (the steps the Gateway takes to resolve an operation from multiple subgraphs) are almost always reasonable approximations of the original operation. They're also highly cacheable so they don't incur much of a performance penalty. But it's worth looking at them to make sure they're of an appropriate size and complexity. Fragment and abstract type usage may cause undesirable query plans in very specific cases.

const gateway = new ApolloGateway({
debug: true // print query plans to stdout

Add more instances

Apollo Gateway is designed to scale horizontally. If you've tried the previous strategies and performance still isn't where it should be, distribute the load across more instances of Apollo Gateway.

Edit on GitHub