
Load testing a federated GraphQL API


When it comes to load testing a GraphQL API, the process is similar to that of any other type of API, and the majority of considerations will remain the same. However, there are certain distinctions specific to GraphQL and the Apollo ecosystem that are worth considering before conducting a load test.

It's important to note that the purpose of this article is to bring attention to these unique considerations for load testing with GraphQL, rather than serving as a comprehensive guide on the setup and execution of GraphQL load tests.

What to Load Test

When load testing a GraphQL API, there are two main ways to run the test:

  • Testing your entire deployed application stack
  • Testing each deployed service in isolation

Both approaches have benefits and tradeoffs. Testing services in isolation allows you to test the limits of each service independently and provides valuable data for making performance improvements for each independent service. However, it may not give insight into where to focus efforts to make the most impact on the end-user experience. Running a full stack test is better suited for finding these types of bottlenecks, but may offer little insight into the limits of other services. We recommend focusing on testing services in isolation to maximize insight into not only the current bottlenecks, but also potential ones. This doesn't mean full stack tests should be avoided, as they have their place.

Use Observability Tooling for Performance Insights

A load test is only as valuable as the information obtained after running it. Having the proper observability setup is crucial for a successful load test. When load testing a GraphQL API, you want metrics that are GraphQL-aware and provide insight into the runtime and execution behavior of production graphs. Apollo GraphOS' metrics engine provides this deep insight into graphs by allowing you to see things like slow operations and field-level performance metrics on every operation.

Gathering trace data can have a noticeable impact on performance, so consider running tests with tracing disabled to measure peak load. Enabling traces during load tests will help illuminate which parts of your graph are slow and which resolvers could be optimized. Our docs on tracing performance considerations go into more detail about how to modify tracing behavior. The router can emit metrics for time spent processing a request, outside of waiting for external or subgraph requests. Combined, the router and tracing metrics provide plenty of detail to diagnose performance issues after a load test.

Apollo offers first-class support for sending metrics outside Apollo Studio to any system that supports the OpenTelemetry protocol (OTLP), as well as a built-in Datadog integration for enterprise customers.

Don't Pollute Production Metrics

When load testing a graph in production, exclude load test metrics from live production metrics. For metrics sent to GraphOS, consider sending load test traffic to a dedicated variant, e.g. My-Graph@load-variant. This will allow you to cleanly separate production metrics from load tests.

Also consider excluding load test metrics from other parts of your system, where applicable.

Use a Realistic Load Pattern that Resembles Production Usage

Load test results are most helpful if they closely match the patterns of production traffic. Take a graph that only has a set number of internal client applications as an example. The applications have a pre-defined number of operations that hit production. It wouldn't make sense to generate 200,000 unique operations as part of a load test if the total number of actual operations is several orders of magnitude less than that. It's important to simulate realistic patterns during load testing to ensure the results are relevant and useful for improving performance. In addition, consider testing for different scenarios such as peak usage, unexpected traffic spikes, and long-term usage patterns to ensure the GraphQL API can handle various types of load.
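As a minimal sketch of this idea, the snippet below builds a request schedule by sampling from a small, fixed set of operations in proportion to how often each one occurs. The operation names and weights are hypothetical placeholders; in practice they would come from whatever production traffic data you have available.

```python
import random
from collections import Counter

# Hypothetical operation names and relative weights. Real values would be
# derived from observed production traffic, not hard-coded like this.
OPERATION_WEIGHTS = {
    "GetProduct": 70,
    "ListOrders": 25,
    "UpdateCart": 5,
}


def build_schedule(total_requests, weights, seed=0):
    """Return a list of operation names sampled in proportion to `weights`."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    names = list(weights)
    return rng.choices(
        names,
        weights=[weights[name] for name in names],
        k=total_requests,
    )


schedule = build_schedule(10_000, OPERATION_WEIGHTS)
counts = Counter(schedule)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

The resulting schedule can then be fed to the load generator of your choice. The point is that the set of distinct operations stays small and skewed, as it would for a graph with a handful of known clients.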

Potential Bottlenecks

Performance bottlenecks will likely be at the application level (subgraphs and beyond), not in the router or GraphQL execution engine. When we created our first load tests for the early alpha version of the Router, we had to create test suites to run against both the Router and the Gateway. To fully reach the bottlenecks in the Gateway using realistic load patterns, we had to remove almost all latency in the subgraphs. For most real-world setups, the time spent in the subgraph resolver code and the underlying data sources will be the bottleneck. The most likely scenario in which you'll see degraded performance in the router or execution engine is when the number of unique operations is high.

Operation Cardinality

When a GraphQL API encounters a new operation, the operation must be parsed into the proper format, validated against the existing schema, and then finally executed. The router goes through the additional step of query planning as well. These steps can account for some noticeable runtime overhead and latency if the volume of unique operations is high enough. All of these steps must happen for the initial request of every unique operation against your graph.
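To illustrate why cardinality matters, here is a toy sketch of a cache for the per-operation preparation work (parsing, validation, planning). It is not how any Apollo component is implemented: the `OperationCache` class is hypothetical, the "prepared" value is a placeholder, and the cache is unbounded, whereas a real cache would have a size limit.

```python
import hashlib


class OperationCache:
    """Toy stand-in for a parse/validate/plan cache keyed by operation text."""

    def __init__(self):
        self._prepared = {}
        self.hits = 0
        self.misses = 0

    def prepare(self, operation):
        key = hashlib.sha256(operation.encode("utf-8")).hexdigest()
        if key in self._prepared:
            self.hits += 1
        else:
            self.misses += 1
            # Placeholder for the expensive work done once per unique operation.
            self._prepared[key] = ("prepared", len(operation))
        return self._prepared[key]


def hit_rate(operations):
    """Fraction of requests served from the cache for a stream of operations."""
    cache = OperationCache()
    for operation in operations:
        cache.prepare(operation)
    return cache.hits / (cache.hits + cache.misses)


# The same operation repeated 1,000 times versus 1,000 distinct operations.
low_cardinality = ["query { product(id: 1) { name } }"] * 1000
high_cardinality = [f"query {{ product(id: {i}) {{ name }} }}" for i in range(1000)]

print(hit_rate(low_cardinality))
print(hit_rate(high_cardinality))
```

In the first stream only the initial request pays the preparation cost. In the second stream every request does, which is the situation a load test with artificially unique operations would create.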

Most of these steps are highly cacheable. Apollo has several built-in features that take advantage of this, such as Automatic Persisted Queries (APQs). APQs are on by default in Apollo Server/Gateway and the Apollo Router, and enabling them in Apollo Client takes only a simple configuration setting. A graph that processes a high cardinality of operations, each executed only once, can't take advantage of most of these features. This isn't a big concern in most real-world applications, as operations are normally executed more than once.
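If production clients use APQs, a load test that always sends the full query text will not exercise the same path. The sketch below builds the two request bodies used by the APQ protocol: a hash-only body sent first, and a full body sent when the server reports that it does not yet know the hash. The helper names and the sample query are hypothetical; the `persistedQuery` extension with `version` and `sha256Hash` follows the APQ protocol.

```python
import hashlib
import json


def apq_extensions(query):
    """Build the APQ `extensions` object for a query string."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return {"persistedQuery": {"version": 1, "sha256Hash": digest}}


def apq_request_bodies(query, variables=None):
    """Return (hash_only_body, full_body) for the two APQ request phases."""
    hash_only = {
        "variables": variables or {},
        "extensions": apq_extensions(query),
    }
    # The full body is the fallback that registers the query with the server.
    full = {"query": query, **hash_only}
    return hash_only, full


# Hypothetical operation used only for illustration.
QUERY = "query GetProduct($id: ID!) { product(id: $id) { name } }"

hash_only, full = apq_request_bodies(QUERY, {"id": "1"})
print(json.dumps(hash_only, indent=2))
```

A load generator would send `hash_only` first and fall back to `full` only after a "persisted query not found" response, so repeated operations mostly travel as hashes, as they would from a real client.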
