In-Memory Caching

Configure router caching for query plans and automatic persisted queries

Both GraphOS Router and Apollo Router Core use an in-memory LRU cache to store the following data:

Generated query plans
Automatic persisted queries (APQ)
Introspection responses

You can configure certain caching behaviors for generated query plans and APQ (but not introspection responses).

tip

If you have a GraphOS Enterprise plan, you can also configure a Redis-backed distributed cache that enables multiple router instances to share cached values. For details, see Distributed caching in the GraphOS Router.

Performance improvements vs stability

The router is a highly scalable, low-latency runtime. Even with all caching disabled, the time it spends processing operations and generating query plans is minimal (nanoseconds to milliseconds) compared to the overall supergraph request, except in edge cases involving extremely large operations or supergraphs. For those running a large graph, the main benefit of caching is stability: the overhead for a given operation stays consistent rather than improving dramatically. To validate the performance gains of operation caching, use the router's traces and metrics to take measurements before and after. In extremely large edge cases, we have seen the cache reduce the time to create a query plan by a factor of 2-10, which is still a small part of the overall request.

Caching query plans

Whenever your router receives an incoming GraphQL operation, it generates a query plan to determine which subgraphs it needs to query to resolve that operation.

By caching previously generated query plans, your router can skip generating them again if a client later sends the exact same operation. This improves your router's responsiveness.

The GraphOS Router enables query plan caching by default. In your router's YAML config file, you can configure the maximum number of query plan entries in the cache like so:

YAML

router.yaml

supergraph:
  query_planning:
    cache:
      in_memory:
        limit: 512 # This is the default value.

Cache warm-up

When the router loads a new schema, the query plan for some queries might change, so cached query plans cannot be reused.

To prevent increased latency upon query plan cache invalidation, the router precomputes query plans for the most used queries from the cache when a new schema is loaded.

Precomputed plans will be cached before the router switches traffic over to the new schema.

tip

You can also send the Apollo-Expose-Query-Plan: dry-run header to generate query plans at runtime. This lets you warm up your cache instances with a custom-defined list of operations.

By default, the router warms up the cache with 30% of the queries already in the cache. You can configure the number of queries to warm up as follows:

YAML

router.yaml

supergraph:
  query_planning:
    # Pre-plan the 100 most used operations when the supergraph changes
    warmed_up_queries: 100

(In addition, the router can use the contents of the persisted query list to prewarm the cache. By default, it does this when loading a new schema but not on startup; you can configure it to change either of these defaults.)
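The note above does not name the options involved. As a rough sketch only, assuming the persisted_queries section exposes an experimental_prewarm_query_plan_cache option with on_startup and on_reload flags (these key names are an assumption here; check them against the configuration reference for your router version), prewarming from the persisted query list at both points might look like this:

```yaml
# Sketch under assumptions: key names below are not taken from this page.
persisted_queries:
  enabled: true
  experimental_prewarm_query_plan_cache:
    on_startup: true # assumed flag: also prewarm when the router starts
    on_reload: true  # assumed flag: prewarm when a new schema is loaded
```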

To get more information on the planning and warm-up process, use the following metrics (where <storage> can be redis for the distributed cache or memory for the in-memory cache):

counters:
- apollo.router.cache.hit.time.count{kind="query planner", storage="<storage>"}
- apollo.router.cache.miss.time.count{kind="query planner", storage="<storage>"}
histograms:
- apollo.router.query_planning.plan.duration: time spent planning queries
  - planner: The query planner implementation used (rust or js)
  - outcome: The outcome of the query planning process (success, timeout, cancelled, error)
- apollo.router.schema.load.duration: time spent loading a schema
- apollo.router.cache.hit.time{kind="query planner", storage="<storage>"}: time to get a value from the cache
- apollo.router.cache.miss.time{kind="query planner", storage="<storage>"}
gauges:
- apollo.router.cache.size{kind="query planner", storage="memory"}: current size of the cache (only for in-memory cache)
- apollo.router.cache.storage.estimated_size{kind="query planner", storage="memory"}: estimated storage size of the cache (only for in-memory query planner cache)

Typically, we would look at apollo.router.cache.size and the cache hit rate to choose the right size for the in-memory cache, then look at apollo.router.schema.load.duration and apollo.router.query_planning.plan.duration to decide how much time to spend warming up queries.
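Once those metrics suggest new values, both settings live under the same supergraph.query_planning key and can be tuned together. The numbers below are hypothetical placeholders, not recommendations; pick yours from the measured hit rate and planning durations:

```yaml
supergraph:
  query_planning:
    cache:
      in_memory:
        # Hypothetical value: raised from the default of 512 because the
        # measured hit rate was low and the cache was consistently full.
        limit: 2048
    # Hypothetical value: number of most used operations to pre-plan
    # when a new schema is loaded.
    warmed_up_queries: 200
```

A larger warmed_up_queries value lengthens the warm-up phase before traffic switches to the new schema, so it is worth re-checking apollo.router.schema.load.duration after changing it.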

Cache warm-up with distributed caching

If the router is using distributed caching for query plans, the warm-up phase also stores the new query plans in Redis. Because all router instances might have the same distribution of queries in their in-memory caches, the list of queries is shuffled before warm-up, so that each router instance plans queries in a different order and shares its results through the cache.

Cache warm-up with headers

Requires ≥ Router v1.61.0

With router v1.61.0+ and v2.x+, if you have enabled exposing query plans via --dev mode or plugins.experimental.expose_query_plan: true, you can pass the Apollo-Expose-Query-Plan header to return query plans in the GraphQL response extensions. You must set the header to one of the following values:

true: Returns a human-readable string and JSON blob of the query plan while still executing the query to fetch data.
dry-run: Generates the query plan and aborts without executing the query.

When you use dry-run, the generated query plans are saved to your configured cache locations. Replaying real, mirrored, or production-like operations is a great way to warm up the caches before transitioning traffic to new router instances.
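For reference, the plugin setting mentioned above could be written in the router's YAML config file roughly as follows. This is a minimal sketch that assumes the dotted plugin name is nested directly under the top-level plugins key; verify the exact layout for your router version:

```yaml
plugins:
  # Lets clients request query plans with the Apollo-Expose-Query-Plan header.
  # Assumed layout: plugin name as a single dotted key under `plugins`.
  experimental.expose_query_plan: true
```

With this enabled, each operation in your warm-up list can be sent with the header Apollo-Expose-Query-Plan: dry-run so that it is planned and cached without being executed.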

Caching automatic persisted queries (APQ)

Automatic Persisted Queries (APQ) enable GraphQL clients to send a server the hash of their query string, instead of sending the query string itself. When query strings are very large, this can significantly reduce network usage.

The router supports using APQ in its communications with both clients and subgraphs:

In its communications with clients, the router acts as a GraphQL server, because it receives queries from clients.
In its communications with subgraphs, the router acts as a GraphQL client, because it sends queries to subgraphs.

Because the router's role differs between these two interactions, you configure these APQ settings separately.

APQ with clients

The router enables APQ caching for client operations by default. In your router's YAML config file, you can configure the maximum number of APQ entries in the cache like so:

YAML

router.yaml

apq:
  router:
    cache:
      in_memory:
        limit: 512 # This is the default value.

You can also disable client APQ support entirely like so:

YAML

router.yaml

apq:
  enabled: false

APQ with subgraphs

By default, the router does not use APQ when sending queries to its subgraphs.

In your router's YAML config file, you can configure this APQ support with a combination of global and per-subgraph settings:

YAML

router.yaml

apq:
  subgraph:
    # Disables subgraph APQ globally except where overridden per-subgraph
    all:
      enabled: false
    # Override global APQ setting for individual subgraphs
    subgraphs:
      products:
        enabled: true

In the example above, subgraph APQ is disabled except for the products subgraph.
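The same two keys can express the opposite policy. As a sketch, the following enables subgraph APQ globally and opts out a single subgraph; the subgraph name accounts is hypothetical and stands in for any subgraph that should not receive hashed queries:

```yaml
apq:
  subgraph:
    # Enables subgraph APQ globally except where overridden per-subgraph
    all:
      enabled: true
    # Override global APQ setting for individual subgraphs
    subgraphs:
      accounts: # hypothetical subgraph name
        enabled: false
```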
