Multi-Region Edge Cache Architecture
Deploy GraphOS Router with Redis as a globally distributed edge cache
This guide describes a reference architecture for deploying GraphOS Router with Redis as part of a globally distributed edge caching system. Use this pattern when you need low-latency GraphQL responses across multiple geographic regions with consistent cache invalidation.
When to use this architecture
Consider this architecture when you need:
Global low-latency responses: Serve users from the nearest region with cached data
High availability: Regional failures don't take down your entire GraphQL API
Shared cache across router instances: Multiple router replicas in a region share cached data
Consistent invalidation: Changes propagate to all regions quickly
This pattern is more complex than a single-region deployment. For simpler use cases, see the Response Caching Quickstart.
Architecture overview
The following diagram shows a multi-region deployment with tiered caching:
Architecture components
| Layer | Component | Purpose |
|---|---|---|
| Edge | Global Load Balancer + CDN | Route requests to nearest region, cache GET responses at edge |
| Edge | WAF (Web Application Firewall) | Protect against malicious requests |
| L1 | In-process cache | Query plan caching, hot data with microsecond latency |
| L2 | Regional Redis | Shared response cache across router replicas in a region |
| L3 | Global distributed store | Optional cross-region cache for expensive computations |
| Control | Pub/Sub | Broadcast invalidation events to all regions |
| Control | Change Data Capture | Trigger invalidations from database changes |
Cache tiers
L1: In-process cache
Each router instance maintains an in-process cache for:
Query plans: Avoid re-planning identical queries
APQ (Automatic Persisted Queries): Map query hashes to full query text
This cache is local to each router instance and provides microsecond-level latency. Configure query plan caching:
```yaml
supergraph:
  query_planning:
    cache:
      in_memory:
        limit: 512 # Number of query plans to cache
```

For high-traffic deployments, you can also back the query plan cache with Redis for sharing across instances. See Query Plan Caching.
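If you do share query plans through Redis, the configuration might look like the following sketch. The Redis URL is an assumption (reusing the regional instance referenced elsewhere in this guide); check Query Plan Caching for the authoritative schema.

```yaml
supergraph:
  query_planning:
    cache:
      in_memory:
        limit: 512
      redis:
        # Assumed to point at the regional Redis instance used elsewhere in this guide
        urls: ["${env.REDIS_URL}"]
```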
L2: Regional Redis
Regional Redis provides a shared cache for all router instances in a region. This is where response caching stores cached subgraph responses.
```yaml
response_cache:
  enabled: true
  subgraph:
    all:
      enabled: true
      ttl: 3600s # 1 hour default TTL
      redis:
        urls: ["redis://redis.us-east1.internal:6379"]
        pool_size: 10
        namespace: "router:response_cache"
```

For multi-region deployments, configure each region's routers to use the regional Redis instance:
Routers in us-east1:

```yaml
response_cache:
  subgraph:
    all:
      redis:
        urls: ["redis://redis.us-east1.internal:6379"]
```

Routers in europe-west1:

```yaml
response_cache:
  subgraph:
    all:
      redis:
        urls: ["redis://redis.europe-west1.internal:6379"]
```

Or use an environment variable so the same configuration file works in every region:

```yaml
response_cache:
  subgraph:
    all:
      redis:
        urls: ["${env.REDIS_URL}"]
```

Redis high availability
For production deployments, use Redis with high availability:
Redis Cluster: Horizontal scaling with automatic sharding
Redis Sentinel: Automatic failover for single-primary setups
Managed Redis: Cloud provider managed services (AWS ElastiCache, GCP Memorystore, Azure Cache for Redis)
```yaml
response_cache:
  subgraph:
    all:
      redis:
        urls: ["redis-cluster://node1:6379?node=node2:6379&node=node3:6379"]
```

See Redis URL Configuration for connection string formats.
L3: Global distributed store (optional)
For data that's expensive to compute and rarely changes, you can add a global L3 cache tier using a distributed database like Cloud Bigtable, DynamoDB Global Tables, or CockroachDB.
The L3 tier is not a built-in router feature. You would implement it as:
A coprocessor that checks L3 before forwarding to subgraphs
Custom logic in your subgraphs that checks L3 before querying origin databases
This tier is only necessary for specific use cases where cross-region cache sharing provides significant cost savings.
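As a sketch of the first option, the router's coprocessor hook can intercept subgraph requests and route them through a service that consults the L3 store before the request reaches the subgraph. The URL and payload flags below are assumptions for illustration; check the coprocessor documentation for the exact schema.

```yaml
coprocessor:
  # Hypothetical service that checks the global L3 store before the subgraph is called
  url: "http://l3-cache-coprocessor.internal:8081"
  subgraph:
    all:
      request:
        service_name: true # include the target subgraph name in the coprocessor payload
        body: true         # include the subgraph request body so the coprocessor can build a cache key
```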
Subgraph caching
Subgraphs can maintain their own Redis cache, independent of the router's response cache. This is useful when:
Subgraphs have expensive data fetching operations
Multiple fields share the same underlying data
You want caching at the resolver level
The router's response cache and subgraph caches serve different purposes:
| Cache | What it caches | Invalidation |
|---|---|---|
| Router response cache | Subgraph HTTP responses (entity representations) | Via router invalidation API |
| Subgraph cache | Resolver-level data, database query results | Subgraph-specific logic |
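As an illustration of subgraph-side caching, a resolver can check its own Redis instance before hitting the origin database. This sketch uses TypeScript with node-redis; the key scheme, TTL, and data-access helper are assumptions, not router features.

```typescript
import { createClient } from 'redis';

const redis = createClient({ url: process.env.SUBGRAPH_REDIS_URL });
await redis.connect();

// Placeholder for the subgraph's real data access layer.
async function loadProductFromDb(id: string): Promise<{ id: string; name: string }> {
  return { id, name: 'example' };
}

// Hypothetical resolver-level cache for a products subgraph.
async function resolveProduct(id: string) {
  const key = `product:${id}`;

  // Serve from the subgraph's own cache when possible.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Fall back to the origin database on a miss.
  const product = await loadProductFromDb(id);

  // Cache for 5 minutes; invalidating this entry is subgraph-specific logic.
  await redis.set(key, JSON.stringify(product), { EX: 300 });
  return product;
}
```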
Event-driven invalidation
In a multi-region deployment, cache invalidation must propagate to all regions. Use a pub/sub system to broadcast invalidation events.
Invalidation flow
Router invalidation endpoint
Configure each router to expose an invalidation endpoint:
```yaml
response_cache:
  enabled: true
  invalidation:
    listen: "0.0.0.0:4000"
    path: "/invalidation"
  subgraph:
    all:
      enabled: true
      redis:
        urls: ["${env.REDIS_URL}"]
      invalidation:
        enabled: true
        shared_key: "${env.INVALIDATION_SHARED_KEY}"
```

Invalidation service
Create an invalidation service in each region that:
Subscribes to the pub/sub topic
Transforms events into router invalidation requests
Calls the router's invalidation endpoint
Example invalidation request:
```bash
curl --request POST \
  --header "Authorization: ${INVALIDATION_SHARED_KEY}" \
  --header "Content-Type: application/json" \
  --url http://router:4000/invalidation \
  --data '[{
    "kind": "cache_tag",
    "subgraphs": ["products"],
    "cache_tag": "product-42"
  }]'
```

See Cache Invalidation for all invalidation methods.
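The invalidation service itself can be a small subscriber process. The sketch below assumes Redis pub/sub as the transport and a JSON event shape of `{ subgraph, cacheTag }`; if you use Kafka, Google Pub/Sub, or SNS/SQS instead, only the subscription code changes. The channel name, event shape, and URLs are assumptions for illustration.

```typescript
import { createClient } from 'redis';

// Subscribe to the regional copy of the invalidation topic (channel name is an assumption).
const subscriber = createClient({ url: process.env.PUBSUB_REDIS_URL });
await subscriber.connect();

await subscriber.subscribe('cache-invalidation', async (message) => {
  // Assumed event shape: { "subgraph": "products", "cacheTag": "product-42" }
  const event = JSON.parse(message) as { subgraph: string; cacheTag: string };

  // Forward the event to the router's invalidation endpoint,
  // using the same request format as the curl example above.
  await fetch('http://router:4000/invalidation', {
    method: 'POST',
    headers: {
      Authorization: process.env.INVALIDATION_SHARED_KEY ?? '',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify([
      {
        kind: 'cache_tag',
        subgraphs: [event.subgraph],
        cache_tag: event.cacheTag,
      },
    ]),
  });
});
```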
Change data capture
Use change data capture (CDC) to automatically trigger invalidations when database records change:
Debezium: Open source CDC for various databases
Cloud-native CDC: AWS DMS, GCP Datastream, Azure Data Factory
CDC captures database changes and publishes them to your pub/sub system, which then triggers cache invalidation across all regions.
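To connect CDC to this flow, a small consumer can translate change events into the invalidation events that the regional subscribers expect. The sketch below assumes a simplified Debezium-style envelope and a products table keyed by id; real envelopes vary by connector and version.

```typescript
import { createClient } from 'redis';

// Simplified Debezium-style change event; real payloads vary by connector and version.
interface ChangeEvent {
  payload: {
    op: 'c' | 'u' | 'd';              // create, update, delete
    before: { id: number } | null;
    after: { id: number } | null;
    source: { table: string };
  };
}

const publisher = createClient({ url: process.env.PUBSUB_REDIS_URL });
await publisher.connect();

// Map a change on the (hypothetical) products table to an invalidation event
// and broadcast it on the channel the regional subscribers listen to.
export async function handleChange(event: ChangeEvent) {
  if (event.payload.source.table !== 'products') return;
  const row = event.payload.after ?? event.payload.before;
  if (!row) return;

  await publisher.publish(
    'cache-invalidation',
    JSON.stringify({ subgraph: 'products', cacheTag: `product-${row.id}` }),
  );
}
```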
Edge layer integration
CDN caching
A CDN can cache GraphQL responses at the edge for read-heavy workloads. This works best for:
GET requests: Queries sent as GET requests with query parameters
Public data: Responses without user-specific content
High cache hit rates: Popular queries requested by many users
Configure your CDN to:
Cache responses based on the full URL (including query parameters)
Respect Cache-Control headers from the router
Forward cache misses to the nearest router region
The router includes Cache-Control headers in responses based on the minimum TTL of cached entities.
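For example, if the shortest remaining TTL among the cached entities in a response is one hour, the response header might look like `cache-control: max-age=3600`; the exact directives depend on your configuration and the subgraph responses.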
APQ with GET requests
Automatic Persisted Queries (APQ) enable sending queries as GET requests, making them cacheable by CDNs:
```yaml
apq:
  enabled: true
  router:
    cache:
      redis:
        urls: ["${env.REDIS_URL}"]
```
6 urls: ["${env.REDIS_URL}"]With APQ, clients send a query hash instead of the full query text. The CDN can cache responses by hash, and the router resolves hashes to full queries from Redis.
Multi-region deployment
Regional router configuration
Each region needs routers configured with:
Regional Redis URL
Regional subgraph endpoints (or cross-region if subgraphs aren't deployed locally)
Use environment variables or a configuration management system to manage region-specific settings.
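For example, with Kubernetes you might inject the region-specific values referenced by `${env.REDIS_URL}` and `${env.INVALIDATION_SHARED_KEY}` through the router Deployment. The names below are illustrative.

```yaml
# Excerpt from a hypothetical router Deployment for us-east1; the same template
# is stamped out per region with different values.
env:
  - name: REDIS_URL
    value: "redis://redis.us-east1.internal:6379"
  - name: INVALIDATION_SHARED_KEY
    valueFrom:
      secretKeyRef:
        name: router-secrets
        key: invalidation-shared-key
```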
Cross-region subgraph routing
In the architecture diagram, europe-west1 doesn't have local subgraphs—it routes to us-east1 subgraphs cross-region. This is a valid pattern when:
Some regions only need router + cache (read-heavy, latency-tolerant)
Subgraph deployment is expensive or complex
Data residency requirements allow it
Configure cross-region routing with longer timeouts to account for network latency:
```yaml
traffic_shaping:
  all:
    timeout: 30s # Longer timeout for cross-region calls
  subgraphs:
    products:
      timeout: 45s # Even longer for slow subgraphs
```

Monitoring
Monitor cache effectiveness across regions:
```yaml
telemetry:
  instrumentation:
    instruments:
      cache:
        apollo.router.operations.response_cache:
          attributes:
            subgraph.name:
              subgraph_name: true
```

Key metrics to track:
| Metric | What it tells you |
|---|---|
| apollo.router.operations.response_cache.hit | Cache hit rate by subgraph |
| apollo.router.operations.response_cache.miss | Requests hitting origin |
| apollo.router.cache.storage.estimated_size | Cache memory usage |
| Redis latency | Network overhead for cache operations |
See Response Cache Observability for detailed monitoring guidance.
Implementation checklist
Use this checklist when implementing the architecture:
Redis per region: Deploy Redis with high availability in each region
Router fleet: Deploy multiple router replicas per region behind a load balancer
Invalidation endpoint: Configure and secure the invalidation endpoint
Pub/Sub: Set up pub/sub topics for invalidation events
Invalidation services: Deploy subscribers in each region
CDC (optional): Configure change data capture for automatic invalidation
CDN: Configure CDN caching rules for GET requests
Monitoring: Set up dashboards for cache metrics across regions
Alerting: Alert on cache hit rate drops, Redis connectivity issues