Router Instruments
Standard metric instruments for the router's request lifecycle
Standard Metric Instruments
GraphOS Router and Apollo Router Core provide a set of standard router instruments that expose detailed information about the router's request lifecycle. You can consume the metrics they capture by configuring a metrics exporter.
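For example, here's a minimal sketch of enabling the router's Prometheus metrics exporter (the listen address and path are illustrative; OTLP and other exporters are configured under the same `telemetry.exporters.metrics` section):

```yaml
telemetry:
  exporters:
    metrics:
      prometheus:
        # Expose collected metrics at http://127.0.0.1:9090/metrics
        enabled: true
        listen: 127.0.0.1:9090
        path: /metrics
```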
Standard router instruments are different from OpenTelemetry (OTel) instruments and custom instruments:

- Router instruments provide standard metrics about the router request lifecycle and have names starting with `apollo.router` or `apollo_router`.
- OTel instruments provide metrics about the HTTP lifecycle and have names starting with `http`.
- Custom instruments provide customized metrics about the router request lifecycle.
The rest of this reference lists the available standard router instruments.
Measuring router overhead
Measuring overhead in the router can be challenging because it consists of multiple components, each executing tasks in parallel. Subgraph latency, cache performance, and customization plugins influence performance and have the potential to cause back pressure. Limits on CPU, memory, and network bandwidth can all create bottlenecks that hinder request processing. External factors such as request rate, operation complexity, and response sizes all heavily affect the router's overall load.
You can find the activity of a particular request in its trace spans. Spans have the following attributes:
- `busy_ns` - time during which the span is actively executing
- `idle_ns` - time during which the span is alive but not actively executing
These attributes represent how a span spends time (in nanoseconds) over its lifetime and how it contributes to the total request overhead. Your APM provider can use this trace data to generate synthetic metrics, enabling you to approximate how particular spans affect router performance.
Router overhead metric
The `apollo.router.overhead` histogram provides a more direct measurement of router processing overhead. This metric tracks the time the router spends on tasks other than waiting for downstream HTTP requests, including:
- GraphQL parsing
- GraphQL validation
- Query planning
- Response parsing and composition
- Plugin execution, including coprocessors and Rhai scripts
The overhead calculation excludes time spent waiting for downstream HTTP services (subgraphs and connectors), giving you visibility into the router's actual processing time versus downstream latency. This metric helps identify when the router itself is a bottleneck versus when delays are caused by downstream services.
Default attributes:
- `subgraph.active_requests`: A boolean indicating whether any subgraph requests were active at the time the overhead was calculated. This attribute is critical for filtering meaningful overhead measurements. For operations that stream results (such as queries with `@defer`), the overhead metric becomes less meaningful because the router is in a waiting state instead of actively processing. When analyzing overhead to identify router processing bottlenecks, exclude measurements where `subgraph.active_requests` is `true` to focus on pure router processing time, without interference from subgraph wait time.
Configuration example:
```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        apollo.router.overhead: true
```

You can attach custom attributes using router selectors.
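For instance, here's a minimal sketch that attaches a custom attribute to the overhead metric using the `request_header` router selector (the attribute name and header are illustrative):

```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        apollo.router.overhead:
          attributes:
            # Illustrative custom attribute sourced from a client request header
            "client.name":
              request_header: "apollographql-client-name"
```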
Important considerations
Version variability: Router overhead might vary between router versions. For example, a correctness fix or security improvement might result in higher or lower overhead. Always compare overhead measurements within the same router version, using the metric as a trend indicator.
Configuration requirements: To provide meaningful and consistent overhead measurements, configure operation limits and traffic shaping. Without these controls, unbounded request complexity or traffic spikes can skew overhead measurements.
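As a starting point, here's a minimal sketch of such controls (all values are illustrative and should be tuned for your workload):

```yaml
# Bound operation complexity so pathological queries can't skew overhead
limits:
  max_depth: 100
  max_aliases: 30

# Bound request rates and total request time
traffic_shaping:
  router:
    timeout: 30s
    global_rate_limit:
      capacity: 1000
      interval: 1s
```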
CPU saturation: High overhead values often indicate CPU saturation, which may be caused by other parts of the system. When the router's CPU resources are exhausted, processing time, and therefore the overhead, increases significantly. Monitor CPU utilization alongside overhead metrics to identify resource constraints.
Coprocessor requests: Coprocessor request time is currently included in the overhead calculation. In a future release, coprocessor time might be excluded, similar to subgraphs and connectors.
Common questions
What should your target be for "optimal" router overhead?
Apollo provides best practices for monitoring overhead, but many factors determine overhead for a given supergraph, its subgraphs, and request traffic. Rather than targeting a fixed number or specific variance, track the delta or percentage change of the metric in your production deployment.
What should your target be for "optimal" router CPU utilization?
Using the various metrics, you can better understand CPU usage. What is optimal for your team depends on your expected load, spikes in traffic, time to scale infrastructure, and other factors. Apollo cannot provide generalized recommendations but encourages you to perform your own validations and share your results with the Apollo Community.
How is this different from `http.server.request.duration`?
The OTel standard instruments measure the entire time to process a request, as seen by your clients, which includes delays caused by subgraph response times.
How do I measure subgraph response time not included in the overhead metric?
See the docs for measuring instruments for the subgraph service.
Why is there a difference from Router v1's `apollo_router_processing_time`?
Router v2 refactored how requests are processed through its internal pipelines, and the changes to the underlying architecture are reflected in the new metric. To track particular parts of the router request pipeline, see the metrics below.
How do I find the overhead for a particular operation type (query, mutation, subscription) or operation name?
To filter overhead by a specific operation type or operation name, look at trace and span attributes and use your APM to create specific metric views for `busy_ns` and `idle_ns`. Because `apollo.router.overhead` is a histogram metric, it is aggregated over all operations.
GraphQL
- `apollo.router.graphql_error` - Counts GraphQL errors in responses. Also counts errors which occur during the response validation phase, which are represented in client responses as `extensions.valueCompletion` instead of actual GraphQL errors. Attributes:
  - `code`: error code, including `RESPONSE_VALIDATION_FAILED` in the case of a value completion error
Session
- `apollo.router.session.count.active` - Number of in-flight GraphQL requests
Cache
- `apollo.router.cache.size` - Number of entries in the cache
- `apollo.router.cache.hit.time` - Time to hit the cache in seconds
- `apollo.router.cache.hit.time.count` - Number of cache hits
- `apollo.router.cache.miss.time` - Time to miss the cache in seconds
- `apollo.router.cache.miss.time.count` - Number of cache misses
- `apollo.router.cache.storage.estimated_size` - The estimated storage size of the cache in bytes (query planner in-memory cache only)
All cache metrics listed above have the following attributes:
- `kind`: the cache being queried (`apq`, `query planner`, `introspection`)
- `storage`: the backend storage of the cache (`memory`, `redis`)
Redis cache
When using Redis as a cache backend, additional Redis-specific metrics are available:
- `apollo.router.cache.redis.clients` - Number of active Redis clients
- `apollo.router.cache.redis.command_queue_length` - Number of Redis commands buffered and not yet sent
- `apollo.router.cache.redis.commands_executed` - Total number of Redis commands executed
- `apollo.router.cache.redis.redelivery_count` - Number of Redis command redeliveries due to connection issues
- `apollo.router.cache.redis.errors` - Number of Redis errors by error type and cache kind
- `experimental.apollo.router.cache.redis.latency_avg` - Average Redis command latency in seconds
- `experimental.apollo.router.cache.redis.network_latency_avg` - Average Redis network latency in seconds
- `experimental.apollo.router.cache.redis.request_size_avg` - Average Redis request size in bytes
- `experimental.apollo.router.cache.redis.response_size_avg` - Average Redis response size in bytes
All Redis metrics, except for `apollo.router.cache.redis.clients`, include the following attribute:
- `kind`: the cache being queried (`apq`, `query planner`, `introspection`, `entity`)
The `apollo.router.cache.redis.errors` metric also includes an `error_type` attribute with possible values:
- `config` - Configuration errors (invalid Redis settings)
- `auth` - Authentication errors (wrong credentials)
- `routing` - Cluster routing errors
- `io` - Network I/O errors
- `invalid_command` - Invalid Redis commands
- `invalid_argument` - Invalid command arguments
- `not_found` - Missing key in Redis
- `url` - Invalid Redis URL format
- `protocol` - Redis protocol errors
- `tls` - TLS/SSL connection errors
- `canceled` - Canceled operations
- `unknown` - Unknown errors
- `timeout` - Operation timeouts
- `cluster` - Redis cluster state errors
- `parse` - Data parsing errors
- `sentinel` - Redis Sentinel errors
- `backpressure` - Backpressure/overload errors
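These metrics are emitted only when a Redis backend is configured. For reference, here's a minimal sketch of enabling distributed query plan caching against a local Redis instance (the URL is illustrative):

```yaml
supergraph:
  query_planning:
    cache:
      redis:
        # Illustrative single-node Redis endpoint
        urls: ["redis://localhost:6379"]
```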
Coprocessor
- `apollo.router.operations.coprocessor` - Total operations with coprocessors enabled. Attributes:
  - `coprocessor.succeeded`: bool
  - `coprocessor.stage`: string (`RouterRequest`, `RouterResponse`, `SubgraphRequest`, `SubgraphResponse`)
- `apollo.router.operations.coprocessor.duration` - Time spent waiting for the coprocessor to answer, in seconds. Attributes:
  - `coprocessor.stage`: string (`RouterRequest`, `RouterResponse`, `SubgraphRequest`, `SubgraphResponse`)
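These metrics appear once a coprocessor is configured. As a point of reference, a minimal sketch that invokes a coprocessor at the `RouterRequest` stage (the URL is illustrative):

```yaml
coprocessor:
  url: http://127.0.0.1:8081  # illustrative coprocessor endpoint
  router:
    request:
      headers: true  # include request headers in the RouterRequest payload
```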
Performance
- `apollo_router_schema_load_duration` - Time spent loading the schema in seconds
Query planning
- `apollo.router.query_planning.warmup.duration` - Time spent warming up the query planner queries in seconds
- `apollo.router.query_planning.plan.duration` - Histogram of plan durations isolated to query planning time only
- `apollo.router.query_planning.total.duration` - Histogram of plan durations including queue time
- `apollo.router.query_planning.plan.evaluated_plans` - Histogram of the number of evaluated query plans
Compute jobs
- `apollo.router.compute_jobs.queued` - A gauge of the number of jobs queued for the thread pool dedicated to CPU-heavy components like GraphQL parsing and validation, and the query planner.
- `apollo.router.compute_jobs.queue_is_full` - A counter of requests rejected because the queue was full.
- `apollo.router.compute_jobs.duration` - A histogram of the time a job spends in the compute pipeline, including the queue and query planning. Attributes:
  - `job.type`: (`QueryPlanning`, `QueryParsing`, `Introspection`)
  - `job.outcome`: (`ExecutedOk`, `ExecutedError`, `ChannelError`, `RejectedQueueFull`, `Abandoned`)
- `apollo.router.compute_jobs.queue.wait.duration` - A histogram of the time a job spends in the compute queue. Attributes:
  - `job.type`: (`QueryPlanning`, `QueryParsing`, `Introspection`)
- `apollo.router.compute_jobs.execution.duration` - A histogram of the time spent executing a job (excludes time spent in the queue). Attributes:
  - `job.type`: (`QueryPlanning`, `QueryParsing`, `Introspection`)
- `apollo.router.compute_jobs.active_jobs` - A gauge of the number of compute jobs being processed in parallel. Attributes:
  - `job.type`: (`QueryPlanning`, `QueryParsing`, `Introspection`)
Uplink
- `apollo.router.uplink.fetch.duration.seconds` - Uplink request duration. Attributes:
  - `url`: The Uplink URL that was polled
  - `query`: The query that the router sent to Uplink (`SupergraphSdl` or `License`)
  - `kind`: (`new`, `unchanged`, `http_error`, `uplink_error`)
  - `code`: The error code depending on type (if an error occurred)
  - `error`: The error message (if an error occurred)
- `apollo.router.uplink.fetch.count.total` - Count of Uplink fetches. Attributes:
  - `status`: (`success`, `failure`)
  - `query`: The query that the router sent to Uplink (`SupergraphSdl` or `License`)
Subscriptions
- `apollo.router.opened.subscriptions` - Number of distinct open subscriptions (not the number of clients with an open subscription, in case subscriptions are deduplicated). This metric includes the `graphql.operation.name` label so you can tell exactly which subscriptions are still open.
- `apollo.router.skipped.event.count` - Number of subscription events that have been skipped because too many events were received from the subgraph but not yet sent to the client.
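These metrics apply once subscription support is enabled. Here's a minimal sketch using passthrough mode for all subgraphs (the WebSocket path is illustrative):

```yaml
subscription:
  enabled: true
  mode:
    passthrough:
      all:
        path: /ws  # illustrative subgraph WebSocket endpoint path
```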
Batching
- `apollo.router.operations.batching` - A counter of the number of query batches received by the router
- `apollo.router.operations.batching.size` - A histogram tracking the number of queries contained within a query batch
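These metrics are only emitted when client query batching is enabled, for example (a minimal sketch):

```yaml
batching:
  enabled: true
  mode: batch_http_link  # batch format used by Apollo Client's BatchHttpLink
```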
GraphOS Studio
- `apollo.router.telemetry.studio.reports` - The number of reports submitted to GraphOS Studio by the router. Attributes:
  - `report.type`: The type of report submitted: "traces" or "metrics"
  - `report.protocol`: Either "apollo" or "otlp", depending on the `otlp_tracing_sampler` configuration
Telemetry
- `apollo.router.telemetry.batch_processor.errors` - The number of errors encountered by exporter batch processors (see the tuning sketch after this list). Attributes:
  - `name`: One of `apollo-tracing`, `datadog-tracing`, `jaeger-collector`, `otlp-tracing`, `zipkin-tracing`
  - `error`: One of `channel closed`, `channel full`
- `apollo.router.telemetry.metrics.cardinality_overflow` - A count of how often a telemetry metric hit OTel's hard cardinality limit
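If you see `channel full` errors, the exporter's batch processor may be undersized for your traffic. Here's a minimal sketch of tuning it for an OTLP tracing exporter (the endpoint and values are illustrative):

```yaml
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://localhost:4317  # illustrative collector endpoint
        batch_processor:
          max_queue_size: 4096  # buffer more spans before dropping
          scheduled_delay: 5s   # how often batches are flushed
```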
Internals
- `apollo.router.pipelines` - The number of request pipelines active in the router. Attributes:
  - `schema.id` - The Apollo Studio schema hash associated with the pipeline
  - `launch.id` - The Apollo Studio launch ID associated with the pipeline (optional)
  - `config.hash` - The hash of the configuration
Server
- `apollo.router.open_connections` - The number of open connections to the router. Attributes:
  - `schema.id` - The Apollo Studio schema hash associated with the pipeline
  - `launch.id` - The Apollo Studio launch ID associated with the pipeline (optional)
  - `config.hash` - The hash of the configuration
  - `server.address` - The address that the router is listening on
  - `server.port` - The port that the router is listening on, if not a Unix socket
  - `http.connection.state` - Either `active` or `terminating`