
Tracing in the Apollo Router

Collect tracing information


The Apollo Router supports OpenTelemetry, with exporters for:

  • Datadog
  • Jaeger
  • OTLP
  • Zipkin

The Apollo Router generates spans that include the various phases of serving a request and associated dependencies. This is useful for showing how response time is affected by:

  • Sub-request response times
  • Query shape (sub-request dependencies)
  • Apollo post-processing

Span data is sent to a collector such as Jaeger, which can assemble spans into a Gantt chart for analysis.

💡 TIP

To get the most out of distributed tracing, all components in your system should be instrumented.

Common configuration

Service name

router.yaml
telemetry:
  tracing:
    common:
      # (Optional) Set the service name to easily find traces related to the apollo-router in your metrics dashboards
      service_name: "router"

Service name discovery is handled in the following order:

  1. OTEL_SERVICE_NAME environment variable
  2. OTEL_RESOURCE_ATTRIBUTES environment variable
  3. router.yaml service_name
  4. router.yaml resource

If none of the above is found, the service name is set to unknown_service:router, or unknown_service if the executable name cannot be determined.
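For example, a sketch of launching the Router with the service name overridden via the environment, which takes precedence over any router.yaml setting (the binary name and config path are illustrative):

OTEL_SERVICE_NAME="my-router" ./router --config router.yaml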

Resource

A Resource is a set of key-value pairs that provide additional global information exported with your traces.

router.yaml
telemetry:
  tracing:
    common:
      resource:
        "environment.name": "production"
        "environment.namespace": "${env.MY_K8_NAMESPACE_ENV_VARIABLE}"

See OpenTelemetry conventions for resources.
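Resources can also be supplied through the OTEL_RESOURCE_ATTRIBUTES environment variable mentioned above. A sketch using the standard OpenTelemetry comma-separated key=value syntax (the values and config path are illustrative):

OTEL_RESOURCE_ATTRIBUTES="environment.name=production,environment.namespace=my-ns" ./router --config router.yaml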

Sampling

To prevent sending too many traces to your APM, you may want to enable sampling.

router.yaml
telemetry:
  tracing:
    common:
      sampler: always_on # (default) all requests are sampled (always_on|always_off|<0.0-1.0>)
      parent_based_sampler: true # (default) If an incoming span has OpenTelemetry headers then the request will always be sampled.

Setting sampler to 0.1 will result in only 10% of your requests being sampled.

parent_based_sampler enables clients to make the sampling decision. This guarantees that a trace that starts at a client will also have spans at the Router. You may wish to turn this off if your router is exposed directly to the internet.
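For example, a minimal sketch combining these options for an internet-facing router, sampling 10% of requests and ignoring upstream sampling decisions:

router.yaml
telemetry:
  tracing:
    common:
      # sample only 10% of requests
      sampler: 0.1
      # do not let clients force the sampling decision
      parent_based_sampler: false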

Propagation

The propagation section allows you to configure which propagators are active in addition to those automatically activated by using an exporter.

router.yaml
telemetry:
  tracing:
    propagation:
      # https://www.w3.org/TR/baggage/
      baggage: false
      # https://www.datadoghq.com/
      datadog: false
      # https://www.jaegertracing.io/ (compliant with opentracing)
      jaeger: false
      # https://www.w3.org/TR/trace-context/
      trace_context: false
      # https://zipkin.io/ (compliant with opentracing)
      zipkin: false
      # https://aws.amazon.com/xray/ (compliant with opentracing)
      aws_xray: false
      # If you have your own way to generate a trace id and you want to pass it via a custom request header
      request:
        header_name: my-trace-id

Specifying explicit propagation is generally only required if you're using an exporter that supports multiple trace ID formats (e.g., OpenTelemetry Collector, Jaeger, or OpenTracing compatible exporters).
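As an illustration, assuming the custom request header configured above and a Router listening on its default address, a client could supply its own trace ID like this (the ID value is illustrative):

curl http://localhost:4000/ \
  -H 'content-type: application/json' \
  -H 'my-trace-id: 0af7651916cd43dd8448eb211c80319c' \
  -d '{"query":"{ __typename }"}'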

Span limits

You may set limits on spans to prevent sending too much data to your APM.

router.yaml
telemetry:
  tracing:
    common:
      # Optional limits
      max_attributes_per_event: 10
      max_attributes_per_link: 10
      max_attributes_per_span: 10
      max_events_per_span: 10
      max_links_per_span: 10

Trace ID

This feature is currently experimental. Your questions and feedback are highly valued; don't hesitate to get in touch with your Apollo contact or on the official Apollo GraphQL Discord. You can also give feedback in the discussion on GitHub.

If you want to expose the generated trace ID (or the one you provided via propagation headers) in response headers, you can use this configuration:

router.yaml
telemetry:
  tracing:
    experimental_response_trace_id:
      enabled: true # default: false
      header_name: "my-trace-id" # default: "apollo-trace-id"

With this configuration, responses include a header called my-trace-id containing the trace ID. This can help you debug a specific query; for example, you can grep your logs for the trace ID to find related context.
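A sketch of surfacing the header with curl so you can grab the trace ID for a log search (assuming the configuration above and a Router on its default address; the query is illustrative):

# -D - prints the response headers, making the trace ID header visible
curl -s -D - http://localhost:4000/ \
  -H 'content-type: application/json' \
  -d '{"query":"{ __typename }"}' \
  | grep -i my-trace-id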

Batch Processor

All trace exporters (apollo|datadog|zipkin|jaeger|otlp) have batch span processor configuration. You will need to tune this if you see the following in your logs:

OpenTelemetry trace error occurred: cannot send span to the batch span processor because the channel is full

  • scheduled_delay The delay from receiving the first span to the batch being sent.
  • max_concurrent_exports The maximum number of overlapping export requests. For instance, if ingest is taking a long time to respond, there may be several overlapping export requests.
  • max_export_batch_size The number of spans to include in a batch. Your ingest may have max message size limits.
  • max_export_timeout The timeout for sending spans before dropping the data.
  • max_queue_size The maximum number of spans to be buffered before dropping span data.
router.yaml
telemetry:
  # Apollo tracing
  apollo:
    batch_processor:
      scheduled_delay: 100ms
      max_concurrent_exports: 1000
      max_export_batch_size: 10000
      max_export_timeout: 100s
      max_queue_size: 10000
  tracing:
    # Datadog
    datadog:
      enabled: true
      batch_processor:
        scheduled_delay: 100ms
        max_concurrent_exports: 1000
        max_export_batch_size: 10000
        max_export_timeout: 100s
        max_queue_size: 10000
      endpoint: default
    # Jaeger
    # Otlp
    # Zipkin

You will need to experiment to find the settings that are appropriate for your use case.

Using Datadog (native)

The Apollo Router can be configured to connect to either the default Datadog agent address or a URL.

router.yaml
telemetry:
  tracing:
    datadog:
      enabled: true
      # Optional endpoint, either 'default' or a URL (Defaults to http://localhost:8126/v0.4/traces)
      endpoint: "http://${env.DATADOG_AGENT_HOST}:8126/v0.4/traces"

Given that there are some incompatibilities between Datadog and OpenTelemetry, the Datadog exporter might not provide meaningful contextual information in the exported spans. To fix this, you can configure the Apollo Router to perform a mapping for the span name and the span resource name.

router.yaml
telemetry:
  tracing:
    datadog:
      enabled: true
      enable_span_mapping: true

When enable_span_mapping is set to true, the Apollo Router will perform the following mapping:

  1. Use the OpenTelemetry span name to set the Datadog span operation name.
  2. Use the OpenTelemetry span attributes to set the Datadog span resource name.

Example:

Let's say we send a query MyQuery to the Apollo Router; the Router, following its query plan, will then send a query to my-subgraph-name, and the following trace will be created:

| apollo_router request |
  | apollo_router router |
    | apollo_router supergraph |
      | apollo_router query_planning | apollo_router execution |
                                       | apollo_router fetch |
                                         | apollo_router subgraph |
                                           | apollo_router subgraph_request |

As you can see, there is no clear information about the name of the query, the name of the subgraph, or the name of the query sent to the subgraph.

Instead, when enable_span_mapping is set to true, the following trace will be created:

| request /graphql |
  | router |
    | supergraph MyQuery |
      | query_planning MyQuery | execution |
                                 | fetch fetch |
                                   | subgraph my-subgraph-name |
                                     | subgraph_request MyQuery__my-subgraph-name__0 |

Using Jaeger (native)

:warning: Jaeger native is deprecated and will be removed in a future release of the Apollo Router. Use OTLP instead.

The Apollo Router can be configured to export tracing data to Jaeger either via an agent or via an HTTP collector.

Unless explicitly configured to use a collector, Jaeger will use an agent by default.

Agent config

router.yaml
telemetry:
  tracing:
    jaeger:
      enabled: true
      # Optional agent configuration
      agent:
        # Optional endpoint, either 'default' or a socket address (Defaults to 127.0.0.1:6832)
        endpoint: "${env.JAEGER_HOST}:6832"

Collector config

router.yaml
telemetry:
  tracing:
    jaeger:
      enabled: true
      # Optional collector configuration
      collector:
        # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:14268/api/traces)
        endpoint: "http://${env.JAEGER_HOST}:14268/api/traces"
        username: "${env.JAEGER_USERNAME}"
        password: "${env.JAEGER_PASSWORD}"

OTLP

OTLP (OpenTelemetry Protocol) is the native protocol of OpenTelemetry. It can be used to export traces to a variety of backends, including:

  • OpenTelemetry Collector
  • Datadog
  • Honeycomb
  • Lightstep
router.yaml
telemetry:
  tracing:
    otlp:
      enabled: true
      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317 for gRPC and http://127.0.0.1:4318 for HTTP)
      endpoint: default
      # Optional protocol (Defaults to grpc)
      protocol: grpc
      # Optional gRPC configuration
      grpc:
        domain_name: "my.domain"
        key: ""
        ca: ""
        cert: ""
        metadata:
          foo: bar
      # Optional HTTP configuration
      http:
        headers:
          foo: bar

Remember that file. and env. prefixes can be used for expansion in the config YAML, e.g. ${file.ca.txt}.
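For instance, a sketch of populating the gRPC TLS settings shown above from local PEM files rather than inline strings (the file names are illustrative):

router.yaml
telemetry:
  tracing:
    otlp:
      enabled: true
      grpc:
        # expand the contents of local files into the config at startup
        ca: "${file.ca.pem}"
        cert: "${file.cert.pem}"
        key: "${file.key.pem}"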

Datadog

To use Datadog, you must configure the Datadog agent to accept OTLP traces. This can be done by adding the following to your datadog.yaml:

datadog.yaml
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: <dd-agent-ip>:4317

The Router must also be configured to send traces to the Datadog agent:

router.yaml
telemetry:
  tracing:
    otlp:
      enabled: true
      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
      endpoint: "${env.DATADOG_AGENT_HOST}:4317"

See Datadog Agent configuration for more details.

Jaeger

Since 1.35.0, Jaeger supports native OTLP ingestion.

Ensure that when running Jaeger in Docker, port 4317 is exposed and COLLECTOR_OTLP_ENABLED is set to true:

docker run --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:1.35

Then configure the Router to send traces via OTLP:

router.yaml
telemetry:
  tracing:
    otlp:
      enabled: true
      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
      endpoint: "http://${env.JAEGER_HOST}:4317"

Using Zipkin

The Apollo Router can be configured to export tracing data to either the default Zipkin collector address or a URL:

router.yaml
telemetry:
  tracing:
    zipkin:
      enabled: true
      # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:9411/api/v2/spans)
      endpoint: "http://${env.ZIPKIN_HOST}:9411/api/v2/spans"