Health Checks
Determining the router's status
Health checks are often used by load balancers to determine whether a server is available and ready to start serving traffic.
GraphOS Router and Apollo Router Core support a basic HTTP-level health check. It is enabled by default and is served on port 8088 at the URL path /health. It returns a 200 status code if the HTTP server is successfully serving requests.
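A load balancer usually only needs the 200 status code, but a script can also check the JSON body (`{"status":"UP"}`) that the endpoint returns. The following is a minimal sketch in Python using only the standard library. It is not an Apollo-provided tool: `is_router_healthy` is a hypothetical helper, and the default URL assumes the default port 8088, the default `/health` path, and a router running on the same host.

```python
import json
import urllib.request

# Assumption: default listener (127.0.0.1:8088) and default path (/health).
DEFAULT_HEALTH_URL = "http://127.0.0.1:8088/health"


def is_router_healthy(url: str = DEFAULT_HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True only for a 200 response whose JSON body reports status "UP"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTPError (4xx/5xx);
        # ValueError covers a body that is not valid JSON.
        return False
    return isinstance(body, dict) and body.get("status") == "UP"


if __name__ == "__main__":
    print("healthy" if is_router_healthy() else "unhealthy")
```

Treating any connection error as "unhealthy" matches how load balancers generally behave: a check that cannot complete counts as a failure.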
You can change these defaults by setting health_check:
```yaml
health_check:
  listen: 127.0.0.1:8088
  enabled: true
  path: /health # Optional, default: /health
```
Each option is configurable. For example, we can set our health check endpoint to 127.0.0.1:8090/healthz:
```yaml
health_check:
  listen: 127.0.0.1:8090
  enabled: true
  path: /healthz
```
We can also disable the health check endpoint:
```yaml
health_check:
  enabled: false
```
Testing with curl
The following example uses the curl command to send a basic health check query to a router instance running at 127.0.0.1:4000, whose health check endpoint listens on the default 127.0.0.1:8088:
```console
$ curl -v "http://127.0.0.1:8088/health"
*   Trying 127.0.0.1:8088...
* Connected to 127.0.0.1 (127.0.0.1) port 8088 (#0)
> GET /health HTTP/1.1
> Host: 127.0.0.1:8088
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< vary: origin
< content-type: application/json
< content-length: 15
< date: Wed, 21 Sep 2022 17:10:45 GMT
<
* Connection #0 to host 127.0.0.1 left intact
{"status":"UP"}
```
Logging
If you start the router with trace logging enabled, the router logs a line for each health check:
```text
--log apollo_router=trace

2023-01-23T17:42:04.640501Z apollo-router/src/axum_factory/axum_http_server_factory.rs:100 TRACE apollo_router::axum_factory::axum_http_server_factory: health check health=Health { status: Up } request=Request { method: GET, uri: /health, version: HTTP/1.1, headers: {"host": "127.0.0.1:8088", "user-agent": "curl/7.85.0", "accept": "*/*"}, body: Body(Empty) }
```
This can help you confirm that health checks are working correctly.
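Over a longer run, counting the matching lines is easier than reading them one by one. The following Python sketch does this. It assumes the format of the single sample trace line shown in this section. The regular expression, `summarize_health_checks`, and the command-line usage are all hypothetical, and the log format may differ between router versions.

```python
import re
import sys
from collections import Counter

# Assumption: pattern derived only from the sample trace line in this section.
HEALTH_LOG = re.compile(
    r"health check health=Health \{ status: (?P<status>\w+) \}"
    r".*?uri: (?P<uri>[^,]+),"
)


def summarize_health_checks(lines):
    """Count health-check log lines, keyed by (reported status, request URI)."""
    counts = Counter()
    for line in lines:
        match = HEALTH_LOG.search(line)
        if match:
            counts[(match["status"], match["uri"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_health_logs.py router.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for (status, uri), count in summarize_health_checks(log_file).items():
            print(f"{uri} status={status}: {count}")
```

Lines that don't match the pattern, such as other router logs, are ignored. A sudden drop in the count is a quick sign that probes have stopped reaching the router.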
Using in a container environment
By default, the health check listens on 127.0.0.1 (the loopback interface), so it doesn't accept connections that arrive over the network. While this is a safe default, it means other containers can't perform health checks, which prevents the router pod from switching to a healthy state.
You can change this by setting health_check:
```yaml
health_check:
  listen: 0.0.0.0:8088
  enabled: true
```
Using with Kubernetes
In Kubernetes, you can configure health checks by setting readinessProbe and livenessProbe on the containers object of the resource definition:
```yaml
# ... snipped for partial example ...
containers:
  - name: router
    # ... snipped for partial example ...
    livenessProbe:
      httpGet:
        path: "/health?live"
        port: 8088
    readinessProbe:
      httpGet:
        path: "/health?ready"
        port: 8088
    # ... snipped for partial example ...
```
See a more complete example in our Kubernetes documentation.
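Kubernetes treats an httpGet probe as successful when the response status is at least 200 and below 400. Errors and timeouts count as failures. The following Python sketch roughly imitates that check against the `/health?live` and `/health?ready` paths from the manifest, which can help when you debug probes by hand. `http_probe` is a hypothetical helper, and the base URL in the usage block assumes the default port 8088 on the local host.

```python
import urllib.error
import urllib.request


def http_probe(base_url: str, query: str, timeout: float = 1.0) -> bool:
    """Roughly mimic a Kubernetes httpGet probe: success is a status in [200, 400)."""
    url = f"{base_url}/health?{query}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; evaluate the code the same way.
        return 200 <= err.code < 400
    except OSError:
        # Connection refused, unreachable host, or timeout: the probe fails.
        return False


if __name__ == "__main__":
    base = "http://127.0.0.1:8088"  # assumption: default health check port
    print("live:", http_probe(base, "live"))
    print("ready:", http_probe(base, "ready"))
```

A failing liveness probe causes Kubernetes to restart the container. A failing readiness probe only removes the pod from Service endpoints, so the two results can legitimately differ.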
Liveness
In Router 2, the router becomes live once a router configuration has been activated. From that point on, the router remains live unless the endpoint stops responding.
Readiness
In Router 2, the router also becomes ready once a router configuration has been activated. From that point on, the router monitors its responses to detect overloading. If overloading exceeds a configured tolerance, the router declares itself unready for a period of time. During this period it continues to serve requests. When the unready period expires, the router resumes monitoring for overloading. This behavior is controlled by the readiness settings of the health_check configuration:
```yaml
health_check:
  listen: 0.0.0.0:8088
  enabled: true
  readiness: # optional, with default as detailed below
    allowed: 50 # optional, default 100
    interval:
      sampling: 5s # optional, default 5s
      unready: 10s # optional, default (2 * sampling)
```
In this snippet, readiness is configured to allow 50 rejections due to load shedding (GATEWAY_TIMEOUT or SERVICE_UNAVAILABLE) in each sampling interval (5 seconds). If the router determines that it is "unready" (that is, this limit is exceeded), it reports that status (SERVICE_UNAVAILABLE) through the /health?ready endpoint used by the readinessProbe for the unready interval (10 seconds). Once this interval has passed, it returns to "ready" and starts sampling responses again.
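The following Python sketch models the readiness behavior described in this section so that the sampling and unready intervals are easier to follow. It is a hypothetical simplification, not the router's implementation. It assumes that the limit is exceeded when rejections are strictly greater than `allowed`, that a new sampling interval starts when the previous one elapses, and that the caller supplies timestamps in seconds. `ReadinessModel` and its methods are illustrative names only.

```python
from typing import Optional

# Load-shedding rejections named in the docs: SERVICE_UNAVAILABLE, GATEWAY_TIMEOUT.
LOAD_SHED_STATUSES = {503, 504}


class ReadinessModel:
    """Hypothetical model of the documented readiness behavior (not router code)."""

    def __init__(self, allowed: int = 100, sampling: float = 5.0,
                 unready: Optional[float] = None):
        self.allowed = allowed    # rejections tolerated per sampling interval
        self.sampling = sampling  # sampling interval length, in seconds
        # Documented default for the unready interval: 2 * sampling.
        self.unready = 2 * sampling if unready is None else unready
        self._window_start = 0.0
        self._rejections = 0
        self._unready_until: Optional[float] = None

    def _advance(self, now: float) -> None:
        if self._unready_until is not None:
            if now < self._unready_until:
                return  # still inside the unready interval
            # Unready interval expired: become ready and start a fresh sample.
            self._unready_until = None
            self._window_start = now
            self._rejections = 0
        elif now - self._window_start >= self.sampling:
            # Current sampling interval elapsed: reset the rejection count.
            self._window_start = now
            self._rejections = 0

    def record(self, status: int, now: float) -> None:
        """Record one response. While unready, requests are served but not sampled."""
        self._advance(now)
        if self._unready_until is not None:
            return
        if status in LOAD_SHED_STATUSES:
            self._rejections += 1
            # Assumption: "beyond the tolerance" means strictly more than `allowed`.
            if self._rejections > self.allowed:
                self._unready_until = now + self.unready

    def is_ready(self, now: float) -> bool:
        """What a readiness check would report at time `now`."""
        self._advance(now)
        return self._unready_until is None
```

With the snippet's values (`allowed=50`, `sampling=5`, `unready=10`), the 51st rejection within one sampling interval makes `is_ready` return `False` for the next 10 seconds. After that, the model reports ready again and starts a new sample. Successful responses never count toward the limit.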
Using with Docker
Docker has a HEALTHCHECK instruction that tells Docker how to test whether a container is still working. It is defined in the Dockerfile when you build your container image:
```dockerfile
HEALTHCHECK CMD curl --fail \
  "http://127.0.0.1:8088/health" || exit 1
```
This command requires curl to be installed in the image. We don't define HEALTHCHECK instructions in our example Dockerfiles because they aren't commonly used. You can add them to your own images as needed.