Subgraph traffic control

6. Subgraph traffic control

Overview

We've looked at configuration on the part of limiting inbound traffic to the router; now let's shift gears and set some controls at the subgraph level.

In this lesson, we will:

Apply subgraph-level timeouts and rate limiting

Subgraph performance

When building our API using GraphQL federation, different components can end up with varying performance profiles.

One or two subgraphs could be responsible for top-level operations (defining commonly-used fields on the Query type for instance), and as a result would need high reliability. Even so, downstream subgraphs could receive a lot more traffic due to the query amplification effect (such as requesting the reviews for each product, and the account details of the user who wrote each review).

When a query requires data from multiple subgraphs, a failure in one service does not cause a failure in the entire operation. Instead, GraphQL comes with the benefit of partial responses. This gives us the flexibility to limit traffic at the level of a particular subgraph—we can test out receiving partial responses from more responsive subgraphs, while lowering latency across the board.

Let's return to our router.yaml configuration file. First of all, we'll reset our router with a timeout of 1s, and capacity and interval of 500 and 1s respectively.

router.yaml

traffic_shaping:
  router:
    timeout: 1s
    global_rate_limit:
      capacity: 500
      interval: 1s

Going back to our Grafana dashboard, we'll see a noticeable effect.

First of all, a longer timeout allows more client requests to succeed with a 200 status code.

The HTTP request count panel showing an increase in 200 status code (successful) requests

This means that more requests will pass on successfully to the subgraphs; and we can see this reflected in our Subgraph requests panel, where the number of requests has increased.

Subgraph request count and latency

Consider the metrics for each of our four subgraphs.

What we should see is that both reviews and inventory have reasonable request latency: each should have a p99 below 150ms. (With 99% of requests resolving in 150 milliseconds or less.)

Panels showing low request latency for both reviews and inventory subgraph requests

The accounts subgraph, on the other hand, shows a p99 latency of one second at about 870 requests per second.

Panels showing number of requests to accounts and the latency

And products has a p99 latency of one second at 400 requests per second. (The other percentiles are quite high too! This means that the majority of requests are taking much longer to resolve—with 50% taking close to 500 milliseconds or less.)

Panels showing latency of products subgraph requests

This gives us a good place to start; accounts and products are both good candidates for timeouts and rate limiting.

Subgraph level timeout

To tweak these settings for our subgraphs, we can once again return to our router.yaml configuration file. Under traffic_shaping, we'll add a new property, all, at the same indentation level as the router key.

router.yaml

traffic_shaping:
  router:
    timeout: 1s
    global_rate_limit:
      capacity: 500
      interval: 1s
  all:

The settings that we apply under all will affect all subgraphs. Let's go ahead and set a global timeout for all of our subgraphs by adding a new timeout property, and a value of 500ms.

traffic_shaping:
  router:
    timeout: 1s
    global_rate_limit:
      capacity: 500
      interval: 1s
  all:
    timeout: 500ms

As of now, all of our subgraphs will officially have a timeout of 500 milliseconds. But we can get more granular and tweak them one-by-one as needed—by adding a subgraphs property next. We'll add this under the all section, but at the same level of indentation.

traffic_shaping:
  router:
    timeout: 1s
    global_rate_limit:
      capacity: 500
      interval: 1s
  all:
    timeout: 500ms
  subgraphs:

Under subgraphs we can identify specific subgraphs and give them more precise timeouts as needed. Let's add an accounts key, and give this service a timeout of 100ms.

traffic_shaping:
  router:
    timeout: 1s
    global_rate_limit:
      capacity: 500
      interval: 1s
  all:
    timeout: 500ms
  subgraphs:
    accounts:
      timeout: 100ms

We now observe an effect on client requests. Overall latency is dropping, but still quite high.

Effect of subgraph timeout on client requests

But we get a curious result on subgraph requests:

Effect of subgraph timeout on subgraph requests

While accounts and products subgraphs are rejecting more requests (504 status code), latency is actually increasing for the reviews and inventory subgraphs.

Increasing latency for reviews and inventory

Take some time to investigate why this is happening. Test various values for timeout, see what effect they can have. Check the latency of various queries. Are you noticing anything?

Most of the queries we use here are requesting data from multiple subgraphs in a row, so changing the performance profile of one subgraph will affect others. Let's take the following query:

query GetRecommendedProducts {
  recommendedProducts {
    name
    price
    shippingEstimate
    upc
    inStock
    reviews {
      id
      body
      author {
        id
        name
        username
      }
    }
  }
}

Broken down by subgraph responsibility, we can see the query consists of fields spread across our API.

A diagram of the query broken down by subgraph responsibility per field

This query will go first through the products subgraph, then in parallel through inventory (for the inStock field) and reviews, then the accounts subgraph for the author details.

A diagram showing how the query will move through the subgraphs, collecting data for each field

The benchmarking client is executing that query (along with others) in a loop, so when it finishes, it sends it again. If the accounts subgraph fails due to the timeout, the client will retry the query sooner. This means it begins the trip back through products again, then immediately to reviews and inventory, slightly increasing their traffic. If those requests are expensive for those subgraphs, that load can be enough to increase their latency.

This is an effect that can be observed in many GraphQL Federation deployments: subgraphs are not isolated from each other, and can affect each other's performance.

And it's possible to see this effect even without timeouts. Let's say a subgraph is often called in the middle of complex query, but is slow to respond, to the point where clients tend to abandon the query. Improving that subgraph's performance might mean that client gets the response faster, but then other subgraphs called down the line would see an increase in traffic that they are not ready for.

Here we suffer from a similar issue to the client-side traffic shaping: when applying timeouts in isolation, all of the queries are still reaching the subgraphs. We may need to use rate limiting along with timeouts to better control the traffic.

Subgraph level rate limiting

We will use the same traffic shaping options, but this time we will set a rate limit of 500 requests per second for accounts:

traffic_shaping:
  router:
    timeout: 1s
    global_rate_limit:
      capacity: 500
      interval: 1s
  all:
    timeout: 500ms
  subgraphs:
    accounts:
      timeout: 100ms
      global_rate_limit:
        capacity: 500
        interval: 1s

Latency in reviews and inventory immediately drops, so this helped!

Effect of subgraph rate limiting on subgraph requests

And p99 latency for client requests, which was quite volatile between 2s and 4s, is now stabilizing near 1s.

Effect of subgraph rate limiting on client requests

We are finally starting to get a handle on this infrastructure. Limiting traffic on the right subgraph improves overall performance for the entire deployment, at the cost of missing data in some client responses.

Practice

Subgraph traffic control

We can set a global timeout for all of our subgraphs using the

configuration property. More precise timeouts can be applied by specifying each subgraph individually beneath the

property. In a federated architecture, subgraphs are

from one another. This means that a change in traffic shaping for one

impact other services.

Drag items from this box to the blanks above

accounts
isolated
subgraphs
global
all
cannot
not isolated
can

Key takeaways

We can use the router's traffic shaping configuration to limit traffic at a subgraph level.
Any settings we apply at the all level will apply to all subgraphs the router communicates with.
We can use the subgraphs key to specify particular timeout and rate limit values for individual subgraph services.

Up next

Client requests are now getting a stable latency, but at one second, it's still not great. Coming up next, we'll get a huge performance boost with query deduplication.

Share your questions and comments about this lesson

This course is currently in

beta

. Your feedback helps us improve! If you're stuck or confused, let us know and we'll help you out. All comments are public and must follow the Apollo Code of Conduct. Note that comments that have been resolved or addressed may be removed.

You'll need a GitHub account to post below. Don't have one? Post in our Odyssey forum instead.