
Securing supergraphs

Best practices for securing federated GraphQL APIs


Supergraphs benefit from the same standard methods you use to reduce the attack surface of any API. The OWASP API Security Top 10 list provides a helpful summary of some of the most common risks.

In addition, there are GraphQL-specific actions you can take to limit your supergraph's exposure to potential threats. These threats are mostly related to denial-of-service (DoS) attacks, and they fall under the categories of API discoverability and malicious queries.

API discoverability

One of the most important ways to protect a GraphQL API against would-be attackers is to limit the API's discoverability in production. Although the inherent discoverability of a GraphQL API enhances developer experience when working locally, we typically don't want to offer these same capabilities in a production environment for non-public APIs. The following sections explore some of the key ways to limit API discoverability.

Turn off introspection in production

GraphQL's built-in introspection is the fastest way for bad actors to learn about your schema. To prevent this, turn off introspection in your production supergraph, and also limit access to any staging environments where introspection is enabled.

NOTE

When using the Apollo Router, introspection is turned off by default.

Apollo's blog post on disabling introspection provides a deep dive into how introspection can facilitate API exploitation and why it's important to layer additional measures on top of turning introspection off to limit discoverability.
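
If a subgraph or standalone server runs Apollo Server, a minimal sketch of environment-based introspection control might look like this (the trivial schema here is a placeholder):

import { ApolloServer } from '@apollo/server';

// Placeholder schema; substitute your own typeDefs and resolvers.
const server = new ApolloServer({
  typeDefs: `type Query { health: String }`,
  resolvers: { Query: { health: () => 'ok' } },
  // Allow introspection only outside of production.
  introspection: process.env.NODE_ENV !== 'production',
});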

Obfuscate error details in production

Many GraphQL servers improve their developer experience by providing detailed error information in responses when something goes wrong. In your production supergraph, make sure that verbose error details are removed from API responses.

For example, by default an Apollo Server-based subgraph provides the exception.stacktrace property under the errors key in a response. This value is useful while developing and debugging your server, but you should not expose stack trace details to public-facing clients.
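
In an Apollo Server-based subgraph, one way to strip these details is the formatError constructor option. A minimal sketch, assuming all errors should be masked in production:

import { ApolloServer } from '@apollo/server';

const server = new ApolloServer({
  typeDefs: `type Query { health: String }`, // placeholder schema
  resolvers: { Query: { health: () => 'ok' } },
  // Replace verbose error details with a generic message in production.
  formatError(formattedError) {
    if (process.env.NODE_ENV === 'production') {
      return { message: 'Internal server error' };
    }
    return formattedError;
  },
});

A real implementation would likely preserve safe, intentional errors (such as input validation messages) and mask only unexpected ones.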

NOTE

The Apollo Router omits all subgraph error details by default.

You might want to selectively expose error details to clients in your production environment. You can do this with a combination of the Apollo Router's include_subgraph_errors configuration option and Rhai scripts for response manipulation.

Avoid autogenerating schemas

Another strategy for reducing your API's discoverability is to avoid autogenerating schemas whenever possible, especially the fields on the root operation types. There are many developer tools that enable you to autogenerate a GraphQL schema based on a set of initial type definitions or some existing database tables. Although these approaches to schema generation can speed up initial API development, they also make it easier for bad actors to guess what kind of generic CRUD-related fields were autogenerated based on commonly used patterns.

An autogenerated schema also increases the risk that you expose sensitive data unintentionally via your API. And as a schema design best practice, you should design your schema deliberately to serve client use cases and product requirements. This helps your entire organization get the most out of your supergraph.

Only allow the router to query subgraphs directly

As a best practice for every supergraph, only the supergraph's router should query individual subgraphs directly. The subgraph specification outlines that each subgraph includes _entities and _service root fields on the Query type to assist with composition and query planning. These fields expose the supergraph to additional security concerns if accessed directly by a client.

The Query._service object includes an sdl field, which provides the full representation of the subgraph's schema. This field exposes as much data about a subgraph's schema as a standard introspection query, which means it warrants the same protections in production.

The Query._entities field enables the router to resolve fields of any entity type that's marked with @key by providing the corresponding entity representations. If this field is exposed publicly, any client can circumvent internal logic and fetch any entity's data by mimicking the router. Because the resolver for _entities is automatically provided by your subgraph library, you can't modify that logic either. That means you would have to manually check all operations that include the _entities field and block any malicious queries.

Another reason to restrict access to subgraphs is related to the collection of field-level traces. This tracing data is included in the extensions key of the response from a subgraph to the router, where the data is aggregated into a trace shape based on the query plan and then sent to GraphOS. That means that any client that can query your subgraphs directly can view this data in the response and make inferences about a subgraph based on it.

In addition to the above security concerns, preventing the outside world from accessing subgraphs directly also helps you ensure that clients (even well-meaning ones) route operations to the consolidated graph only and don't invent unintended use cases for the types and fields in a subgraph that are solely meant for executing the router's query plans.
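
Network-level isolation (for example, keeping subgraphs on a private network) is the primary safeguard here. As an additional layer, a subgraph can verify a shared secret that only the router sends. A minimal Apollo Server sketch, assuming a hypothetical ROUTER_SHARED_SECRET environment variable and a router configured to propagate a matching header:

import { ApolloServerPlugin } from '@apollo/server';

const ROUTER_SECRET = process.env.ROUTER_SHARED_SECRET;

// Rejects any request that doesn't carry the secret header the
// router is configured to send with every subgraph fetch.
export const routerOnlyPlugin: ApolloServerPlugin = {
  async requestDidStart({ request }) {
    if (request.http?.headers.get('router-authorization') !== ROUTER_SECRET) {
      throw new Error('This subgraph only accepts requests from the router');
    }
  },
};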

Malicious queries

After you implement measures to limit API discoverability in public-facing environments, the next step in protecting a GraphQL API is to guard it against both intentionally and unintentionally malicious queries. Again, many GraphQL-related vulnerabilities have to do with how an unprotected API may be exploited in DoS attacks, but there are other considerations as well.

In the sections that follow, we will explore measures that can help mitigate the impact of malicious queries for any GraphQL API, such as limiting query depth and breadth, paginating list fields where appropriate, validating and sanitizing data, setting timeouts, enforcing authentication and authorization in the router, and guarding against batched query abuse. And for GraphQL APIs with third-party clients, we will also explore using query cost analysis to support rate limiting.

Limit query depth

GraphQL enables clients to traverse through a graph and express complex relationships between the nodes in an operation's selection set. But as far as the backing data sources are concerned, this can quickly turn into too much of a good thing when there are no guardrails to restrict how deeply queries can be nested. For example:

query DeepBlogQuery {
  author(id: 42) {
    posts {
      author {
        posts {
          author {
            posts {
              author {
                # and so on...
              }
            }
          }
        }
      }
    }
  }
}

One of the most straightforward protections against deeply nested queries such as this one is to set a maximum query depth. And because an operation can specify multiple root fields, you may also consider limiting query breadth at the root level as well.
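
For example, with Apollo Server you can enforce a maximum depth with a validation rule. A sketch assuming the third-party graphql-depth-limit package:

import { ApolloServer } from '@apollo/server';
import depthLimit from 'graphql-depth-limit';

const server = new ApolloServer({
  typeDefs: `type Query { health: String }`, // placeholder schema
  resolvers: { Query: { health: () => 'ok' } },
  // Reject any operation nested more than six levels deep.
  validationRules: [depthLimit(6)],
});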

Paginate fields where appropriate

Paginating fields is another important mechanism to control how many items a client can request at once. For example, a Posts service might have no problem resolving a thousand total Post objects in this request:

query {
  authors(first: 10) {
    name
    posts(last: 100) {
      title
      content
    }
  }
}

What will happen when the orders of magnitude increase for each field, and a hundred thousand posts are requested?

query {
  authors(first: 100) {
    name
    posts(last: 1000) {
      title
      content
    }
  }
}

When paginating fields, it's important to set a maximum number of items that can be returned in a single response. In the example above, you might want to return a GraphQL error when executing the posts field instead of attempting to return a thousand posts for each of the hundred authors.
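
A resolver-level sketch of this guardrail, where fetchPosts is a hypothetical data-access helper:

import { GraphQLError } from 'graphql';

declare function fetchPosts(
  authorId: string,
  args: { last?: number },
): Promise<unknown[]>; // hypothetical data-access helper

const MAX_PAGE_SIZE = 100;

const resolvers = {
  Author: {
    // Reject requests for more items than the configured maximum.
    posts(author: { id: string }, args: { last?: number }) {
      if ((args.last ?? 0) > MAX_PAGE_SIZE) {
        throw new GraphQLError(
          `posts cannot return more than ${MAX_PAGE_SIZE} items per request`,
          { extensions: { code: 'BAD_USER_INPUT' } },
        );
      }
      return fetchPosts(author.id, args);
    },
  },
};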

Validate and sanitize data

Validating and sanitizing client-submitted data is important for any API, and a supergraph is no exception. In general, the usual rules for validation and sanitization of untrusted inputs apply to subgraphs when resolving fields based on user-provided inputs. And as previously discussed, when users supply invalid values as arguments, the resulting errors should provide as few details as possible in production environments.

A well-designed schema can also help guard against injection attacks by codifying validation and sanitization directly into types. For example, enum values can limit the range of what can be submitted for argument values, and custom scalars or directives can also help to validate, escape, or normalize values. However, custom scalars should be handled with care, because misusing them might create other vulnerabilities.
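
As an illustration of codifying validation into a type, here's a sketch of a hypothetical Username scalar that rejects any value not matching a strict pattern before resolvers ever see it:

import { GraphQLScalarType, GraphQLError, Kind } from 'graphql';

const USERNAME_PATTERN = /^[a-z0-9_]{3,20}$/;

// Accepts only lowercase alphanumeric usernames (with underscores),
// whether supplied as a variable value or an inline literal.
export const Username = new GraphQLScalarType({
  name: 'Username',
  parseValue(value) {
    if (typeof value !== 'string' || !USERNAME_PATTERN.test(value)) {
      throw new GraphQLError('Invalid username');
    }
    return value;
  },
  parseLiteral(ast) {
    if (ast.kind !== Kind.STRING || !USERNAME_PATTERN.test(ast.value)) {
      throw new GraphQLError('Invalid username');
    }
    return ast.value;
  },
});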

Set timeouts

Timeouts are another useful tool for stopping GraphQL operations that consume more server resources than expected. In a supergraph, timeouts are commonly applied at any combination of three different levels:

  • At the highest level, you can set a timeout on the router's HTTP server (or an idle timeout on a load balancer in front of it).
  • At an intermediate level, you can set a timeout on the router's requests to individual subgraphs. You can configure timeouts at both the HTTP and subgraph levels using the Apollo Router's traffic shaping configuration.
  • At the most granular level, subgraphs can set a timeout for individual operations. The duration of the request can be checked against this timeout as each resolver function is called. You might accomplish this using resolver middleware or an Apollo Server plugin in a subgraph, as sketched below.
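
A minimal sketch of the most granular approach, assuming a per-request deadline is stored on the context when the request starts:

import { GraphQLError } from 'graphql';

interface RequestContext {
  deadline: number; // e.g., Date.now() + 10_000, set when the request starts
}

// Hypothetical helper: wraps a resolver so it fails fast once the
// request's overall time budget is exhausted.
function withDeadline<TArgs, TResult>(
  resolve: (parent: unknown, args: TArgs, ctx: RequestContext) => TResult,
) {
  return (parent: unknown, args: TArgs, ctx: RequestContext): TResult => {
    if (Date.now() > ctx.deadline) {
      throw new GraphQLError('Request exceeded its time budget');
    }
    return resolve(parent, args, ctx);
  };
}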

Use rate limiting as needed

Particularly for APIs that are consumed by third-party clients, depth and breadth limiting and paginated fields may not provide enough demand control. For these cases, rate-limiting API requests may be warranted. Enforcing rate limits for a GraphQL API is more complicated than for a REST API because GraphQL operations may vary widely in size and complexity, so the rate limit can't be based on individual requests alone. Instead, we have to think about how much of the graph an operation may traverse in the context of a single request.

There's no one-size-fits-all approach to implementing rate limits for a GraphQL API. For example, the GitHub GraphQL API sets a maximum node limit, along with a point score based on the connections in a query. It then counts this score against a maximum of 5,000 points per hour.

Shopify, on the other hand, assigns different point values to various types and connection fields (also considering the number of items returned by the connection field), while assigning mutations a higher value due to the server resources they typically consume. Shopify then uses a leaky bucket algorithm that allocates 50 points per second (up to a maximum of 1,000 points) to accommodate sudden bursts in API traffic from a client.

Both the GitHub and Shopify rate-limiting approaches concern the complex topic of cost analysis (also known as query complexity analysis). As we can see from these examples, assigning a "cost" to a query is complicated and nuanced, and it should be done in a way that suits the API in question. There are several query cost-related packages on npm that can be added to a subgraph, but before using any of them, make sure that the assumptions these libraries make on your behalf hold true for your API.

For example, you might want to set fixed costs for different kinds of nodes, or you might manually set costs on a per-type or per-field basis by annotating them with directives (or do some combination of both). You might also have different considerations for how type complexity (the cost based on the number and kinds of fields requested) and response complexity (the cost of providing responses for the requested fields) are handled. Or for a completely different approach that doesn't explicitly count types and fields, you could set and iterate costs based on field tracing data and set a maximum time budget per query.
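
To make the idea concrete, here's a toy static cost model (the per-field costs are invented for illustration):

import { parse, visit } from 'graphql';

// Invented per-field costs; unlisted fields default to 1.
const FIELD_COSTS: Record<string, number> = { authors: 10, posts: 10 };

function estimateCost(query: string): number {
  let cost = 0;
  visit(parse(query), {
    Field(node) {
      cost += FIELD_COSTS[node.name.value] ?? 1;
    },
  });
  return cost;
}

// estimateCost('{ authors(first: 10) { name } }') === 11

A production-grade implementation would also multiply costs through nested list fields based on their pagination arguments, rather than summing flat per-field values.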

Given the potential scope of developing a bespoke cost analysis solution, you should first verify that your API actually needs one. For supergraphs that are consumed by first-party clients only, other demand control mechanisms might suffice. If you do need to add comprehensive query cost analysis to your API, then the work that IBM has done in this area to develop a GraphQL cost directive specification may be instructive. Their work was originally published in a paper (with a supplemental video to highlight some of the key concepts) and is further explored in a series of blog posts.

Authentication and authorization in the router

Enforcing authentication and authorization in the router protects your underlying APIs from malicious queries. Dropping unauthenticated, unauthorized queries at the entry point of your supergraph frees up your downstream subgraphs to process only valid requests, thereby reducing load and enhancing performance.

Hardening access to your supergraph at the router also adds another layer of security when implementing zero-trust and defense-in-depth strategies. The router centralizes authentication and authorization logic, which downstream services can reinforce with their own checks.

To enforce authentication and authorization in the router, you can use the Apollo Router's JWT authentication support together with authorization directives such as @authenticated and @requiresScopes.

Batched requests

Batched requests are another potential attack vector for malicious queries. There are two different flavors of batching attacks to consider. The first threat is related to GraphQL's inherent ability to "batch" requests by allowing multiple root fields in an operation:

query {
  astronaut(id: "1") {
    name
  }
  second: astronaut(id: "2") {
    name
  }
  third: astronaut(id: "3") {
    name
  }
}

Without any restrictions in place, clients could effectively enumerate through all nodes in a single request like the one above while slipping past other brute force protections. Limiting query breadth or using query cost analysis can help protect a GraphQL API from this type of abuse.
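
One way to cap breadth is a custom validation rule. A sketch using graphql-js, with the maximum chosen arbitrarily:

import { GraphQLError, ValidationRule } from 'graphql';

const MAX_ROOT_FIELDS = 5;

// Rejects documents whose operations select too many root-level
// fields (including aliased duplicates of the same field).
export const rootFieldLimit: ValidationRule = (context) => ({
  OperationDefinition(node) {
    if (node.selectionSet.selections.length > MAX_ROOT_FIELDS) {
      context.reportError(
        new GraphQLError(
          `Operations may select at most ${MAX_ROOT_FIELDS} root fields`,
        ),
      );
    }
  },
});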

Another form of batching occurs when a client sends batches of full operations in a single request, which can be helpful for performance reasons in some scenarios. The Apollo Router does not support this form of batching. When operations are batched, Apollo Server receives an array of operations and sends back an array of responses to be parsed by the client (Apollo Client provides a batch HTTP link to facilitate this):

[
  {
    "operationName": "FirstAstronaut",
    "variables": {},
    "query": "query FirstAstronaut {\n  astronaut(id: \"1\") {\n    name\n  }\n}\n"
  },
  {
    "operationName": "SecondAstronaut",
    "variables": {},
    "query": "query SecondAstronaut {\n  astronaut(id: \"2\") {\n    name\n  }\n}\n"
  },
  {
    "operationName": "ThirdAstronaut",
    "variables": {},
    "query": "query ThirdAstronaut {\n  astronaut(id: \"3\") {\n    name\n  }\n}\n"
  }
]

With batched operations, it's important to consider how an entire batch might impact rate limit calculations and cost analysis to ensure that clients can't cheat rate limits through race conditions.

Finally, beyond batching of fields and operations, some forms of resolver-related batching can help mitigate DoS attacks and generally make your API more performant overall. Even with depth limiting in place, GraphQL queries can easily lead to exponential growth of requests to backing data sources.

DataLoaders are one way to help make as few requests as possible to backing data sources from resolver functions within the context of a single operation.
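
A minimal sketch with the dataloader package, where findAuthorsByIds is a hypothetical data-access function:

import DataLoader from 'dataloader';

declare function findAuthorsByIds(
  ids: readonly string[],
): Promise<Array<{ id: string; name: string }>>; // hypothetical

// Coalesces every author lookup made while resolving a single
// operation into one batched call to the backing data source.
const authorLoader = new DataLoader(async (ids: readonly string[]) => {
  const authors = await findAuthorsByIds(ids);
  // DataLoader requires results in the same order as the input keys.
  return ids.map((id) => authors.find((a) => a.id === id) ?? null);
});

// In a resolver: return authorLoader.load(post.authorId);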

Security with managed federation

Apart from protecting your API from bad actors and locking down private data, you also need a window into how your API is being used (and by whom) to harden your GraphQL security posture. This is where a schema registry and observability tooling (such as those provided by GraphOS) come into play to help control who makes changes to your API, monitor API usage, and send alerts when something isn't right.

Know who's using your graph (and how)

To enhance the utility of traces collected in your observability tooling, it's a best practice to require every client to identify itself and assign a name to every operation it executes. The web and mobile versions of Apollo Client provide straightforward APIs for setting custom headers for a client's name and version. These values help you segment usage data by client. Other API clients can set the apollographql-client-name and apollographql-client-version request headers manually to provide client awareness. (As a bonus, client awareness also helps you identify which clients might be impacted by a proposed breaking change to your API when running schema checks.)
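
For example, with Apollo Client on the web, a sketch of client identification (the endpoint and names here are placeholders):

import { ApolloClient, InMemoryCache } from '@apollo/client';

const client = new ApolloClient({
  uri: 'https://example.com/graphql', // placeholder endpoint
  cache: new InMemoryCache(),
  // Sent with every request as the apollographql-client-name and
  // apollographql-client-version headers.
  name: 'web-storefront',
  version: '1.2.0',
});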

Additionally, tracing data in GraphOS can help you monitor API performance and errors. You can configure alerts to push notifications to your team when something goes wrong, whether it's an increase in requests per minute, changes in your p50, p95, or p99 response times, or errors in operations run against your graph. For example, a notification about a sudden increase in the error percentage might indicate that a bad actor is trying to circumvent introspection that's been turned off and learn about a graph's schema by rapidly guessing and testing different field names. And if you want to leverage error data outside of GraphOS as well, you can also use the router's support for OpenTelemetry to integrate with other APM tools.

Restrict write access to your graph

You should manage internal access to your supergraph as thoughtfully as you manage communication from external clients. GraphOS provides both graph API keys and personal API keys to restrict access to the supergraph within an organization. It also supports SSO integration and different member roles so that team members can be assigned appropriate permissions when contributing to the supergraph.

Beyond member roles, GraphOS also allows certain graph variants to be designated as protected variants to further restrict who can make changes to their schemas, which is especially important in production environments.

Additional resources

For further reading on GraphQL-related security concerns, the OWASP GraphQL Cheat Sheet is an excellent resource to help you review the security posture of your supergraph.
