July 14, 2023

Mitigate scraping and bot attacks with GraphOS

Shane Myrick

This post is part of our “How to build connected travel apps with Apollo GraphOS” series.


In this digital age, where data and connectivity are a vital part of a company’s business, APIs have become essential tools for developers to enhance their applications. GraphQL is also becoming the leader for powering the API layer exposed to user-facing apps, providing access to all the data needed to power customer experiences.

However, this accessibility also invites malicious actors like bots to exploit these APIs through data scraping, spamming, and executing arbitrary custom queries. To strike a balance between security and utility, it is crucial to implement measures that block abusive bots from consuming your GraphQL API while still allowing legitimate ones, like search engines, to scrape your websites for indexing purposes.

Shielding a GraphQL API from bots

GraphQL is a powerful technology, but its specification is intentionally limited: it describes only schema types and operation definitions, not how the API should be secured at runtime. If you want additional security features, you either need to implement them from scratch in your GraphQL server or use Apollo GraphOS, an all-in-one solution to build, monitor, and secure your GraphQL API. GraphOS provides a number of built-in runtime security features that can help protect travel companies from malicious bots, as travel data is a primary target for scraping.

In order for your front end to use your API from a web browser or mobile app, the API needs to be exposed on the public internet. But if the API is accessible from the public internet, anyone can inspect the requests and responses being sent over the wire. If you have not turned off introspection, malicious actors can retrieve your full schema from the public endpoint and use that knowledge to craft additional requests that extract custom data sets. You can also add authentication to your Apollo Router so that requests must be made with a valid user token, but malicious bots can simulate real users or use multiple accounts.
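
Both of those first lines of defense are Apollo Router configuration options. As a minimal sketch, a router.yaml that keeps introspection turned off and requires a valid JWT on incoming requests might look like the following, where the JWKS URL is a placeholder for your own identity provider:

# router.yaml (sketch; the JWKS URL is a placeholder)
supergraph:
  introspection: false # off by default in the router; keep it off in production

authentication:
  router:
    jwt:
      jwks:
        # Replace with your identity provider's JWKS endpoint
        - url: https://auth.example.com/.well-known/jwks.json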

Ultimately, you need to strike the right balance: limit the execution of arbitrary, authenticated requests while still allowing the developers of known client applications to craft their own operations and iterate on the shape of those operations over time to accommodate new features.

Extracting your operations

Most of the time, front-end developers build their components with the queries defined directly in the UI code. As a result, anyone browsing a web app’s source code or network requests with a browser’s developer tools can read those operations and run modified versions of them by tweaking the request or implementing a custom call. If we could extract all the operations from our known clients in advance, we would know ahead of time which operations are the “real” ones versus potentially malicious ones from other sources.

Using @apollo/generate-persisted-queries enables your clients to extract their operations ahead of time. You can then assign a specific ID to each operation so there is a known ID to reference at runtime. This also gives you the opportunity to discuss with your client teams, when they want to build new operations or change existing ones for new features, whether the changes they want to make are appropriate.
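
The output of that extraction step is an operation manifest that maps each ID to its operation body. Below is an illustrative sketch in the Apollo persisted query manifest format; the operation, its name, and its ID are invented for this example:

{
  "format": "apollo-persisted-query-manifest",
  "version": 1,
  "operations": [
    {
      "id": "<sha256 hash of the operation body>",
      "name": "GetFeaturedListings",
      "type": "query",
      "body": "query GetFeaturedListings { featuredListings { id title } }"
    }
  ]
}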

Blocking Unknown Operations

Once you have extracted your operations, you can register them in GraphOS with Persisted Queries Safelisting. This allows the Apollo Router to fetch the list of approved operations and run in a few different modes, such as auditing unknown operations:

# router.yaml
preview_persisted_queries:
  enabled: true
  log_unknown: true # log operations that are not in the registered list, but still execute them

Or blocking any operation that is not sent by its registered ID:

# router.yaml
preview_persisted_queries:
  enabled: true
  safelist:
    enabled: true
    require_id: true # reject any operation that is not sent by its registered ID
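
For the router to enforce either mode, the manifest generated by your clients first has to be published to the graph variant your router serves. As a sketch, using the Rover CLI with a placeholder graph ref and manifest path:

# Publish the clients' operation manifest to GraphOS
rover persisted-queries publish my-graph@production \
  --manifest ./persisted-query-manifest.json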

Security in obscure environments

Safelisting works well in situations where you can communicate with your client teams and give them access to register their operations, but there are other scenarios where that might not be possible. Travel companies often provide a partner API so other businesses can extract the data they need or upload their own data to display in the travel apps. For these third-party use cases you still want to protect your API, so Apollo Router features like Operation Limits, Traffic Shaping, and Coprocessors allow you to limit the request rate per client. You can even introspect the incoming operation and dynamically calculate its cost to gain more granular control over how much data partners can query in a given time period.
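
Operation limits and rate limiting are both expressed in router configuration as well. As a minimal sketch, where the specific numbers are illustrative rather than recommendations:

# router.yaml (illustrative values)
limits:
  max_depth: 15 # reject operations nested more than 15 levels deep
  max_aliases: 30 # cap aliases to limit alias-based amplification
  max_root_fields: 20 # cap the number of root fields per operation

traffic_shaping:
  router:
    global_rate_limit:
      capacity: 100 # allow at most 100 requests...
      interval: 60s # ...per minute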

Get started with a travel supergraph today

Beyond protecting your API from scraping attacks and securing access to your data, the best way to see the possibilities of a supergraph is to try one out. You can explore a travel supergraph schema and run real queries against it here.

We also have additional posts in this series of travel best practices that dive into different elements of this schema to illustrate how Apollo GraphOS helps power essential features of modern travel applications.

If you’d like to talk to an Apollo expert about how a supergraph can power your travel experience, please reach out to us.

Written by

Shane Myrick
