Data loaders under the hood

Overview

In this lesson, we will:

Discuss what a data loader is
Review the requirements of a data loader
Walk through how a data loader works under the hood

Data loaders

To solve the n+1 problem in our application, we'll use data loaders.

A data loader's primary job is to replace multiple similar requests with a single batched request. In our example, we saw three near-identical requests that used a particular listing ID to return amenity data. With a data loader, this becomes a single request that fetches data for all three listings at once.

In the DGS framework, we use data loaders inside of our datafetcher methods. This is because in the process of resolving a query, a particular datafetcher method might be invoked several times with different parameters.

Let's illustrate this with the following query.

A query for featured listings and their amenities

query GetFeaturedListingsAmenities {
  featuredListings {
    id
    title
    amenities {
      name
    }
  }
}

For each featured listing object this query resolves, the Listing.amenities datafetcher will be invoked. Currently, this datafetcher uses a listing's ID to resolve each request independently. (That's what results in three separate network requests to the same endpoint!)

The behavior we want is quite different: we want the data loader to collect all the listing IDs involved in the query (also called keys), and execute a single request.

In our example, this means that when the Listing.amenities datafetcher is called using each listing's ID, it won't call our REST API directly anymore; instead, it will pass the parameters to the data loader to collect.

Once the individual listing IDs are gathered in one list, the data loader can assume the responsibility of calling the data source. It's able to dispatch a single request to the REST API endpoint for all of the IDs at once—a huge performance boost over letting the datafetcher make a network request for each!

Best of all, with DGS, our data loaders automatically deduplicate the identifiers we pass them. This means that if our query includes multiple listings with the same ID, we'll request that listing's amenities only once.
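The collect, deduplicate, and dispatch behavior described above can be sketched in plain Java. This is only an illustration under stated assumptions: TinyLoader, load, dispatch, and batchCalls are hypothetical names, not the DGS or java-dataloader API, and the sketch assumes the batch function returns one value per key, in key order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical, simplified loader for illustration; NOT the DGS or java-dataloader API.
class TinyLoader<K, V> {
    private final Function<List<K>, List<V>> batchFunction;
    // Insertion-ordered map holding one pending future per distinct key.
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();
    private int batchCalls = 0;

    TinyLoader(Function<List<K>, List<V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Called once per datafetcher invocation: records the key, sends no request yet.
    // A repeated key reuses the existing future, which is the deduplication step.
    CompletableFuture<V> load(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Sends every collected key in a single call, then completes each future.
    // Assumption: the batch function returns one value per key, in key order.
    void dispatch() {
        if (pending.isEmpty()) {
            return;
        }
        List<K> keys = new ArrayList<>(pending.keySet());
        List<V> values = batchFunction.apply(keys);
        batchCalls++;
        for (int i = 0; i < keys.size(); i++) {
            pending.get(keys.get(i)).complete(values.get(i));
        }
        pending.clear();
    }

    int batchCalls() {
        return batchCalls;
    }
}

class TinyLoaderDemo {
    public static void main(String[] args) {
        TinyLoader<String, String> loader = new TinyLoader<String, String>(keys -> {
            System.out.println("One batched call with keys " + keys);
            List<String> results = new ArrayList<>();
            for (String key : keys) {
                results.add("amenities for " + key);
            }
            return results;
        });

        CompletableFuture<String> first = loader.load("listing-1");
        loader.load("listing-2");
        loader.load("listing-3");
        CompletableFuture<String> repeat = loader.load("listing-1"); // duplicate key

        loader.dispatch();
        System.out.println(first.join());    // prints amenities for listing-1
        System.out.println(first == repeat); // prints true
    }
}
```

Four calls to load produce a single call to the batch function with three distinct keys. In the real framework, dispatching happens for us; the explicit dispatch call here only makes the timing visible.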

What a data loader needs

Data loaders are exactly what we need to solve the performance issues in our app—but they come with a few requirements for us to consider. Let's walk through each of these one by one.

Data for multiple objects at once

Let's imagine our data loader has collected all of the different keys involved in our query, and it's ready to fire off a request for data. What does it need next?

Well, if we think about the REST API endpoint we've used previously to return amenity data, we'll quickly see the problem: right now, the Listing.amenities datafetcher sends each listing ID individually to the GET /listings/{listing_id}/amenities endpoint, which only returns data for a single listing.

Here's the first big requirement for a data loader to work as expected: we need a data source that can resolve a request for multiple objects simultaneously. In practice, this means that our data loader should be able to send off a list of keys (such as ["listing-1", "listing-2", "listing-3"]) and get back data for all of them.

The good news is that we do have a different endpoint in our REST API that we can use to request amenity data for multiple listings: GET /amenities/listings. It accepts multiple listing IDs joined as a single string, and returns data for them all at once.

We've provided a method that utilizes this new endpoint in our data source. Jump into ListingService to take a closer look.

datasources/ListingService

public List<List<Amenity>> multipleAmenitiesRequest(List<String> listingIds) throws IOException {
    System.out.println("Calling the /amenities/listings endpoint with listings " + listingIds);
    // Send all listing IDs in one request, joined into a single comma-separated "ids" query parameter
    JsonNode amenities = client
            .get()
            .uri(uriBuilder -> uriBuilder
                    .path("/amenities/listings")
                    .queryParam("ids", String.join(",", listingIds))
                    .build())
            .retrieve()
            .body(JsonNode.class);

    // Map the JSON response into an outer list that holds one list of amenities per listing
    if (amenities != null) {
        return mapper.readValue(amenities.traverse(), new TypeReference<List<List<Amenity>>>() {
        });
    }

    // No response body to map
    return null;
}

Note: If you're coming from Intro to GraphQL with Java & DGS, copy this method into your ListingService class!

This method is set up to accept a List of listing IDs (such as ["listing-1", "listing-2", "listing-3"]).

It structures the list of IDs as a single string, attaches them as a query parameter called ids, and makes a request to the GET /amenities/listings endpoint. Then it receives the response body as an instance of the JsonNode class. If the request is successful, we return the results of traversing through the JsonNode and mapping the results into one big List that contains multiple smaller Lists of Amenity types. (We'll dive deeper into the reasons why shortly!) If the JsonNode is null for any reason, we'll simply return null.
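To make the "single string" step concrete, here is a minimal sketch of the value that the String.join call in the method above produces. IdsParamSketch and idsParam are hypothetical names used only for this illustration; how the HTTP client encodes the value in the final URL is outside the scope of this sketch.

```java
import java.util.List;

class IdsParamSketch {
    // Mirrors the String.join(",", listingIds) call in multipleAmenitiesRequest.
    static String idsParam(List<String> listingIds) {
        return String.join(",", listingIds);
    }

    public static void main(String[] args) {
        List<String> listingIds = List.of("listing-1", "listing-2", "listing-3");
        // prints listing-1,listing-2,listing-3
        System.out.println(idsParam(listingIds));
    }
}
```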

For every key, a value

When a data loader sends off a list of keys in a request, it has a very clear expectation from the data source providing the data: the number of objects returned should never be greater than the number of keys that were sent in the request.

Let's break down this expectation and how the data source satisfies it.

In the process of resolving a query, our datafetcher might pass three keys into the data loader. The data loader groups them together into one list (["listing-1", "listing-2", "listing-3"]), and passes that list to the ListingService method multipleAmenitiesRequest. What does it expect back? Well, it put in a list of three keys; it expects a list of no more than three objects back!

A diagram showing a request with three listing IDs; three lists of amenities are returned

Let's take another look at the return type of the multipleAmenitiesRequest method.

public List<List<Amenity>> multipleAmenitiesRequest(List<String> listingIds) throws IOException {
  // ...method logic
}

It returns data of type List<List<Amenity>>. That's not the prettiest type to look at, but we can break it down: each listing that we request amenity data for will have a list of amenities returned, because each listing can have more than one amenity associated with it. So if we request amenity data for three listing IDs, our response should consist of three lists of amenities.

A diagram showing three listing IDs; and three lists of amenities that map to them

The data loader groups its keys in a list (List<String>), and it expects the response to come back in a list as well; this is what gives us the outer List that wraps each listing's list of amenities (List<List<Amenity>>)! From there, the data loader handles the logic of mapping each List<Amenity> back to the key that requested it.
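One way to picture that mapping step is to pair each key with the inner list at the same position. The sketch below is an illustration, not the data loader's actual internals: PositionalMappingSketch and pairByPosition are hypothetical names, plain strings stand in for Amenity objects, the amenity names are made-up sample data, and the sketch assumes the response holds one inner list per key in the same order as the keys.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionalMappingSketch {
    // Pairs keys.get(i) with values.get(i).
    // Assumption: one inner list per key, returned in the same order as the keys.
    static Map<String, List<String>> pairByPosition(List<String> keys, List<List<String>> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                    "Expected " + keys.size() + " lists but received " + values.size());
        }
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            byKey.put(keys.get(i), values.get(i));
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("listing-1", "listing-2", "listing-3");
        // Made-up amenity names standing in for Amenity objects.
        List<List<String>> response = List.of(
                List.of("Sample amenity A", "Sample amenity B"),
                List.of("Sample amenity C"),
                List.of("Sample amenity A", "Sample amenity D"));

        Map<String, List<String>> byKey = pairByPosition(keys, response);
        System.out.println(byKey.get("listing-2")); // prints [Sample amenity C]
    }
}
```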

When we request featured listings data, each object contains an amenities key, which is an array of objects that each hold an amenity ID.

An example listing object in the featured listings JSON response


{
  "id": "listing-1",
  "title": "Cave campsite in snowy MoundiiX",
  "description": "Enjoy this amazing cave campsite in snow MoundiiX, where you'll be one with the nature and wildlife in this wintery planet. All space survival amenities are available. We have complementary dehydrated wine upon your arrival. Check in between 34:00 and 72:00. The nearest village is 3AU away, so please plan accordingly. Recommended for extreme outdoor adventurers.",
  "costPerNight": 120,
  // ... other properties
  "amenities": [
    {
      "id": "am-2"
    },
    {
      "id": "am-10"
    },
    {
      "id": "am-11"
    },
    {
      "id": "am-12"
    },
    {
      "id": "am-13"
    },
    {
      "id": "am-26"
    },
    {
      "id": "am-27"
    },
    {
      "id": "am-16"
    }
  ]
}

This means that we have two possible ways to request follow-up amenity data for a given listing: we could use the listing ID to send a request to the GET /amenities/listings endpoint, as we're about to do, or we could gather up all of the amenity IDs in a listing's amenities property, and make one big request for amenity data to the GET /amenities endpoint.

Let's use our GetFeaturedListingsAmenities query to walk through what this second option (requesting amenities by amenity IDs) would have looked like.

A query for featured listings and their amenities


query GetFeaturedListingsAmenities {
  featuredListings {
    id
    title
    amenities {
      name
    }
  }
}

For every listing object returned, the Listing.amenities datafetcher would be executed. Here, we could access a particular listing's amenities field, map through all the amenity IDs, and pass the whole list to the data loader to be batched together.

Even with just three listing objects in our featured listings response, this starts to look a bit more complicated: the data loader collects each listing's list of amenity IDs, gathering them up in one larger list.

Here's the big problem with this approach. Each entire list of amenities we pass into the data loader is considered a key.

So if two lists contain some of the same amenity IDs, we won't enjoy the benefit of data loader deduplication here; the data loader looks at each list of amenity IDs as a whole.
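A small sketch can show why whole-list keys defeat deduplication. It assumes keys are compared with ordinary value equality, as java.util.List.equals does; ListKeySketch and its methods are hypothetical names for this illustration only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ListKeySketch {
    // Counts distinct keys when each whole list of amenity IDs is one key.
    static int distinctKeys(List<List<String>> keys) {
        return new LinkedHashSet<>(keys).size();
    }

    // Counts distinct amenity IDs across all of the lists.
    static int distinctAmenityIds(List<List<String>> keys) {
        Set<String> ids = new LinkedHashSet<>();
        for (List<String> key : keys) {
            ids.addAll(key);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        // Two listings that share the amenity "am-10".
        List<List<String>> keys = List.of(
                List.of("am-2", "am-10"),
                List.of("am-10", "am-11"));

        System.out.println(distinctKeys(keys));       // prints 2
        System.out.println(distinctAmenityIds(keys)); // prints 3
    }
}
```

The two lists are different keys, so both are kept as-is and "am-10" is requested twice: four IDs are sent even though only three are distinct.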

Even if the ListingService held the logic to break apart each list of amenity IDs, deduplicate them, and make one big request to the GET /amenities endpoint, we'd see another problem emerge.

Now we get back one big blob of amenity data; it's not immediately clear how to map this response back to the list of keys the data loader dispatched the request with. Simply returning the whole blob for the datafetcher to deal with doesn't work either; the data loader tries to enforce that we get a number of objects less than or equal to the number of keys that we submitted. So three keys in should mean, at most, three objects out.

Consequently, we'd need to introduce another layer of logic in the ListingService to receive the JSON results, map through them, and match up each list of amenity IDs with the corresponding data.
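To give a sense of what that extra layer might involve, here is a rough sketch. It assumes the flat response has already been parsed into a lookup from amenity ID to amenity data; RegroupSketch, regroup, and the sample names are hypothetical, and plain strings stand in for Amenity objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RegroupSketch {
    // Rebuilds one list of amenities per key (per list of amenity IDs),
    // in the same order as the keys, from a flat lookup of amenity data.
    static List<List<String>> regroup(List<List<String>> keys, Map<String, String> amenityById) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> amenityIds : keys) {
            List<String> amenities = new ArrayList<>();
            for (String id : amenityIds) {
                String amenity = amenityById.get(id);
                if (amenity != null) { // skip IDs missing from the response
                    amenities.add(amenity);
                }
            }
            result.add(amenities);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> keys = List.of(
                List.of("am-2", "am-10"),
                List.of("am-10", "am-11"));
        // Made-up names standing in for the parsed response.
        Map<String, String> amenityById = Map.of(
                "am-2", "Sample amenity 2",
                "am-10", "Sample amenity 10",
                "am-11", "Sample amenity 11");

        // prints [[Sample amenity 2, Sample amenity 10], [Sample amenity 10, Sample amenity 11]]
        System.out.println(regroup(keys, amenityById));
    }
}
```

All of this bookkeeping would live in our own code, which is the cost the listing ID approach avoids.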

It's because of this complication that we've chosen to use a listing's id field to fetch its amenities: it's a lot cleaner, and it allows the data loader to do most of the work!

Data loader scope

There's one last important point to keep in mind. Data loaders and the set of keys they process at any one time are scoped to the life of a single query.

This means that if we run one query for listing data, then a second query, the keys from both queries will NOT be batched together. Instead, each query will be resolved separately.

A diagram demonstrating how a data loader handles two queries separately
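The per-query scope can be modeled with a fresh key set for every query. This is a simplified sketch rather than how DGS manages loader instances internally; QueryScopeSketch and batchesPerQuery are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QueryScopeSketch {
    // Produces one batch of keys per query. Keys are deduplicated within
    // a query, but never merged across queries.
    static List<List<String>> batchesPerQuery(List<List<String>> keysRequestedByEachQuery) {
        List<List<String>> batches = new ArrayList<>();
        for (List<String> requested : keysRequestedByEachQuery) {
            // A fresh key set for every query models the per-query scope.
            Set<String> queryScopedKeys = new LinkedHashSet<>(requested);
            batches.add(new ArrayList<>(queryScopedKeys));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> queries = List.of(
                List.of("listing-1", "listing-2", "listing-1"), // first query
                List.of("listing-2", "listing-3"));             // second query

        // prints [[listing-1, listing-2], [listing-2, listing-3]]
        System.out.println(batchesPerQuery(queries));
    }
}
```

Notice that "listing-2" appears in both batches: the second query cannot reuse or join the first query's batch.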

With these conceptual points cleared up, let's turn our attention back to the code. We'll update our Listing.amenities datafetcher—and benefit from the power of a data loader!

Practice

Which of the following statements about data loaders is true?

A data loader will wait until ALL queries have run before batching all requests together.
A data loader replaces our application's datafetchers.
Data loaders can solve the n+1 problem by reducing the number of calls our GraphQL API makes for extra data.
A data loader replaces multiple similar requests with a single batched request.

Key takeaways

Data loaders let us batch a list of identifiers (such as IDs) in a single request rather than sending an individual request for each.
Before data loaders can work properly, our data source (whether another API, or a database) needs to implement a method that accepts multiple keys (such as IDs), and returns multiple objects.
The number of objects a data loader receives from a data source should not exceed the number of keys the data loader collected. (For instance, if a data loader requests data for three listings, it should receive no more than three listing objects back!)

Up next

We've learned about data loaders and the problem that they solve in our application. We also have a new data source method that accepts multiple listing IDs, and resolves multiple amenity objects. Next up, we'll implement the data loader logic that gathers up multiple listing IDs in a single request.
