2. The n+1 problem
10m

Overview

Our API is already equipped to serve up some basic soundtrack data. We can run a query for featured playlists, or ask for one playlist in particular. We can see data about the playlist itself, along with the tracks it contains.

Furthermore, for each track, we can also request data for the Artist that created it. But right now, we're facing a big performance issue with how this is implemented.

In this lesson, we will:

  • Learn about the n+1 problem
  • Discuss how to resolve it

Playlists, tracks, and artists

To see our performance bottleneck in action, let's run a test against our API.

Make sure the app is running. If it isn't, start it by running the following command in the root of the project.

./gradlew bootRun

Now, let's navigate to Apollo Sandbox Explorer, and paste in the address of our locally running server in the input at the top of the screen. By default, our server should be running on http://localhost:8080/graphql.

http://localhost:8080/graphql
https://studio.apollographql.com/sandbox/explorer

A screenshot of the Apollo Sandbox Explorer, highlighting the connection input with the locally running server's address

Let's begin our query by selecting the playlist field from our Query type in the Documentation panel. For the playlist we query, we'll request the basics: just an id and name, along with a list of its tracks.

For each Track object in the playlist, we'll return id, name, and durationMs. Then, we'll ask for its artist field. This field returns an Artist type, from which we'll request id, name, followers, genres, and uri.

Here's what our query should look like.

A query for a playlist, tracks, and artists
query GetPlaylist($playlistId: ID!) {
  playlist(id: $playlistId) {
    id
    name
    tracks {
      id
      name
      durationMs
      artist {
        id
        name
        followers
        genres
        uri
      }
    }
  }
}

And in the Variables panel:

{
  "playlistId": "6Fl8d6KF0O4V5kFdbzalfW"
}

Let's take this for a spin and... we get data back! Great. So what's the problem, exactly?

To find out, we'll take a closer look at the terminal where our server is running. Run the query again, and... did you catch that? The terminal filled up with statements logging out:

The output every time we call the REST API
I am calling GET /artists/{artist_id} for 3GBPw9NK25X1Wt2OUvOwY3
I am calling GET /artists/{artist_id} for 33QmoCkSqADuQEtMCysYLh
I am calling GET /artists/{artist_id} for 6H1RjVyNruCmrBEWRbD0VZ
I am calling GET /artists/{artist_id} for 2JY5qzEozvTdogkDTkkOMf
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2
I am making...

We see one line printed out here for each track's artist ID, and each of these represents a single request across the network to our data source. That's many more requests than we probably expected from our lean and precise query! Let's dive into what's happening here.

Different tracks, same artist

We can try this again with another playlist ID. This time, we'll use one that contains tracks by the same artist. Keeping the query in Sandbox the same, update the Variables panel with the following.

{
  "playlistId": "5evmObkq06UCWmtlcxK4Ev"
}

And when we run the query... three identical requests are being made for the same artist ID!

The output every time we call the REST API
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2

This is worse than making lots of network requests to the same endpoint: here, we're making multiple identical requests for the same information!

For every artist, a new request

Let's back up and review the endpoints at work in our application.

GET /browse/featured-playlists
GET /playlists/{playlist_id}
GET /playlists/{playlist_id}/tracks
GET /artists/{artist_id}

Our datafetchers call these endpoints when certain pieces of our API are requested.

Here's a breakdown of how our query for a single playlist, its tracks, and each track's artist is resolved.

To get our single playlist, our datafetcher first makes a request to the GET /playlists/{playlist_id} endpoint. This returns a big JSON object containing our playlist details, along with data for each of its tracks.

A diagram showing the data that is returned when we query for a playlist

But we need more granular detail for each track's primary artist! This means for each track in the playlist, we make a request to GET /artists/{artist_id} using the track's primary artist ID.

A diagram showing the followup request needed for each track's artist

This extra request gets us the artist data we need, but at a cost: the Track.artist datafetcher is executed for every track in the response, as expected, but this means it calls the REST API endpoint for each track's artist ID. So depending on how many tracks there are, we might have a lot of extra requests to the API on our hands!

The n+1 problem

This is the n+1 problem in action. We start with an initial request (the 1 in the n+1 equation), and this first request determines how many follow-up requests will be necessary (the n in the n+1 equation). The number of required follow-up requests, n, is not known until our first request is executed.

We saw this in action: our first request gave us our playlist and its associated tracks, but we then needed a follow-up request per track to get the track's associated artist data.

This doesn't look too bad with just one or two additional requests, but it leads to some troubling situations as our queries scale. Imagine our playlist has fifty tracks; this means we'll send a total of 51 requests! One request to fetch playlist and track data, and 50 additional requests to get the artist information for each track!

Even worse, this can also lead to duplicate requests. A playlist could contain multiple tracks by the same artist, but the Track.artist datafetcher doesn't know the difference; it will still call the data source for every track, resulting in multiple identical requests for the same artist.

A diagram showing identical requests when the artist id appears more than once
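
To make the arithmetic concrete, here is a minimal sketch that counts requests when every track triggers its own artist lookup. CountingClient and its methods are hypothetical stand-ins invented for this illustration; they are not part of the project's SpotifyClient, and no real network calls are made.

```java
import java.util.ArrayList;
import java.util.List;

class NPlusOneDemo {
    /** Hypothetical stand-in for the REST client; it only records the calls it receives. */
    static class CountingClient {
        final List<String> requestLog = new ArrayList<>();

        // Stands in for GET /playlists/{playlist_id}: returns one artist ID per track.
        List<String> playlistTrackArtistIds(List<String> artistIdsInPlaylist) {
            requestLog.add("GET /playlists/{playlist_id}");
            return artistIdsInPlaylist;
        }

        // Stands in for GET /artists/{artist_id}.
        String artist(String artistId) {
            requestLog.add("GET /artists/" + artistId);
            return "Artist " + artistId;
        }
    }

    /** Resolves a playlist the naive way and returns the total number of requests made. */
    static int countRequests(List<String> trackArtistIds) {
        CountingClient client = new CountingClient();
        List<String> ids = client.playlistTrackArtistIds(trackArtistIds); // the "1"
        for (String id : ids) {
            client.artist(id); // the "n": one call per track, duplicates included
        }
        return client.requestLog.size();
    }

    public static void main(String[] args) {
        List<String> fiftyTracks = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            fiftyTracks.add("artist-" + i);
        }
        System.out.println(countRequests(fiftyTracks));            // prints 51
        System.out.println(countRequests(List.of("a", "a", "a"))); // prints 4
    }
}
```

The second call shows the duplicate case: three tracks by the same artist still cost three artist requests on top of the playlist request.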

Data loaders

To solve the n+1 problem in our application, we'll use data loaders. A data loader's job is to replace multiple similar requests with a single batched request.

A diagram showing artist Ids being batched by a data loader

We use data loaders inside of our datafetcher methods. When the process of resolving a query requires a datafetcher method to be called multiple times for different parameters, the data loader can batch the parameters together and make a single network request with them.

In our example, this means that when the Track.artist datafetcher is called using every track's artist ID, it won't call our REST API directly anymore; instead, it will pass the parameters to the data loader to collect.

A diagram showing artist Ids batched by a data loader

Once the individual artist IDs are gathered in one list, the data loader can assume the responsibility of calling the data source. It's able to dispatch a single request to the REST API endpoint for all of the IDs at once, which is a huge performance boost over letting the datafetcher make a network request for each!

Best of all, with DGS, our data loaders automatically deduplicate the identifiers we pass them. This means even if our playlist contains multiple tracks by the same artist, we'll only ever request that artist once.

A diagram showing the data loader making a single REST request with all the collected IDs
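
The collect-then-dispatch idea can be sketched in plain Java. SimpleLoader below is a hypothetical, heavily simplified class written for this illustration; it is not the DGS or java-dataloader API, and it skips the asynchronous scheduling a real data loader handles for us.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class BatchingSketch {
    /** Hypothetical loader: collects keys, then resolves them all in one batched call. */
    static class SimpleLoader<K, V> {
        private final Function<List<K>, Map<K, V>> batchFunction;
        private final Set<K> pendingKeys = new LinkedHashSet<>(); // a Set drops duplicate keys

        SimpleLoader(Function<List<K>, Map<K, V>> batchFunction) {
            this.batchFunction = batchFunction;
        }

        // Called once per track; nothing is sent yet.
        void load(K key) {
            pendingKeys.add(key);
        }

        // Sends every collected key in a single batched call.
        Map<K, V> dispatch() {
            List<K> keys = new ArrayList<>(pendingKeys);
            pendingKeys.clear();
            return batchFunction.apply(keys);
        }
    }

    /** Loads one key per track, dispatches once, and returns the batches that were sent. */
    static List<List<String>> resolve(List<String> trackArtistIds) {
        List<List<String>> sentBatches = new ArrayList<>();
        SimpleLoader<String, String> loader = new SimpleLoader<>(ids -> {
            sentBatches.add(ids); // record what a multi-artist request would receive
            Map<String, String> artists = new LinkedHashMap<>();
            for (String id : ids) {
                artists.put(id, "Artist " + id);
            }
            return artists;
        });
        for (String id : trackArtistIds) {
            loader.load(id);
        }
        loader.dispatch();
        return sentBatches;
    }

    public static void main(String[] args) {
        // Four tracks, two distinct artists: one batch containing each ID once.
        System.out.println(resolve(List.of("a", "b", "a", "a"))); // prints [[a, b]]
    }
}
```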

Data loaders work great when a single resource can provide data for multiple identifiers simultaneously. They're also scoped to the life of a single query; this means that if we send two queries back-to-back (each requesting a different list of playlists, tracks, and artists), the data loader will not try to batch artist IDs from both queries together. Instead, it will handle them separately, resolving each request independently of the other.

A diagram showing artist IDs requested in two separate queries, involving two separate batched calls by the data loader
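
As a rough illustration of that scoping, the sketch below creates a fresh loader for each incoming query, so an ID requested by both queries ends up in two separate batches. ArtistLoader and handleQueries are hypothetical names used only here; in a real application the framework manages the loader's lifecycle.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RequestScopeSketch {
    /** Hypothetical loader that only remembers the keys it was given. */
    static class ArtistLoader {
        private final Set<String> pendingIds = new LinkedHashSet<>();

        void load(String artistId) {
            pendingIds.add(artistId);
        }

        // Returns the single batch this loader would send.
        List<String> dispatch() {
            return new ArrayList<>(pendingIds);
        }
    }

    /** Handles each query with its own loader and returns one batch per query. */
    static List<List<String>> handleQueries(List<List<String>> artistIdsPerQuery) {
        List<List<String>> batches = new ArrayList<>();
        for (List<String> artistIds : artistIdsPerQuery) {
            ArtistLoader loader = new ArtistLoader(); // fresh loader for every query
            artistIds.forEach(loader::load);
            batches.add(loader.dispatch());
        }
        return batches;
    }

    public static void main(String[] args) {
        // Artist "b" appears in both queries, so it is fetched once per query.
        System.out.println(handleQueries(List.of(List.of("a", "b"), List.of("b", "c"))));
        // prints [[a, b], [b, c]]
    }
}
```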

What a data loader needs

There's just one big requirement for data loaders to work as expected: the endpoint that receives the batched request needs to have the ability to provide data for multiple objects. This requires a change in our application; right now, the Track.artist datafetcher sends each artist ID individually to the GET /artists/{artist_id} endpoint, which only returns data for a single provided value.

Fortunately, we have a different endpoint in our REST API that we can use: GET /artists. It accepts multiple artist IDs joined as a single string, and returns data for them all at once.

We've provided a method that utilizes this new endpoint in our data source. Jump into SpotifyClient to take a closer look.

datasources/SpotifyClient
public List<MappedArtist> multipleArtistsRequest(List<String> artistIds) {
    System.out.println("I am making a call to the artists endpoint with artists " + artistIds);
    // Send all IDs in one request, joined into a single comma-separated query parameter
    ArtistCollection artistCollection = client
        .get()
        .uri(uriBuilder -> uriBuilder
            .path("/artists")
            .queryParam("artists_ids", String.join(",", artistIds))
            .build())
        .retrieve()
        .body(ArtistCollection.class);
    // The response body can be null, so guard before reading the artists
    if (artistCollection != null) {
        return artistCollection.getArtists();
    }
    return null;
}

This method is set up to accept a List of artist IDs. It makes a request to the GET /artists endpoint, passing the IDs as a single comma-separated query parameter, then receives the response body as an instance of the ArtistCollection class. If the response has a body, we return the result of calling the ArtistCollection class's getArtists method, which gives us a list of MappedArtist instances. Otherwise, the method returns null.
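
To see what "joined as a single string" looks like, here is a small sketch that builds roughly the path and query string such a request would use. The artistsPathAndQuery helper is hypothetical, the parameter name is copied from the method above, and any percent-encoding the real URI builder applies is ignored.

```java
import java.util.List;

class ArtistsQuerySketch {
    // Parameter name copied from multipleArtistsRequest; treat it as an assumption here.
    static final String PARAM = "artists_ids";

    /** Builds an approximate path and query string for a multi-artist request. */
    static String artistsPathAndQuery(List<String> artistIds) {
        return "/artists?" + PARAM + "=" + String.join(",", artistIds);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("3GBPw9NK25X1Wt2OUvOwY3", "33QmoCkSqADuQEtMCysYLh");
        System.out.println(artistsPathAndQuery(ids));
        // prints /artists?artists_ids=3GBPw9NK25X1Wt2OUvOwY3,33QmoCkSqADuQEtMCysYLh
    }
}
```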

Now that we have a method that accepts multiple artist IDs, we can update our Track.artist datafetcher—and benefit from the power of a data loader!

Key takeaways

  • The n+1 problem occurs when we make an initial request, followed by some unknown number of follow-up requests.
  • Data loaders let us batch a list of identifiers (such as IDs) in a single request rather than sending an individual request for each.
  • Before data loaders can work properly, our data source (whether another API or a database) needs to provide a way to accept multiple keys (such as IDs) and return multiple objects.

Up next

We've learned about data loaders and the problem that they solve in our application. We also have a new method that accepts multiple artist IDs, and resolves multiple artist objects. Next up, we'll implement the data loader logic that gathers up multiple artist IDs in a single request.
