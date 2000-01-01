Overview

Our GraphQL API is already equipped to serve up some basic soundtrack data. We can run a query for featured playlists, or ask for one playlist in particular. We can see data about the playlist itself, along with the tracks it contains.

Furthermore, for each track, we can also query data for the Artist that created it. But right now, we're facing a big performance issue with how this is implemented.

In this lesson, we will:

Learn about the n+1 problem

Discuss how to resolve it

Playlists, tracks, and artists

To see our performance bottleneck in action, let's run a test query against our GraphQL API.

Make sure the app is running by running the following command in the root of the project.

./gradlew bootRun

Now, let's navigate to Apollo Sandbox Explorer (https://studio.apollographql.com/sandbox/explorer) and paste the address of our locally running server into the input at the top of the screen. By default, our server should be running on:

http://localhost:8080/graphql

Refresher: Apollo Sandbox Explorer

Apollo Sandbox is an essential tool in the Apollo GraphOS toolkit. With Sandbox, we can load a GraphQL server's schema and explore it using some cool GraphOS features such as the schema reference and the Explorer. The Explorer is a powerful web IDE for creating, running, and managing GraphQL operations. It lets us build operations easily and quickly, look at our operation history, peek at response hints, and share operations with others. Best of all, Sandbox is free to use and doesn't require an account!

Let's begin our query by selecting the playlist field from our Query type in the Documentation panel. For the playlist we query, we'll request the basics: just an id and name, along with a list of its tracks.

For each Track object in the playlist, we'll return id, name, and durationMs. Then, we'll ask for its artist field. This field returns an Artist type, from which we'll request id, name, followers, genres, and uri.

Here's what our query should look like.

A query for a playlist, tracks, and artists:

query GetPlaylist($playlistId: ID!) {
  playlist(id: $playlistId) {
    id
    name
    tracks {
      id
      name
      durationMs
      artist {
        id
        name
        followers
        genres
        uri
      }
    }
  }
}

And in the Variables panel:

{ "playlistId": "6Fl8d6KF0O4V5kFdbzalfW" }

Let's take this query for a spin and... we get data back! Great. So what's the problem, exactly?

To find out, we'll take a closer look at our terminal where our server is running. Run the query again, and... did you catch that? The terminal filled up with statements logging out:

The output every time we call the REST API:

I am calling GET /artists/{artist_id} for 3GBPw9NK25X1Wt2OUvOwY3
I am calling GET /artists/{artist_id} for 33QmoCkSqADuQEtMCysYLh
I am calling GET /artists/{artist_id} for 6H1RjVyNruCmrBEWRbD0VZ
I am calling GET /artists/{artist_id} for 2JY5qzEozvTdogkDTkkOMf
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2
I am making...

We see one line printed out here for each track's artist ID, and each of these represents a single request across the network to our data source. Many more requests than we probably expected from our lean and precise GraphQL query! Let's dive into what's happening here.

Watch out! Not seeing any logs in your terminal? Jump into your SpotifyClient class's artistRequest method, and make sure that there's a line printing out a message for every artist ID we're fetching data for. We've gone ahead and added one in the starter repo to show how many times this function gets called. If you're coming from Server-side Lab with Java & DGS, it might be missing!

datasources/SpotifyClient
public MappedArtist artistRequest(String artistId) {
    System.out.println("I am making a request to the artists endpoint for " + artistId);
    return client.get()
        .uri("/artists/{artist_id}", artistId)
        .retrieve()
        .body(MappedArtist.class);
}

Still having trouble? Visit the Odyssey forums to get help.

Different tracks, same artist

We can try this again with another playlist ID—this time, we'll use one that contains tracks by the same artist. Keeping the query in Sandbox the same, update the Variables panel with the following.

{ "playlistId": "5evmObkq06UCWmtlcxK4Ev" }

And when we run the query... three identical requests are being made for the same artist ID!

The output every time we call the REST API:

I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2
I am calling GET /artists/{artist_id} for 3WrFJ7ztbogyGnTHbHJFl2

This is worse than making lots of network requests to the same endpoint: here, we're making multiple identical requests for the same information!

For every artist, a new request

Let's back up and review the endpoints at work in our application.

GET /browse/featured-playlists
GET /playlists/{playlist_id}
GET /playlists/{playlist_id}/tracks
GET /artists/{artist_id}

Our datafetchers call these endpoints when certain pieces of our GraphQL API are requested.

Learn more: How our data source works

For a refresher on how data fetching works in our app, let's first jump into the datasources/SpotifyClient.java file. The SpotifyClient class contains a number of methods, each responsible for making a call to a particular endpoint in our Spotify REST API.

The featuredPlaylistsRequest method in SpotifyClient:

public PlaylistCollection featuredPlaylistsRequest() {
    return client.get()
        .uri("/browse/featured-playlists")
        .retrieve()
        .body(PlaylistCollection.class);
}

We instantiate this class inside of the files in our datafetchers directory (PlaylistDataFetcher and TrackDataFetcher). Each of these datafetcher classes contains methods that are responsible for resolving data for the fields in our GraphQL schema.

For instance, the datafetcher for the Query type's featuredPlaylists field lives in the PlaylistDataFetcher class. It calls the SpotifyClient method featuredPlaylistsRequest, which connects directly with the GET /browse/featured-playlists endpoint.

The featuredPlaylists datafetcher method:

@DgsQuery
public List<MappedPlaylist> featuredPlaylists() {
    PlaylistCollection response = spotifyClient.featuredPlaylistsRequest();
    return response.getPlaylists();
}

To learn more about how we set up these datafetchers, check out the first course in this series, Intro to GraphQL with Java & DGS.

Here's a breakdown of how our query for a single playlist, its tracks, and each track's artist is resolved.

To get our single playlist, our datafetcher first makes a request to the GET /playlists/{playlist_id} endpoint. This returns a big JSON object containing our playlist details, along with data for each of its tracks.

But we need more granular detail for each track's primary artist! This means for each track in the playlist, we make a request to GET /artists/{artist_id} using the track's primary artist ID.

Check out the code for the datafetcher and data source method

The datafetcher for the Track.artist field retrieves the artistId and calls a method on the SpotifyClient class.

datafetchers/TrackDataFetcher
@DgsData(parentType = "Track", field = "artist")
public MappedArtist getArtist(DgsDataFetchingEnvironment dfe) {
    MappedTrack track = dfe.getSource();
    String artistId = track.getArtistId();
    return spotifyClient.artistRequest(artistId);
}

The SpotifyClient method, artistRequest, uses the artistId to call the endpoint of our data source.

datasources/SpotifyClient
public MappedArtist artistRequest(String artistId) {
    System.out.println("I am making a request to the artists endpoint for " + artistId);
    return client.get()
        .uri("/artists/{artist_id}", artistId)
        .retrieve()
        .body(MappedArtist.class);
}

This extra request gets us the artist data we need, but at a cost. The Track.artist datafetcher runs once for every track in the query response, as expected. That means it calls the REST API endpoint once for each track's artist ID. Depending on how many tracks there are, we might have a lot of extra requests to the API on our hands!

The n+1 problem

This is the n+1 problem in action. We start with an initial request (the 1 in the n+1 equation), and this first request determines how many follow-up requests will be necessary (the n in the n+1 equation). The number of required follow-up requests, n, is not known until our first request has executed.

We saw this in action: our first request gave us our playlist and its associated tracks, but we then needed a follow-up request per track to get the track's associated artist data.

This doesn't look too bad with just one or two additional requests, but it leads to some troubling situations as our queries scale. Imagine our playlist has fifty tracks; this means we'll send a total of 51 requests! One request to fetch playlist and track data, and 50 additional requests to get the artist information for each track!
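To make the arithmetic concrete, here is a small, self-contained sketch that mimics the naive resolution path and counts the "network" calls. CountingClient, NPlusOneDemo, and their methods are hypothetical stand-ins for SpotifyClient and our datafetchers, not code from the starter repo. There is one call for the playlist (the 1), then one artist call per track (the n).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SpotifyClient that only counts calls instead of hitting the network.
class CountingClient {
    int requestCount = 0;

    // Stands in for GET /playlists/{playlist_id}: returns one artist ID per track.
    List<String> playlistRequest(int trackCount) {
        requestCount++;
        List<String> artistIdsPerTrack = new ArrayList<>();
        for (int i = 0; i < trackCount; i++) {
            artistIdsPerTrack.add("artist-" + i);
        }
        return artistIdsPerTrack;
    }

    // Stands in for GET /artists/{artist_id}: one call per artist ID.
    String artistRequest(String artistId) {
        requestCount++;
        return "Artist " + artistId;
    }
}

class NPlusOneDemo {
    // Resolves a playlist the naive way and reports how many requests were made.
    static int countRequests(int trackCount) {
        CountingClient client = new CountingClient();
        List<String> artistIds = client.playlistRequest(trackCount); // the "1"
        for (String artistId : artistIds) {
            client.artistRequest(artistId); // the "n": once per track
        }
        return client.requestCount;
    }

    public static void main(String[] args) {
        System.out.println(countRequests(50)); // prints 51
    }
}
```

The total grows linearly with the track count. A playlist of 50 tracks costs 51 requests, and even an empty playlist still costs the initial one.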

Even worse, this can also lead to duplicate requests. A playlist could contain multiple tracks by the same artist, but the Track.artist datafetcher doesn't know the difference; it will still call the data source for every track, resulting in multiple identical requests for the same artist.
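The cost of those duplicates is easy to quantify. This sketch compares the requests the naive Track.artist datafetcher sends (one per track) with the requests that would be needed if each distinct artist ID were fetched only once. The class and method names are hypothetical, chosen for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;

class DuplicateRequestDemo {
    // Naive Track.artist resolution: one artist request per track, repeats included.
    static int naiveArtistRequests(List<String> trackArtistIds) {
        return trackArtistIds.size();
    }

    // Requests needed if each distinct artist ID were fetched only once.
    static int distinctArtistRequests(List<String> trackArtistIds) {
        return new LinkedHashSet<>(trackArtistIds).size();
    }

    // Identical requests the naive approach sends for data it has already asked for.
    static int wastedRequests(List<String> trackArtistIds) {
        return naiveArtistRequests(trackArtistIds) - distinctArtistRequests(trackArtistIds);
    }

    public static void main(String[] args) {
        // Three tracks by the same artist, as in the second playlist we queried earlier.
        List<String> ids = List.of(
                "3WrFJ7ztbogyGnTHbHJFl2",
                "3WrFJ7ztbogyGnTHbHJFl2",
                "3WrFJ7ztbogyGnTHbHJFl2");
        System.out.println(naiveArtistRequests(ids));    // prints 3
        System.out.println(distinctArtistRequests(ids)); // prints 1
        System.out.println(wastedRequests(ids));         // prints 2
    }
}
```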

Data loaders

To solve the n+1 problem in our application, we'll use data loaders. A data loader's job is to replace multiple similar requests with a single batched request.

We use data loaders inside of our datafetcher methods. When the process of resolving a query requires a datafetcher method to be called multiple times for different parameters, the data loader can batch the parameters together and make a single network request with them.

In our example, this means that when the Track.artist datafetcher runs for each track's artist ID, it won't call our REST API directly anymore; instead, it will pass the ID to the data loader to collect.

Once the individual artist IDs are gathered in one list, the data loader can assume the responsibility of calling the data source. It's able to dispatch a single request to the REST API endpoint for all of the IDs at once—a huge performance boost over letting the datafetcher make a network request for each!

Best of all, with DGS, our data loaders automatically deduplicate the identifiers we pass them. This means even if our playlist contains multiple tracks by the same artist, we'll only ever request that artist once.
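To illustrate the batching and deduplication described above, here is a minimal Java sketch. MiniDataLoader, load, dispatch, and batchCalls are hypothetical names for illustration. The class loosely mirrors the load-then-dispatch shape of the java-dataloader library that DGS data loaders build on, but it is not that library's API. Here we call dispatch() by hand, whereas DGS decides when to dispatch during query execution. The batch function stands in for a single call to an endpoint that accepts many IDs, and we assume it returns one value per key, in key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical, minimal batching loader: a teaching sketch, not the DGS API.
class MiniDataLoader<K, V> {
    private final Function<List<K>, List<V>> batchFunction; // one request for many keys
    private final Map<K, CompletableFuture<V>> cache = new HashMap<>(); // one future per distinct key
    private final List<K> pendingKeys = new ArrayList<>(); // distinct keys not yet sent
    int batchCalls = 0;

    MiniDataLoader(Function<List<K>, List<V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Called from the datafetcher instead of the REST client: queues the key, sends nothing yet.
    CompletableFuture<V> load(K key) {
        CompletableFuture<V> existing = cache.get(key);
        if (existing != null) {
            return existing; // key already requested in this query: reuse it, no duplicate request
        }
        CompletableFuture<V> future = new CompletableFuture<>();
        cache.put(key, future);
        pendingKeys.add(key);
        return future;
    }

    // Sends every queued key in one batch, then completes each key's future.
    void dispatch() {
        if (pendingKeys.isEmpty()) {
            return;
        }
        List<K> keys = new ArrayList<>(pendingKeys);
        pendingKeys.clear();
        batchCalls++;
        List<V> values = batchFunction.apply(keys); // assumed: one value per key, same order
        for (int i = 0; i < keys.size(); i++) {
            cache.get(keys.get(i)).complete(values.get(i));
        }
    }

    public static void main(String[] args) {
        List<List<String>> batchesSeen = new ArrayList<>();
        MiniDataLoader<String, String> loader = new MiniDataLoader<String, String>(ids -> {
            batchesSeen.add(ids);
            List<String> names = new ArrayList<>();
            for (String id : ids) {
                names.add("Artist " + id);
            }
            return names;
        });
        loader.load("a");
        loader.load("b");
        CompletableFuture<String> repeat = loader.load("a"); // a second track by artist "a"
        loader.dispatch();
        System.out.println(batchesSeen);   // prints [[a, b]]
        System.out.println(repeat.join()); // prints Artist a
    }
}
```

Three load calls produce a single batch containing two distinct IDs. The repeated key shares the future of the first request, so both callers receive the same result.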

Data loaders work great when a single resource can provide data for multiple identifiers simultaneously. They're also scoped to the life of a single query. This means that if we send two queries back-to-back (each requesting a different list of playlists, tracks, and artists), the data loader will not try to batch artist IDs from both queries together. Instead, it will handle them separately, resolving each query independently of the other.
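Per-query scoping can be simulated in a few lines. PerQueryScopeDemo and batchesSent are hypothetical names. A fresh LinkedHashSet per query stands in for the fresh data loader each query gets. IDs are deduplicated within a query but never carried over to the next one, so an ID that both queries ask for (here "b") is fetched once per query.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of per-query loader scope; not DGS code.
class PerQueryScopeDemo {
    // Each inner list holds the artist IDs one query asks for; returns the batches that get sent.
    static List<List<String>> batchesSent(List<List<String>> queries) {
        List<List<String>> batches = new ArrayList<>();
        for (List<String> artistIds : queries) {
            // A fresh key buffer per query, standing in for a fresh data loader.
            Set<String> pendingKeys = new LinkedHashSet<>(artistIds);
            if (!pendingKeys.isEmpty()) {
                batches.add(new ArrayList<>(pendingKeys)); // one batched request per query
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> queries = List.of(List.of("a", "b", "a"), List.of("b", "c"));
        System.out.println(batchesSent(queries)); // prints [[a, b], [b, c]]
    }
}
```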

Learn more: Datafetchers and data loaders: what's the difference?

When our GraphQL server needs to resolve a particular field, it automatically checks all classes marked with the @DgsComponent annotation for the corresponding method. (It finds the method by looking for one that shares the same name as the field, or one that has an annotation labeling it as responsible for the field.) These methods are datafetcher methods, and each one maps to a specific field in our GraphQL schema.

When the server needs to provide data for a particular field, it's the job of that field's datafetcher to come up with the data. Where the datafetcher actually gets the data from is up to the individual method. Most of our datafetchers reach out to our main data source, the SpotifyClient class, and call one of its methods to make a network request.

The same principle applies to using data loaders with datafetchers. Data loaders give us an efficiency boost: rather than having the datafetcher make a lot of requests to an external API or database, we let it delegate to the data loader, which keeps track of all the identifiers for which a request needs to be made.

DGS data loaders are intelligent enough to keep track of all the identifiers that a datafetcher passes to them during the execution of a query. This lets them bundle the identifiers together and submit one request (in our case, to the Spotify REST API) that retrieves all the data at once. The datafetcher is still responsible for handing back this data in the query response, but the data loader's work behind the scenes makes the entire flow smoother, cleaner, and much more performant.

What a data loader needs

There's just one big requirement for data loaders to work as expected: the endpoint that receives the batched request needs to have the ability to provide data for multiple objects. This requires a change in our application; right now, the Track.artist datafetcher sends each artist ID individually to the GET /artists/{artist_id} endpoint, which only returns data for a single provided value.

Fortunately, we have a different endpoint in our REST API that we can use: GET /artists. It accepts multiple artist IDs joined into a single comma-separated string and returns data for them all at once.

We've provided a method that utilizes this new endpoint in our data source. Jump into SpotifyClient to take a closer look.

datasources/SpotifyClient
public List<MappedArtist> multipleArtistsRequest(List<String> artistIds) {
    System.out.println("I am making a call to the artists endpoint with artists " + artistIds);
    ArtistCollection artistCollection = client.get()
        .uri(uriBuilder -> uriBuilder
            .path("/artists")
            .queryParam("artists_ids", String.join(",", artistIds))
            .build())
        .retrieve()
        .body(ArtistCollection.class);
    if (artistCollection != null) {
        return artistCollection.getArtists();
    }
    return null;
}

This method is set up to accept a List of artist IDs. It makes a request to the GET /artists endpoint, passing the IDs as a single comma-separated artists_ids query parameter. It then receives the response body as an instance of the ArtistCollection class. If the request is successful, we return the result of calling the ArtistCollection class's getArtists method, which returns our MappedArtist instances. Otherwise, the method returns null.
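The lesson doesn't say how the results of this batch method line up with the requested IDs. For example, does GET /artists return artists in request order, and does it skip unknown IDs? List-based batch loaders in the java-dataloader library (which DGS data loaders build on) expect one value per key, in the same order as the keys. The following sketch defensively realigns results by ID. Artist is a hypothetical two-field stand-in for MappedArtist, and alignToKeys is an illustrative helper, not part of the starter repo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for MappedArtist with only the fields this sketch needs.
record Artist(String id, String name) {}

class BatchResultOrdering {
    // Returns one entry per requested ID, in request order; null where no artist came back.
    static List<Artist> alignToKeys(List<String> requestedIds, List<Artist> apiResults) {
        Map<String, Artist> byId = new HashMap<>();
        if (apiResults != null) { // multipleArtistsRequest returns null when the body is empty
            for (Artist artist : apiResults) {
                byId.put(artist.id(), artist);
            }
        }
        List<Artist> aligned = new ArrayList<>(requestedIds.size());
        for (String id : requestedIds) {
            aligned.add(byId.get(id)); // Map.get yields null for an ID the API didn't return
        }
        return aligned;
    }
}
```

For example, alignToKeys(List.of("a", "b", "c"), List.of(new Artist("b", "B"), new Artist("a", "A"))) yields the artist for "a", then "b", then null. Every requested key therefore gets exactly one slot.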

Now that we have a data source method that accepts multiple artist IDs, we can update our Track.artist datafetcher—and benefit from the power of a data loader!

Practice

Which of the following statements about data loaders are true?

A data loader replaces multiple similar requests with a single batched request.
A data loader replaces our application's datafetchers.
A data loader will wait until ALL queries have run before batching all requests together.
Data loaders can solve the n+1 problem by reducing the number of calls our GraphQL API makes for extra data.

Key takeaways

The n+1 problem occurs when we make an initial request, followed by some unknown number of follow-up requests.

Data loaders let us batch a list of identifiers (such as IDs) in a single request rather than sending an individual request for each.

Before data loaders can work properly, our data source (whether another API, or a database) needs to implement a method that accepts multiple keys (such as IDs), and returns multiple objects.

