March 9, 2016

Understanding pagination: REST, GraphQL, and Relay

Sashko Stubailo


One topic that comes up all the time when talking about data loading in modern applications is pagination — splitting long lists of data into chunks. It’s often glossed over in introductory tutorials, and a lot of apps can even get away with avoiding some of the dusty corners of the problem. But when you get into the weeds, you start to wonder — has anyone actually figured this out?

In this post, I’ll cover some different approaches to pagination in REST and GraphQL:

  1. Pagination: what is it for?
  2. What are different types of pagination, and when are they useful?
  3. What is it like to implement these different types?
  4. How does all of this lead to Relay’s pagination spec for GraphQL?
  5. Is there a single best solution? (Spoiler alert: no, as always)

There’s a lot of stuff I won’t get to in this post, so I’d love to see any responses about what pagination issues you have run into in your apps, and how you resolved them!

If you aren’t familiar with GraphQL, you might want to check out this quick introduction: The Basics of GraphQL in 5 Links.


Pagination: What is it for?

Before getting into the technical bits, it’s always good to stop and think — what are we even trying to achieve? In which situations is it useful to display and load data in chunks, or “pages”?

  1. We have too many items to display, and it would be a mental overload for the user to see them all at once — perhaps we only want this display to take up part of the page, or we have a footer that we want them to get to. This is a UX concern.
  2. We have too many items to load, and it would overload our backend, the connection, or the client to load all of the items at once. This is a performance concern.

Depending on the situation, we might want to address one or both of the concerns. If (1) is the only issue, then client-side pagination using JavaScript filters can be good enough, so in this article we’ll mostly focus on (2), and how to efficiently load data while having a great end-user experience.
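For reference, the client-side case really is just slicing a list that has already been loaded. Here’s a minimal Python sketch of that idea; the function name `paginate_client_side` and the sample data are made up for illustration, and the same logic would be a one-liner in JavaScript.

```python
def paginate_client_side(items, page, page_size):
    """Return the slice of an already-loaded list that belongs on `page` (1-based)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return items[start:start + page_size]


# Hypothetical data: 25 items that are already sitting on the client.
all_items = [f"item {n}" for n in range(1, 26)]

third_page = paginate_client_side(all_items, page=3, page_size=10)
print(third_page)  # the last five items, "item 21" through "item 25"
```

Note that this only addresses the UX concern: every item still had to travel over the wire before it could be sliced.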

Types of pagination UX

From the user’s point of view, pagination shows up in three main forms:

Numbered pages, like in a book or in Google search results: you can say things like “we’re on the third page of Google” and expect it to be consistent over some period of time. This is a throwback to the days of print, where once something was on a certain page, it never moved.

Sequential pages, which aren’t numbered; in this case either the pages are specific to a user or the content changes so rapidly that it isn’t meaningful to have page numbers at all, for example on Reddit:

Infinite scroll, which tries to create the illusion of one very long page, because you’re consuming a “feed” of items. It isn’t necessarily important what page you are on, you just scroll to get more content. The easiest examples are the Twitter or Facebook news feeds, where there is no concept of next page or previous page:

We still call this pagination (even though there aren’t pages in the UI) from a technical point of view, because the implementation under the hood is likely to be similar. In fact, there are browser extensions like Reddit Enhancement Suite that convert page-by-page UX into an infinite scroll.

It seems like most modern apps today use either the second or third approach — in the world where your app’s content is constantly changing, it doesn’t make sense to create the illusion of numbered pages. Almost everything these days is some sort of customized feed, often with a different order or composition for every user. But, ironically, numbered pages are the most straightforward to implement with today’s databases and technologies, so let’s talk about that first.


Implementation: Numbered pages

Even though numbered pages are often less useful for modern apps than the alternative pagination models, everyone who has used SQL before knows how to implement them naively:

-- We want page 3, with a page size of 10, so we should
-- load 10 items, starting after item 20
SELECT * FROM posts ORDER BY created_at LIMIT 10 OFFSET 20;

You might also want to know the total number of entries or pages in the results. This can be useful for displaying the total number of pages or results in your UI. To get this information, you can run another query:

SELECT COUNT(*) FROM posts;
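To tie the two queries together, here’s a rough Python sketch that turns a page number into an offset and the row count into a page count. It runs against an in-memory SQLite database; the `posts` columns, the sample rows, and the `fetch_page` helper are assumptions made up for the example, not part of any real app.

```python
import math
import sqlite3


def fetch_page(conn, page, page_size):
    """Fetch one numbered page (1-based) plus the total number of pages."""
    offset = (page - 1) * page_size  # page 3 with size 10 skips 20 rows
    rows = conn.execute(
        "SELECT id, title FROM posts ORDER BY created_at LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    (total,) = conn.execute("SELECT COUNT(*) FROM posts").fetchone()
    return rows, math.ceil(total / page_size)


# Hypothetical schema and data, just enough to exercise the queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, created_at INTEGER)")
conn.executemany(
    "INSERT INTO posts (id, title, created_at) VALUES (?, ?, ?)",
    [(n, f"Post {n}", 1000 + n) for n in range(1, 36)],  # 35 sample posts
)

rows, total_pages = fetch_page(conn, page=3, page_size=10)
print([row[0] for row in rows], total_pages)  # ids 21 through 30, and 4 pages
```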

Boom! Paginating with page numbers is pretty simple, which shows why a lot of apps use page numbers even when it semantically doesn’t make much sense: it’s easy to add a skip and limit to a query. This approach also maps easily onto REST and GraphQL. Let’s take a quick look:

In REST, you simply hit an endpoint with a page query parameter (this is what Discourse, the popular forum software, does):

In GraphQL, you can simply have a query that accepts the same parameter:

{
  latest(page: 2) {
    title,
    category {
      name
    }
  }
}

So what’s the catch? This strategy is so simple to implement and it prevents us from loading too much data, so why don’t we just stop here? Well, it’s up to you to decide based on your app’s requirements.

Drawbacks of page numbering

For mostly static content where items don’t move between pages frequently, page numbers are great. But in today’s web, that’s just not the case. Specifically, items are sometimes added and removed while the user is navigating to different pages. There are a couple of things you want to avoid:

  1. Skipping an item. If you are running a product website, maybe that’s the item the user would have bought, and your pagination scheme never even displayed it to them! Or maybe this is a chat window, and when you were loading messages you skipped one by accident. That could really ruin the flow of the conversation.
  2. Displaying the same item twice. This isn’t as bad as skipping something, but it makes your website seem janky if the user sees the same item twice in a row while navigating sequential pages or scrolling through an infinite newsfeed. This can happen if a new item was added at the top of the list, causing the skip and limit approach to show the item at the boundary between pages twice.

(The diagrams in this SitePoint article about real-time pagination really make the issue clear: Paginating Real-Time Data with Cursor Based Pagination)
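Both failure modes are easy to reproduce with a plain list standing in for the database. The sketch below is only a toy model: the feed is a newest-first Python list with made-up single-letter items, and `offset_page` mimics a skip and limit query.

```python
def offset_page(items, page, page_size):
    """Mimic LIMIT/OFFSET: skip (page - 1) * page_size items, then take page_size."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


# Duplicate: a new item arrives between loading page 1 and page 2.
feed = ["E", "D", "C", "B", "A"]           # newest first
page_1 = offset_page(feed, 1, 2)           # ["E", "D"]
feed.insert(0, "F")                        # someone posts "F" at the top
page_2 = offset_page(feed, 2, 2)           # ["D", "C"], so "D" shows up again

# Skip: an item is removed between loading page 1 and page 2.
other_feed = ["E", "D", "C", "B", "A"]
first = offset_page(other_feed, 1, 2)      # ["E", "D"]
other_feed.remove("E")                     # "E" gets deleted
second = offset_page(other_feed, 2, 2)     # ["B", "A"], so "C" is never shown

print(page_1 + page_2, first + second)
```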

In short, rather than the mental model of pages in a book, which implies a static data set, we need a new one: that the user has a stable window into an ever-updating stream of data:

This means that for certain kinds of apps, the concept of a “Page 1” or “Page 2” doesn’t make sense because the set of data and the boundaries between loaded sections are constantly changing.


Cursor-based pagination

If you look at the diagram above, a solution presents itself — what if we could just specify the place in the list we want to begin, and then how many items we want to fetch? Then it doesn’t matter how many items were added to the top of the list in the meanwhile, since we have a constant pointer to the specific spot where we left off. This pointer is called a cursor. The cursor is a piece of data, generally some kind of ID, that represents a location in a paginated list. So to fetch more data, we need two parameters:

  1. The cursor to start with, and
  2. The number of items (could be fixed)

Let’s look at the URL of our browser as we click through pages of Reddit:

In this case, the page size is always fixed to 25, so that isn’t part of the URL. The count field is actually just cosmetic and tells the UI where to count from for the post numbers on the left. Try picking a page and removing the count parameter from the URL: it works fine! So really we just need the after field, which is the cursor that identifies the last post we saw.

So what do we need to do to implement this style of pagination?

Implementation: Cursor-style pagination

Let’s say we have an after cursor which is an encoded timestamp of the last item we saw in our list, and we want to fetch 25 items after that.

Now we can write some simple code that will get us the next page of results:

-- Newest first: fetch the next $page_size posts older than the cursor
SELECT * FROM posts
WHERE created_at < $after
ORDER BY created_at DESC LIMIT $page_size;

In this case, the “after” cursor value is a timestamp, but it can be anything at all, as long as you can deserialize it into a starting point to fetch the next set of items. One particular benefit of having an encoded cursor with some metadata or a timestamp, rather than something like a row ID, is that it can be resilient to row deletion — we don’t want the query to fail if a specific item is removed. Timestamps and well-designed opaque cursors don’t have this issue.
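Here’s one way this could look end to end, as a hedged Python sketch against an in-memory SQLite database. It assumes a newest-first feed with unique timestamps, and it wraps the timestamp in base64-encoded JSON so clients treat it as opaque. The helper names (`encode_cursor`, `decode_cursor`, `fetch_after`) and the table layout are invented for the example.

```python
import base64
import json
import sqlite3


def encode_cursor(created_at):
    """Wrap the timestamp so clients treat the cursor as an opaque string."""
    payload = json.dumps({"created_at": created_at}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["created_at"]


def fetch_after(conn, after, page_size):
    """Return up to page_size posts older than the cursor, newest first."""
    if after is None:  # no cursor yet: start from the top of the feed
        rows = conn.execute(
            "SELECT id, created_at FROM posts ORDER BY created_at DESC LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, created_at FROM posts WHERE created_at < ? "
            "ORDER BY created_at DESC LIMIT ?",
            (decode_cursor(after), page_size),
        ).fetchall()
    next_cursor = encode_cursor(rows[-1][1]) if rows else None
    return rows, next_cursor


# Hypothetical data: seven posts with increasing timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(n, 1000 + n) for n in range(1, 8)])

first_rows, cursor = fetch_after(conn, None, 3)      # ids 7, 6, 5
conn.execute("INSERT INTO posts VALUES (8, 1008)")   # a new post lands at the top
conn.execute("DELETE FROM posts WHERE id = 5")       # the post behind the cursor is deleted
second_rows, cursor = fetch_after(conn, cursor, 3)   # ids 4, 3, 2
```

Neither the insert nor the delete disturbs the second page, because the cursor records a position in the ordering rather than a row that has to exist. If timestamps can collide, the cursor would need a tie-breaker such as the row ID alongside the timestamp.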

Now let’s think about how we would implement and use this in our different data loading systems:

In REST, we would just pass these as query parameters:

GET example.com/posts?after=153135&count=25

Now, we expect the API to give us some information about how to fetch additional items from this endpoint, specifically the cursor:

{
  "cursors": {
    "after": 23492834
  },
  "posts": [ ... ]
}

So when we go to load the next page, we know to send the new after cursor in our request to resume where we left off. If we are doing page-by-page navigation, we would throw away the items from the previous page; if we are doing infinite scroll then we simply append the new items to our list.
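On the client, that boils down to a loop that keeps feeding the returned cursor back into the next request. The Python sketch below fakes the server with an in-process function so it can run anywhere; `fake_api`, `load_feed`, and the sample posts are stand-ins I made up, and a real client would issue the HTTP request instead.

```python
# Hypothetical data: ten posts, newest first, with ids 100 down to 91.
POSTS = [{"id": n, "title": f"Post {n}"} for n in range(100, 90, -1)]


def fake_api(after=None, count=25):
    """Stand-in for GET /posts?after=...&count=...; mirrors the response shape above."""
    candidates = [p for p in POSTS if after is None or p["id"] < after]
    page = candidates[:count]
    return {
        "cursors": {"after": page[-1]["id"] if page else None},
        "posts": page,
    }


def load_feed(fetch, count, max_pages):
    """Infinite-scroll style: append each page until the server returns nothing."""
    feed, after = [], None
    for _ in range(max_pages):
        response = fetch(after=after, count=count)
        if not response["posts"]:
            break
        feed.extend(response["posts"])        # page-by-page UIs would replace instead
        after = response["cursors"]["after"]  # resume where this page left off
    return feed


feed = load_feed(fake_api, count=4, max_pages=10)
print([post["id"] for post in feed])  # every id from 100 down to 91, each exactly once
```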

In GraphQL, we can take a similar approach, but we need to specify all of the fields we want to see, so our query becomes a bit more complicated:

{
  latest(after: 153135, count: 25) {
    cursors {
      after
    },
    posts {
      title,
      category {
        name
      }
    }
  }
}

But our result looks almost exactly the same as the REST response.


Relay cursor connections

Now we finally have enough background to introduce Relay Cursor Connections, a generic specification for how a GraphQL server should expose paginated data so that Relay can understand it. This spec is very useful because it is the natural conclusion of generalizing the concepts we were talking about above. Here’s the example query they give in the spec:

{
  user {
    id
    name
    friends(first: 10, after: "opaqueCursor") {
      edges {
        cursor
        node {
          id
          name
        }
      }
      pageInfo {
        hasNextPage
      }
    }
  }
}

You can see this isn’t too different from what we came up with ourselves working from first principles. There is one notable addition:

In Relay, every item in the paginated list has its own cursor.

Instead of having the friends field return an array of friends directly, it returns a connection object that contains a list of edges. Each edge has a reference to the user object of the friend, and a cursor. So instead of having one cursor per page, we get one for every object. This means that, if we want to, we can ask for 10 friends starting from the middle of the list we last fetched.

When I first saw the cursor specification, it seemed a bit complex. It is so generalized it can be hard to connect it to concrete concepts, especially because of generic words like edge and node. So to summarize:

  1. A connection is a paginated field on an object, for example the friends field on a user or the comments field on a blog post.
  2. An edge has metadata about one object in the paginated list, and includes a cursor to allow pagination starting from that object.
  3. A node represents the actual object you were looking for.
  4. pageInfo lets the client know if there are more pages of data to fetch. In the Relay specification, it doesn’t tell you the total number of items, because the client cache doesn’t need that info. It would be up to the developer to expose that information through another field.

When the server provides all of this information, the Relay client can efficiently fetch new items in the paginated field as the client asks for them, without being constrained by old-school concepts like page numbers.
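To make the shape concrete, here’s a small Python sketch of a resolver that builds such a connection from a list of friends. It is not how Relay or any particular GraphQL server implements this; `friends_connection`, the cursor helpers, and the choice of a base64-wrapped friend ID as the cursor are all assumptions for illustration.

```python
import base64


def encode_cursor(friend_id):
    """Opaque per-item cursor; here it simply wraps the friend's id."""
    return base64.b64encode(f"friend:{friend_id}".encode("utf-8")).decode("ascii")


def decode_cursor(cursor):
    return int(base64.b64decode(cursor).decode("utf-8").split(":", 1)[1])


def friends_connection(friends, first, after=None):
    """Build a connection-shaped result: edges with cursors, plus pageInfo."""
    ordered = sorted(friends, key=lambda f: f["id"])
    if after is not None:
        after_id = decode_cursor(after)
        ordered = [f for f in ordered if f["id"] > after_id]
    window = ordered[:first]
    return {
        "edges": [{"cursor": encode_cursor(f["id"]), "node": f} for f in window],
        "pageInfo": {"hasNextPage": len(ordered) > first},
    }


# Hypothetical data: five friends with ids 1 through 5.
friends = [{"id": n, "name": f"Friend {n}"} for n in range(1, 6)]

page = friends_connection(friends, first=2)  # friends 1 and 2
# Because every edge has a cursor, we can resume from the middle of that page:
middle = friends_connection(friends, first=2, after=page["edges"][0]["cursor"])  # friends 2 and 3
last = friends_connection(friends, first=10, after=middle["edges"][-1]["cursor"])  # friends 4 and 5
```

The second call is the part a single per-page cursor can’t express: it starts right after the first edge rather than after the whole page.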

It’s also notable that the Relay cursor specification isn’t necessarily tied to GraphQL, and could be adopted in any API (for example REST) simply by returning the appropriate structure of edges, cursors, and pageInfo objects.

So what’s the best approach?

As you can see from Discourse, which we investigated in the REST example for numbered pages, you don’t necessarily need cursor-based pagination to build a great application. But we have also seen that many major social networks like Reddit, Twitter, and Facebook rely on these concepts to provide a great user experience. Relay takes it to its logical conclusion by setting up a predictable contract between the client and server about how paginated data should work.

If I were building an app today, I’d probably stick with page numbers for simplicity until I ran into a situation that called for incremental loading of real-time data. So it turns out that, as in all things, there is a tradeoff: simplicity of implementation on one side, and a fancier user experience and better performance on the other.

But at least now we’re all on the same page!

We’re building Apollo, a futuristic data stack based on GraphQL, and we want to make complex data loading decisions a thing of the past. Let us know what interesting pagination situations you’ve run into by responding to this post!
