March 16, 2016

The Discourse API in GraphQL: Part 1

Sashko Stubailo

In this post, we’ll talk about:

  1. Why Discourse is a great example for demonstrating the value of GraphQL over a REST API,
  2. How common types of REST endpoints map to GraphQL queries, and
  3. One way to do GraphQL login and auth on top of a Rails app

Note: this isn’t about writing a completely new backend; I’m implementing this API as a layer on top of Discourse’s existing REST API. So there is no new business logic or security to implement; it’s a simple wrapper!


Why Discourse?

Discourse, a popular forum platform.

I believe Discourse is a great open source example of a relatively complex web application built with best practices. It uses Ruby on Rails on the server and Ember on the client.

Before we dive in, let’s be clear: Discourse is extremely well-written and documented, and digging through the code and playing around with the development environment has been an absolute joy. Nevertheless, Discourse demonstrates a few unfortunate features that big apps with backend REST APIs and client-side rendering often exhibit:

(1) Data endpoints coupled with UI needs: Each endpoint in the backend API is coupled to the needs of a specific page of the UI. When you go to the “Latest” page, it loads the “latest.json” endpoint, and that endpoint returns nested JSON with exactly the fields and objects needed to render that page. This is efficient for an app with one user interface, because every page transition needs exactly one request to the backend, but once you have multiple clients you might need to write a separate endpoint for each of them, returning different data.

Fetching /latest.json on a Discourse server returns nested data tailored to the Latest page on the forum.

(2) Many endpoints for similar data: You can get information about a topic by querying for it directly, by passing a list of IDs to a batched endpoint, by fetching the list of latest topics, etc. There isn’t a sense of object identity, and there’s no guarantee you will get exactly the same fields for these different endpoints. For example, these two endpoints with similar URLs actually return totally different results:

/c/1.json        # returns the latest topics in that category
/c/1/show.json   # returns metadata about the category itself

(3) No external API documentation: The first result for “discourse API documentation” on Google is a thread on the Discourse forums. It’s clear that the backend data API is designed and used as an implementation detail of the app itself, rather than for external consumption. This is totally reasonable, as long as that is the only UI to your app, but I imagine if a new developer joined the Discourse team and was tasked with writing a native mobile app, they would need to read the application’s source code to figure out which endpoints to call and which parameters to pass. Browsing the complete list of endpoints can be a bit daunting; your options are either reading the routes.rb file (600 lines), looking at the output of rake routes (655 lines), or talking to someone very familiar with the backend.

If you stacked these endpoints one on top of the other, they would reach a small part of the way to the moon.

GraphQL as the solution

Let’s go over those issues and think about how a GraphQL API would solve them:

  1. Data coupled to UI: GraphQL lets the user interface specify what data it needs, by filtering the fields on returned objects, and following relationships between objects to create nested data.
  2. Many endpoints: Since GraphQL starts with a definition of the types of objects that can be fetched and fields on those objects, there is no concept of endpoints at all. You would get the category metadata in the same place as the list of topics in that category. You never have to split one endpoint or object into two just to improve your data fetching efficiency.
  3. API documentation: As covered in the previous article, “Will GraphQL replace REST documentation”, GraphQL is inherently self-documenting: clients can introspect the schema to discover every type, field, and argument, so there is no need to put in additional effort to write documentation.

From this set of points, and through my investigation so far, it seems like GraphQL is a very natural way to work with the data in Discourse. Now let’s dig into some questions that have come up during implementation!


Modeling REST in GraphQL

I’ve only scratched the surface of Discourse’s extensive API so far — just enough to log in and fetch some posts. Let’s go over the main difference in data modeling between REST and GraphQL:

  • REST consists of many URL endpoints, where inputs are passed through URL segments, query parameters, and headers. The result is a JSON blob with arbitrary structure.
  • GraphQL has many available root queries, which form the entry point into an object graph. The graph schema consists of types, fields, and relationships between them. The result of a request always matches the shape of the query you passed in.
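To make the “result matches the shape of the query” point concrete, here is a toy sketch. It is not how graphql-js executes queries; it only projects a REST-style blob onto a selection tree, assuming a payload shaped roughly like /latest.json’s `topic_list.topics` (the `posters` and `user_id` fields are illustrative). Unrequested fields disappear and missing ones come back as `null`.

```javascript
// Toy illustration only, not graphql-js internals.
// `selection` is a tree: `true` means "return this leaf", an object means "descend".
function project(value, selection) {
  // Leaves, nulls, and primitive values are returned as-is.
  if (selection === true || value === null || typeof value !== 'object') {
    return value;
  }
  // Lists apply the same selection to every item, like a GraphQL list field.
  if (Array.isArray(value)) {
    return value.map((item) => project(item, selection));
  }
  // Objects keep only the requested fields, in the order they were requested.
  const result = {};
  for (const [field, subSelection] of Object.entries(selection)) {
    result[field] = field in value ? project(value[field], subSelection) : null;
  }
  return result;
}

// Example: ask for topic titles and poster IDs out of a larger REST-style payload.
const restBlob = {
  users: [{ id: 2, username: 'sam' }],
  topic_list: {
    topics: [
      { id: 1, title: 'Hello', views: 10, posters: [{ user_id: 2, description: 'Original Poster' }] },
    ],
  },
};
const shaped = project(restBlob, {
  topic_list: { topics: { title: true, posters: { user_id: true } } },
});
console.log(JSON.stringify(shaped));
// → {"topic_list":{"topics":[{"title":"Hello","posters":[{"user_id":2}]}]}}
```

The client decides the shape; the server-side wrapper is the one that has to know where each field lives in the REST response.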

GraphQL is a great replacement for REST, but sometimes things that feel natural in REST don’t map exactly onto the fields and arguments in GraphQL. It is worth spending some time to think about the most natural way to access semantic data in GraphQL, rather than directly mapping the endpoints onto queries and types.

Different object types for a forum

If you haven’t used Discourse before, it might be helpful to remember some of the different types of objects that exist in a forum:

  1. Categories: Essentially a list of topics
  2. Topics: Essentially a list of posts
  3. Posts: These have some content, metadata, likes, etc.

In this case, we’re going to focus on categories and topics. Let’s go over some different types of REST endpoints, and see how they might best be represented in a GraphQL schema!

Single object endpoints

/c/12/show   # return metadata about category with ID 12

These are endpoints that return details about a single object. In this example, it’s information about a single category. These are very useful as an entry point into your object graph, since many queries will start by referring to a specific category on the forum, and then traverse to find some topics and other data.

It’s useful to have at least one of these for every type of object in your schema. The Relay Object Identification specification says that a Relay-compatible GraphQL server should allow the client to fetch any node in the graph by ID. This is necessary when the data requirements on the client change and your app wants to fetch just one new object, or a new field that wasn’t asked for previously.

Thankfully, pretty much every REST-style API provides an endpoint to fetch just one object of a certain type.
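Relay-style object identification is usually built on opaque global IDs. The sketch below mirrors the scheme used by the graphql-relay-js helpers (base64 of `Type:id`), but these functions are hand-written, and the `NODE_ENDPOINTS` table is a hypothetical mapping from each type to a single-object endpoint (`/c/:id/show.json` follows the example above; `/t/:id.json` for topics is an assumption).

```javascript
// Hand-written Relay-style global IDs: base64("Type:localId").
function toGlobalId(type, id) {
  return Buffer.from(`${type}:${id}`, 'utf8').toString('base64');
}

function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const sep = decoded.indexOf(':');
  if (sep < 0) throw new Error(`Malformed global ID: ${globalId}`);
  return { type: decoded.slice(0, sep), id: decoded.slice(sep + 1) };
}

// Hypothetical mapping from a GraphQL type to its single-object REST endpoint.
const NODE_ENDPOINTS = {
  Category: (id) => `/c/${id}/show.json`,
  Topic: (id) => `/t/${id}.json`,
};

// A `node(id:)` resolver would decode the ID and call the matching endpoint.
function nodeEndpoint(globalId) {
  const { type, id } = fromGlobalId(globalId);
  if (!Object.prototype.hasOwnProperty.call(NODE_ENDPOINTS, type)) {
    throw new Error(`Unknown node type: ${type}`);
  }
  return NODE_ENDPOINTS[type](id);
}

console.log(nodeEndpoint(toGlobalId('Category', 12))); // → /c/12/show.json
```

Encoding the type into the ID is what lets a single `node` entry point route to the right per-type endpoint.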

List endpoints

/latest       # return latest topics from all categories
/c/12/l/new   # return new unread topics from category 12

These endpoints return a list of objects. They can either be root queries representing an entry point into the object graph, or semantically a one-to-many relationship between two types. Here, the “latest” endpoint is an entry point, but “new” on a specific category is more of a relationship.

From a REST point of view, these two endpoints are nearly identical, but in GraphQL we would use them very differently:

# latest should be a root query
query {
  latest { ... }
}

# new topics in a category should be a field on the category type
query {
  category(id: 12) {
    new { ... }
  }
}

This is one reason why a GraphQL schema will usually end up with a much smaller list of entry points than a REST API, where basically every single endpoint is a possible query root.
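One way to wire this up is a resolver map in the `{ TypeName: { field: resolver } }` shape used by tools like graphql-tools, with `latest` on `Query` and `new` on `Category`. This is a sketch: `fetchJson` is a hypothetical helper injected so it can be stubbed, and the `.json` paths and response keys (`category`, `topic_list.topics`) are assumptions about Discourse’s responses.

```javascript
// `fetchJson(path)` is a hypothetical helper that GETs a Discourse path and parses JSON.
function makeTopicResolvers(fetchJson) {
  return {
    Query: {
      // Entry point: the global "latest" list.
      latest: async () => (await fetchJson('/latest.json')).topic_list.topics,
      // Entry point: a single category, fetched from its metadata endpoint.
      category: async (_parent, { id }) => (await fetchJson(`/c/${id}/show.json`)).category,
    },
    Category: {
      // Relationship: "new" topics hang off the category object resolved first.
      new: async (category) =>
        (await fetchJson(`/c/${category.id}/l/new.json`)).topic_list.topics,
    },
  };
}
```

Because `new` receives its parent category, the client never needs to know the `/c/12/l/new` URL pattern; it just follows the `category → new` edge.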

Covering multiple endpoints with an Enum argument

/top/all
/top/yearly
/top/quarterly
/top/monthly
/top/weekly
/top/daily

GraphQL has a more strongly typed argument system than HTTP/REST. In REST, arguments to your endpoint can be passed through query parameters, URL segments, or headers. In GraphQL we just have arguments, and we have the ability to restrict the types or values of those arguments. One tool in particular that turned out to be helpful was the humble Enum, or enumeration type (written here using the reference JavaScript implementation):

const { GraphQLEnumType } = require('graphql');

const TimePeriod = new GraphQLEnumType({
  name: 'TimePeriod',
  values: {
    ALL: { value: 'all' },
    YEARLY: { value: 'yearly' },
    QUARTERLY: { value: 'quarterly' },
    MONTHLY: { value: 'monthly' },
    WEEKLY: { value: 'weekly' },
    DAILY: { value: 'daily' },
  },
});

Now we can query our “top” topics like so:

{
  top(period: ALL) { ... }
}

Of course, the different values accepted by the “period” argument are encoded in the schema, so every client knows which ones are acceptable. Much more convenient than having a list of URLs to worry about!
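It helps to see what the enum buys the resolver. In graphql-js, an enum argument reaches the resolver as its internal `value` (`'all'`), not its name (`ALL`), and unknown names are rejected before any resolver runs. The sketch below imitates that translation in plain JavaScript so it runs without the `graphql` package; the `/top/<period>.json` path follows the URLs above plus an assumed `.json` suffix.

```javascript
// Plain-JS stand-in for the TimePeriod enum's `values`: GraphQL name -> internal value.
const TIME_PERIOD_VALUES = Object.freeze({
  ALL: 'all',
  YEARLY: 'yearly',
  QUARTERLY: 'quarterly',
  MONTHLY: 'monthly',
  WEEKLY: 'weekly',
  DAILY: 'daily',
});

// Imitates what graphql-js does before the resolver runs:
// reject unknown names, otherwise hand over the internal value.
function parseTimePeriod(name) {
  if (!Object.prototype.hasOwnProperty.call(TIME_PERIOD_VALUES, name)) {
    throw new TypeError(`Expected a TimePeriod, got ${JSON.stringify(name)}`);
  }
  return TIME_PERIOD_VALUES[name];
}

// The `top` resolver then splices the internal value into one URL pattern,
// instead of the client choosing among six endpoints.
function topPath(period) {
  return `/top/${period}.json`;
}

console.log(topPath(parseTimePeriod('WEEKLY'))); // → /top/weekly.json
```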

Another way to do this would be to add another enum to replace the “top”, “latest”, “new”, etc. endpoints, so we would just do:

{
  topics(sort: TOP, period: ALL) { ... }
}

This would make sense because all of the variations of topic lists and sort orders return identically formatted information: a paginated list of topics.
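A single `topics` field moves the choice of URL into one place on the server. Here is a sketch of that routing, assuming the resolver already receives internal enum values (`'top'`, `'weekly'`, ...) for a hypothetical `TopicSort` enum, and that the paths mirror the ones listed earlier with a `.json` suffix. Rejecting `period` for sorts other than `top` is a design choice of this sketch.

```javascript
// Internal values for a hypothetical TopicSort enum (LATEST, NEW, UNREAD, TOP).
const TOPIC_SORTS = new Set(['latest', 'new', 'unread', 'top']);

// Maps topics(sort:, period:) onto the REST endpoint that serves that list.
function topicListPath({ sort, period }) {
  if (!TOPIC_SORTS.has(sort)) {
    throw new TypeError(`Unknown sort: ${sort}`);
  }
  if (sort === 'top') {
    // Only "top" lists are bucketed by time period; default to all time.
    return `/top/${period ?? 'all'}.json`;
  }
  if (period !== undefined) {
    throw new TypeError(`period only applies to sort: TOP, not ${sort}`);
  }
  return `/${sort}.json`;
}
```

Whichever branch runs, the resolver returns the same topic-list type, which is what lets one field stand in for several endpoints.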

The conclusion here is that once we break out of the realm of URLs, we can model our data in a much more semantic and structured way, where the path to a specific object in the graph is easier to predict.


Authentication against a Rails app

GraphQL currently doesn’t have a standard way to handle user accounts, login, authorization, and similar concerns. Several approaches are discussed in a recent post by Jonas, but my goal so far in the Discourse experiment has been to build the simplest thing that would work.

Essentially, the process of authentication and authorization boils down to two steps:

  1. Get a login token by hitting a login endpoint
  2. Send that login token with your request to prove your identity

When you access Discourse from your web browser, the login token is stored automatically in a cookie, and your browser sends that cookie along with every subsequent request. But when you’re making the requests from a Node.js server running GraphQL, you need to handle that yourself.

Getting a login token from Rails

My approach to figuring out how login worked was to use the Chrome inspector to analyze network traffic during a login flow in the Discourse app. It turned out to be pretty simple:

  1. Get a CSRF (Cross-Site Request Forgery) token and a session cookie from the server, using the “/csrf” endpoint
  2. Send the CSRF token, session cookie, and a username/password to the “/session” endpoint, and extract the login token from the response cookies
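The two steps translate fairly directly into a Node.js sketch. Several details are assumptions worth checking against your Discourse version: the paths are the ones described above (`/csrf`, `/session`), the credentials go in `login` and `password` form fields, and the CSRF token comes back as JSON under `csrf`. It relies on the global `fetch` and `Headers.getSetCookie()` of recent Node releases (for example Node 20), and accepts an injected `fetchImpl` so it can be exercised without a server.

```javascript
// Turns Set-Cookie header strings into a { name: value } map, ignoring
// attributes such as Path or HttpOnly that follow the first ';'.
function parseSetCookies(setCookieHeaders) {
  const cookies = {};
  for (const header of setCookieHeaders) {
    const pair = header.split(';', 1)[0];
    const eq = pair.indexOf('=');
    if (eq > 0) {
      cookies[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
    }
  }
  return cookies;
}

// Serializes a cookie map into a single Cookie request header value.
function toCookieHeader(cookies) {
  return Object.entries(cookies).map(([name, value]) => `${name}=${value}`).join('; ');
}

// Two-step login; resolves to the `_t` login token.
async function login(baseUrl, username, password, fetchImpl = fetch) {
  // Step 1: CSRF token (assumed JSON body { csrf }) plus a session cookie.
  const csrfRes = await fetchImpl(`${baseUrl}/csrf`, {
    headers: { Accept: 'application/json' },
  });
  if (!csrfRes.ok) throw new Error(`CSRF request failed with HTTP ${csrfRes.status}`);
  const { csrf } = await csrfRes.json();
  const sessionCookies = parseSetCookies(csrfRes.headers.getSetCookie());

  // Step 2: send credentials (assumed form fields `login` and `password`)
  // together with the CSRF token and the session cookie.
  const sessionRes = await fetchImpl(`${baseUrl}/session`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'X-CSRF-Token': csrf,
      Cookie: toCookieHeader(sessionCookies),
    },
    body: new URLSearchParams({ login: username, password }).toString(),
  });
  if (!sessionRes.ok) throw new Error(`Login failed with HTTP ${sessionRes.status}`);

  // The login token arrives as the `_t` cookie on the response.
  const token = parseSetCookies(sessionRes.headers.getSetCookie())._t;
  if (!token) throw new Error('Login response did not set a _t cookie');
  return token;
}
```

A hypothetical `login(username:, password:)` field in the GraphQL wrapper could simply resolve to this string.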

In the GraphQL API, I decided to encode both steps in a single request, since we are only making requests from the GraphiQL UI and not from an actual web app. Logging in through the wrapped API is then one GraphiQL query that takes a username and password and returns the login token.

Since I’m just using a querying tool, it’s my responsibility to keep track of the login token somewhere. If you had a nice GraphQL client that was aware of login state, it would keep track of that token for you. But either way, it’s very simple — just one string is all you need to remember.

If you don’t implement this part, you can easily get a login token by inspecting your browser’s cookies on your favorite Discourse forum. In Discourse, the token is stored in the cookie helpfully named “_t”.

Using a token to make logged-in requests

Certain endpoints on Discourse rely on the logged-in user to return different data. Two of interest are the “new” and “unread” topic listings — these only show you topics that you haven’t opened yet, and are only available if you are logged in. If you hit these endpoints and the server doesn’t recognize you as logged in, you get a 403 error. So we need to figure out how to pass along our token when making this request.

GraphQL-JS has a query context that is useful for this purpose. The context is a single object shared by every resolver in the query, so if you set a value on it in the root resolver, more deeply nested fields can read it. We can therefore create a wrapper for our queries that simply sets the token on this context, and then our HTTP fetching logic can put it in the right cookie header.

The query starts with a special “root” field that just takes the token as an argument, and the nested “new” resolver can then use this token in its request to fetch the new topics.
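Here is a minimal sketch of the context trick, assuming graphql-js’s resolver signature `(parent, args, context)` and relying on the parent `root` field resolving before its children (graphql-js needs the parent value before it can resolve nested fields). The `Root` type name, the `baseUrl` context key, the `/new.json` path, and the `topic_list.topics` response key are illustrative assumptions; `fetchImpl` is injected so it can be stubbed.

```javascript
// Discourse keeps the login token in the `_t` cookie.
function authHeaders(token) {
  return token ? { Cookie: `_t=${token}` } : {};
}

// Resolvers follow graphql-js's (parent, args, context) argument order.
function makeAuthedResolvers(fetchImpl) {
  return {
    Query: {
      // The "root" field stashes the token on the per-request context object...
      root: (_parent, { token }, context) => {
        context.token = token;
        return {};
      },
    },
    Root: {
      // ...so nested fields can send it along to restricted endpoints.
      new: async (_parent, _args, context) => {
        const res = await fetchImpl(`${context.baseUrl}/new.json`, {
          headers: authHeaders(context.token),
        });
        if (res.status === 403) {
          throw new Error('Discourse returned 403: missing or invalid login token');
        }
        return (await res.json()).topic_list.topics;
      },
    },
  };
}
```

If every request comes from a single authenticated caller, a simpler variant is to skip the `root` field and put the token in the `contextValue` passed to graphql-js when executing the query; the nested resolvers stay the same.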

Now we can both log in and fetch restricted endpoints from any place we can make a GraphQL query!


Conclusion

The most exciting thing for me so far is how much I can see GraphQL making these types of APIs easier to understand and query. Rather than wading through a mess of documentation and trying a bunch of HTTP calls in the browser, both the inputs and outputs to the system are crisp and well-defined.

Whether it’s for an external API for people to consume your data, or just in your own app where you might have different UIs using the same backend, the value of this cannot be overstated.

I’m excited to build more on this API example, and then try to implement a Discourse mobile app on top of it! Having an app to browse the Meteor forums is something I’ve wanted for a long time, but the complexity of the API always stopped me. Now I feel I’m closer than ever, so let’s get started on that client code!
