June 21, 2016

How to structure GraphQL server code

Jonas Helfer

GraphQL’s value proposition for full-stack and frontend developers is pretty clear. It provides the missing layer of abstraction between backends and frontends. GraphQL is incredibly easy to use on the client, but writing a good GraphQL server can be a bit more work, and can sometimes be tricky.

Having built a couple of GraphQL servers while working on Apollo, I thought it would be useful to do a writeup about some of the lessons I learned along the way.


1. Optimize human time, not machine time

One of the first traps I fell into was trying to optimize performance before I’d written a single line of code. Having worked with SQL backends before, my first question with GraphQL was:

“How do I make GraphQL send a JOIN query to my SQL database?”

It took me a while to realize that that question was the wrong question to be asking. I was trying to solve problems that I hadn’t even observed yet! Premature optimization is one of the cardinal sins of programming, and I was guilty of it. Instead of sticking to the clean abstraction that GraphQL gave me, I tried to optimize performance by making a single request to the DB. That made the code very hard to read and impossible to maintain.

When I finally stopped trying to optimize performance that I hadn’t even measured yet, I realized that sticking to one query per resolver actually optimized a far more important parameter: how many hours I spent writing and rewriting code every time the API changed.

It turns out that it doesn’t matter very much if your code isn’t the fastest, because scaling a cluster of GraphQL servers is easy. If you can cut the engineering time in half by running twice as many servers, you’d be stupid not to do that.

Reducing the time spent on building and maintaining software largely comes down to choosing the right abstractions, and that’s no different in GraphQL. I highly recommend reading David Parnas’ fantastic paper “On the Criteria To Be Used in Decomposing Systems into Modules”, if you haven’t already read it.


2. Choose the right abstractions

Out of the box, GraphQL provides you with two abstractions: a schema and resolvers.

The first layer of abstraction is provided by the GraphQL schema. It hides the details of the backend architecture from the frontend. The schema can even be written in GraphQL schema language, making it quite concise and easy to read or maintain. The most important decision at this layer is choosing which types and fields to create. This is specific to each application, but there are some general rules you can follow. For the sake of brevity I’ll skip them here, but if you’re interested, watch out for a future post on our GraphQL blog.

Resolvers — also called resolve functions — are the second layer of abstraction. They define how the data for a field in a query is actually fetched, i.e. how the field is “resolved”. In order to let a schema map to any number of backends, GraphQL needs to be very flexible. The schema itself is tightly constrained, so all that flexibility has to be in the resolvers.

Resolvers can contain arbitrary code, so being the lazy person I am, I first stuffed everything into the resolvers: data fetching, authorization, session management. As you can probably guess, it was neither readable nor maintainable. It was a bit quicker to write at first, but soon I hit the tipping point where I wasted more time making changes in half a dozen places for a minor change in the schema.

Things looked a lot better when I started using two more abstractions: models and connectors.

3. Structure your code with Models and Connectors

Model: a set of functions to read/write data of a certain GraphQL type by using various connectors. Models contain additional business logic, such as permission checks, and are usually application-specific.

Connector: a layer on top of a database/backend driver that has GraphQL-specific error handling, logging, caching and batching. A connector only needs to be implemented once for each backend, and can be reused in many apps.

By using models and connectors, my resolvers turned into simple switches that map inputs and arguments to a function call in the underlying model layer, where the actual business logic resides.

Note: To be clear, models and connectors aren’t abstractions that are built into GraphQL, but they emerge as a natural way to structure code in a GraphQL server.


It’s easiest to explain things with an example, so let’s take the Author and Post models from our GraphQL tutorial to show how it works in practice:

Data for the fields of Authors and Posts is stored in a MySQL database, except for the views field of Post, which is stored in MongoDB.

Instead of writing all the code directly in the resolvers, it makes sense to define two models — one for Author and one for Post, and use two connectors — one for MySQL and one for MongoDB.

Models are application-specific, while connectors are per-backend and can be shared across applications.

As the diagram above shows, the Post model uses more than one connector, because the data for that type is stored in two different backends.


Here’s an example of what it could look like in actual code.

Resolver:

import { PostModel } from './models';
const resolvers = {
  Author: {
    posts(author, args, context) {
      return PostModel.findByAuthor(author.id, context);
    },
  },
// ...
};

Model:

const models = {
  PostModel: {
    findByAuthor(authorId, context) {
      return context.mysql.raw(
        'SELECT * FROM posts WHERE authorId = ?',
        authorId,
      );
    },
    getViews(postId, context) {
      return context.mongo.collection('views').find({ postId });
    },
    // ...
  },
};

A simple mongo connector:

class MongoConnector {
  constructor(connection) {
    this.connection = connection;
  }

  closeConnection() {
    this.connection.close();
  }

  collection(collectionName) {
    // caching, batching and logging could be added here
    return this.connection.collection(collectionName);
  }
}

And here’s how the server would be initialized:

// other imports ...
// setting up DB connections etc.
import schema from './schema';
import resolvers from './resolvers';
import {
  MongoConnector,
  MySQLConnector,
} from './connectors';

app.use('/graphql', req => {
  const mongo = new MongoConnector(mongoConnection);
  const mysql = new MySQLConnector(mysqlConnection);
  return {
    schema,
    resolvers,
    context: {
      mongo,
      mysql,
    },
  };
});
// ...

For a more complete (and more complicated) example, check out our GitHunt demo app. It uses models and connectors in the way described here, and it also has a nice example of how to cache and batch requests to the GitHub API with a library called dataloader.

Note: If you want to use an ORM like Sequelize or Mongoose, that’s also a decent way to get started. In that case, the ORM takes care of both the model and the connector. Our server tutorial is probably the most complete example out there.

4. Don’t fly blind — measure twice, code once

Once I was reasonably happy with the structure of my GraphQL server, I was ready to optimize performance.

Remember my earlier question: “How do I make GraphQL send a JOIN query to my SQL database?”. Now I actually knew enough to answer the question:

The short answer is: you don’t. Use caching and batching instead.

The longer answer is: You can do it, but it comes with a heavy cost to readability and maintainability, and it’s up for debate whether JOINs are actually faster than smaller queries that are efficiently batched and cached.
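To make the batching idea concrete, here's a minimal sketch of what a library like dataloader does under the hood (this is an illustration, not dataloader's actual implementation): calls to `load()` made within the same tick are collected into a queue, then fetched with one batched request, which is a stand-in for a single `SELECT ... WHERE id IN (...)` instead of N separate queries.

```javascript
// A minimal sketch of request batching: load() calls made in the same tick
// are queued, then resolved together by a single call to batchFn.
// batchFn is a hypothetical function that fetches many keys at once,
// e.g. 'SELECT * FROM posts WHERE id IN (...)'.
class SimpleLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule a single dispatch for all keys queued in this tick.
      if (this.queue.length === 1) {
        process.nextTick(() => this.dispatch());
      }
    });
  }

  dispatch() {
    const batch = this.queue;
    this.queue = [];
    this.batchFn(batch.map(item => item.key))
      .then(values => batch.forEach((item, i) => item.resolve(values[i])))
      .catch(err => batch.forEach(item => item.reject(err)));
  }
}
```

With a loader like this in the connector layer, two resolvers that each call `loader.load(id)` during the same tick trigger only one backend round trip. A production-ready version (like dataloader) also adds per-request caching so repeated loads of the same key are free.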

You should never optimize performance before you’ve measured performance, and that is true in this case as well. There are so many factors involved — CPU load, memory consumption, latency to the DB, HTTP vs. HTTP/2, database engine, etc. — that it’s very hard to make an accurate guess as to what issues you’ll run into at scale. If you optimize A and it turns out that B was the real bottleneck, you’ve just wasted a bunch of time and made your code harder to read, without any actual benefit.

Okay, so how do you measure performance? GraphQL endpoints are much more complex than REST endpoints, so just measuring how long it takes for queries to complete doesn’t give you an accurate enough picture. It might tell you that you have a problem, but it won’t tell you where the problem is.

In order to know which parts of your server are slow, you need to instrument individual resolve functions. Conventional profiling and performance monitoring tools don’t really give you a good picture of what’s going on in the server, so we put our heads together and built a custom GraphQL tracer. Even though it’s still under active development, it has already helped us and others find and fix bottlenecks and performance issues in our GraphQL servers, which we would not have found otherwise!
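The basic idea of instrumenting resolvers can be sketched in a few lines. This is a hand-rolled illustration, not how Apollo's tracer works internally: wrap each resolve function in a resolver map so it reports how long each field took.

```javascript
// A hand-rolled sketch of per-resolver instrumentation: wrap every resolve
// function so it reports how long the field took, even if it throws.
// The 'report' callback is a hypothetical sink, e.g. console.log.
function instrument(resolvers, report = console.log) {
  const wrapped = {};
  for (const typeName of Object.keys(resolvers)) {
    wrapped[typeName] = {};
    for (const fieldName of Object.keys(resolvers[typeName])) {
      const resolve = resolvers[typeName][fieldName];
      wrapped[typeName][fieldName] = async (...args) => {
        const start = Date.now();
        try {
          // Awaiting covers both sync values and returned promises.
          return await resolve(...args);
        } finally {
          report(`${typeName}.${fieldName} took ${Date.now() - start}ms`);
        }
      };
    }
  }
  return wrapped;
}
```

A real tracer records start and end timestamps per resolver rather than just durations, which is what makes it possible to see which resolvers ran concurrently and which had to wait, as in the trace below.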

Here’s one of the screens, showing a partial trace of a slow GraphQL request:

A typical trace from Apollo’s GraphQL Tracer

Our GraphQL tracer tool is still in closed beta, but I wanted to show off this screenshot anyway, because I think it’s pretty awesome. It tells you immediately how the request’s latency breaks down into the durations of the different resolvers, and it shows which resolvers ran concurrently and which ones had to wait for others to finish. Armed with this information, you can make an informed decision about which part of your server needs to be optimized, and whether your optimizations actually make a difference. Pretty cool, right!?

If you’re using GraphQL in production and would like to try out Apollo GraphQL Tracer, just drop us a line on the Apollo Slack, and we’ll work something out!


That’s all I have for now. I hope that after having read this post you’ll be able to avoid some of the most common obstacles people run into when building GraphQL servers.

There’s a lot more to write about building GraphQL servers, so if you have a question that you’d like to see answered, let me know in the comments!

Written by Jonas Helfer