November 7, 2016

Tracing Ruby’s GraphQL execution

Tom Coleman

One of the great things about GraphQL is that as a specification, not an implementation, it has been adopted in a wide variety of backend environments. All developers can benefit from GraphQL while continuing to enjoy the unique things about their platform of choice.

Last week at GraphQL Summit we launched Apollo Optics, a GraphQL performance and usage monitoring product. We think Optics will be an integral tool for running GraphQL servers on all platforms. As we’ve begun building out our support for those platforms, we’ve discovered some quite interesting differences in the execution models of GraphQL servers, with Ruby in particular.

Building agents for JS and Ruby servers

Today we’ve built a production-ready agent (the tool that collects the statistics about the queries that are running against your servers) for the reference JavaScript GraphQL servers, and our early access customers, including our own Galaxy PaaS, have used it to optimize their query execution.

I’ve also been involved in building an agent for Ruby-based servers that use the graphql-ruby gem. Working with graphql-ruby’s author Robert Mosolgo, and with Shopify, who created the popular graphql-batch gem used to group together database access, I’ve put together an initial beta version for users to try out.

We’ve also heard from many users who are keen to use Optics in other environments, and we’re working hard on enabling those agents. If you’d like to help build an agent for your platform of choice, get in touch with us via the #optics-agent-dev channel in our Apollo Slack.

Along the way, I’ve discovered some interesting things about the execution model of GraphQL in Ruby, which came as a bit of a surprise to me, as someone who has only written GraphQL servers in NodeJS so far. I think it makes for an interesting journey!

Experimenting with GitHunt

The principal app I’ve been testing our agent against is a version of the GitHunt API server written in Ruby on Rails. GitHunt is a simple “Hacker News” of GitHub repositories, and stores some data in a local SQLite database, as well as querying GitHub’s REST API for repository and user data. The GraphQL server combines these sources behind a single schema.

My reference implementation is the NodeJS version of the server. When you run a query such as:

query Feed {
  feed (type: NEW, limit: 3) {
    repository {
      owner { login }
      name
    }
    postedBy { login }
    vote {
      vote_value
    }
  }
}

Execution happens in around 800ms, and the trace in Optics looks like this:

The three entries are fetched first (via a SQL query); then, in parallel, we get:

  • the user who posted each entry (over REST from GitHub),
  • the repository info (also over REST from GitHub),
  • some vote information (from SQL).

Ruby’s execution

In Ruby the behavior is quite different. Using the default execution strategy I quickly noticed that (unsurprisingly) execution happened in serial, leading to extremely long total query times and traces like this:

To try to improve performance, and to more closely mirror what users have told us they’re using in real production environments, I tried the graphql-batch gem, which allows you to resolve a set of fields together via a “loader” mechanism.

This is especially useful when loading many objects of the same type (the Entry.postedBy field, for example, which is loaded for each of the three entries), provided that fetching multiple objects at once is faster than fetching each one individually.

For a database query like fetching the user’s vote for each entry, this is certainly the case, as we can use a single SELECT ... WHERE ... IN (X, Y, Z) query.
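To make the batching idea concrete, here is a minimal plain-Ruby sketch. It does not use graphql-batch’s actual API: the VoteLoader class, its load and resolve methods, and the in-memory VOTES table are all hypothetical stand-ins, with one pass over an array playing the role of one SQL query.

```ruby
# Hypothetical in-memory stand-in for a votes table: [entry_id, user_id, vote_value].
VOTES = [
  [1, 42, 1],
  [2, 42, -1],
  [3, 7, 1]
].freeze

# Hypothetical loader: collects entry ids, then answers them all with one lookup.
class VoteLoader
  attr_reader :query_count

  def initialize(user_id)
    @user_id = user_id
    @pending = []
    @query_count = 0
  end

  # Queue an entry id; nothing is fetched yet.
  def load(entry_id)
    @pending << entry_id
    self
  end

  # One pass over the data, standing in for a single
  # SELECT ... WHERE user_id = ? AND entry_id IN (...) query.
  def resolve
    @query_count += 1
    ids = @pending.uniq
    found = VOTES.select { |entry_id, user_id, _| user_id == @user_id && ids.include?(entry_id) }
    by_entry = found.to_h { |entry_id, _, value| [entry_id, value] }
    # Entries the user has not voted on map to nil.
    ids.to_h { |id| [id, by_entry[id]] }
  end
end

loader = VoteLoader.new(42)
[1, 2, 3].each { |id| loader.load(id) }
votes = loader.resolve
```

However many entries are queued, resolve runs its lookup once, which is the property that makes a batched field cheaper than resolving each entry’s vote separately.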

For third-party API requests like Entry.postedBy and Entry.repository (which fetch data from GitHub), the loader mechanism allowed me to easily use the parallel gem to perform the HTTP requests in separate threads.
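A rough sketch of what fetching one batch over several threads can look like is below. To stay self-contained it uses Ruby’s built-in Thread class rather than the parallel gem, and fetch_user is a hypothetical stand-in that sleeps instead of making a real HTTP request to GitHub.

```ruby
# Hypothetical stand-in for an HTTP call to GitHub; sleeps to simulate latency.
def fetch_user(login)
  sleep 0.1
  { login: login }
end

# Fetch every key of one batch on its own thread and return a key => result hash.
# Thread#value waits for the thread to finish and returns the block's result.
def fetch_batch(logins)
  threads = logins.map { |login| Thread.new { fetch_user(login) } }
  logins.zip(threads.map(&:value)).to_h
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
users = fetch_batch(%w[alice bob carol])
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
```

Because the three simulated requests wait at the same time, the batch takes roughly as long as one request rather than three. With the parallel gem, the body of fetch_batch would presumably become a call along the lines of Parallel.map(logins, in_threads: 3).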

When running with these loaders, I saw a trace like:

You can see in the trace that each type of field resolves together (within the loader responsible for that field type), but the three different resolvers still execute serially. This was surprising to me. I expected the loaders, which use a promise-based asynchronous execution mechanism, to resolve simultaneously, which is how things work in the NodeJS case above.

One of the benefits of GraphQL is that its execution model allows us to parallelize many field accesses, so it seems fruitful to try to get those loaders to execute in parallel.
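The potential gain is easy to see in a toy comparison. In this sketch the three loaders are hypothetical lambdas that just sleep for made-up durations, and the threading uses plain Ruby threads; it is an illustration of the timing, not of how graphql-batch schedules its loaders.

```ruby
# Hypothetical loaders: each is a callable that sleeps to simulate I/O latency.
LOADERS = {
  posted_by:  -> { sleep 0.2; :users },
  repository: -> { sleep 0.2; :repos },
  vote:       -> { sleep 0.1; :votes }
}.freeze

# Run the block and return [result, elapsed_seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Serial: total time is roughly the sum of the loader times.
serial, serial_time = timed { LOADERS.transform_values(&:call) }

# Threaded: start every loader first, then collect the results, so the
# total time is roughly that of the slowest loader.
threaded, threaded_time = timed do
  LOADERS.transform_values { |loader| Thread.new { loader.call } }
         .transform_values(&:value)
end
```

Both runs produce the same results; only the elapsed time differs. The overlap works here because the loaders spend their time waiting, which is also the situation with HTTP and database calls.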

Threading loaders

As a proof of concept, I again used the parallel gem to run loaders over multiple threads. The code is quite simple:

require 'parallel'

# Resolve the pending loaders on up to four threads at once.
Parallel.each(loaders.values, in_threads: 4) do |loader|
  loader.resolve
end

Using that, I get the trace below, which is now in line with the JS version:

Conclusions

It’s perhaps not surprising that the default Ruby GraphQL execution models are serial, given that Ruby (and Rails in particular) comes with a set of assumptions about a single request’s code being single-threaded. These include Rails’ historical lack of thread safety and assumptions about database connection pool sizes on the server.

Also, as we saw, it’s certainly possible to use a single loader to thread certain kinds of concurrent accesses in a straightforward way.

However, I think there is a case for a true “parallel async” execution model in graphql-ruby, which if used carefully could drastically improve request latencies in queries that are similar to this GitHunt example. There’s currently an active thread on the graphql-ruby repo discussing exactly this, and we can hope to see a true parallel execution model soon.

I’m excited to see how this area moves forward, and how the different execution strategies in the various runtime environments inform each other. We think Optics is a great way to visualize this data and should help drive things forward!
