February 12, 2020

Apollo Server File Upload Best Practices

Khalil Stemmler
Developer Advocate
@stemmlerjs

File Uploads have an interesting history in the Apollo ecosystem.

With Apollo Server 2.0, you can perform file uploads right out of the box. Apollo Server ships with the ability to handle multipart requests that contain file data. This means you can send a mutation to Apollo Server containing a file, pipe it to the filesystem, or pipe it to a cloud storage provider instead.

While this approach works and is relatively trivial to implement, it does come with drawbacks. The most apparent drawback is that an Apollo Server is now responsible for handling large amounts of binary data, and this has the potential to degrade performance.

Depending on your problem domain and your use case, the way you set up file uploads may differ. In this article, we’ll compare the advantages and disadvantages of three common ways to perform file uploads within a GraphQL architecture.

  1. Multipart Upload Requests
  2. Signed URLs
  3. Utilizing an image server

Approach #1: Multipart Upload Requests

A multipart request is an HTTP request that is able to contain, well, multiple parts 😅. The spec enables you to send text, file data, JSON objects, and whatever else you like, all in a single request.

This capability seems to lend itself closely to the way that we conceptualize API communication with GraphQL. Both queries and mutations are capable of sending or asking for exactly what we want — no more, no less.

From Apollo Client, it’s possible to send a stream of file data along with a mutation as a multipart request, which the server then pipes to its final destination.
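
As a rough sketch of what such a request carries: apollo-upload-client follows the GraphQL multipart request specification, in which the form has an `operations` field (the query and variables, with each file replaced by `null`), a `map` field (which form part fills which variable), and then the file parts themselves. The helper name `buildMultipartFields` and the `uploadFile` mutation below are hypothetical, and the snippet only builds the two text fields rather than sending anything.

```typescript
// Text fields of a GraphQL multipart request (illustrative only).
interface MultipartFields {
  operations: string; // JSON: query + variables, with files replaced by null
  map: string;        // JSON: form part name -> variable paths it fills
}

// Hypothetical helper: `fileVariables` lists the variable names that hold files.
function buildMultipartFields(
  query: string,
  variables: Record<string, unknown>,
  fileVariables: string[],
): MultipartFields {
  const placeholders: Record<string, unknown> = { ...variables };
  const map: Record<string, string[]> = {};

  fileVariables.forEach((name, index) => {
    placeholders[name] = null;                   // the file travels as its own part
    map[String(index)] = [`variables.${name}`];  // e.g. part "0" fills variables.file
  });

  return {
    operations: JSON.stringify({ query, variables: placeholders }),
    map: JSON.stringify(map),
  };
}

const fields = buildMultipartFields(
  "mutation ($file: Upload!) { uploadFile(file: $file) { filename } }",
  { file: "<binary data is sent in a separate part>" },
  ["file"],
);
console.log(fields.operations);
console.log(fields.map);
```

In practice the client library assembles these fields for you; the point is only that the file bytes ride along in the same HTTP request as the mutation.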

Principled GraphQL, a guide created by the Apollo founders on the principles of the Data Graph, suggests that we “Separate the GraphQL Layer from the Service Layer”. In a production client-server architecture, the client doesn’t normally speak directly to the backend service. Instead, we use an additional layer to “delegate concerns such as load balancing, caching, service location, or API key management to a separate tier”.

For greenfield hobbyist projects (less critical or proof of concept), it’s common that the client-facing GraphQL service is also the backend service that performs business logic, talks directly to a database (perhaps through an ORM), and returns data needed for the resolvers. While we don’t recommend this architecture for production environments, it’s not a bad way to get started learning the Apollo ecosystem.

Approach #1 to file uploads using a Hobbyist GraphQL Architecture.

In the hobbyist project depicted above, the front-end app using Apollo Client can upload a file to an Apollo Server using the apollo-upload-client package.

On the server-side, Apollo Server exposes an Upload scalar that can be referenced from within your GraphQL schema in an upload mutation. During an upload mutation, the Upload type exposes a stream that can be piped to the destination of your choosing (filesystem or cloud storage).
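
A minimal sketch of such a mutation follows. It assumes the upload argument resolves to an object exposing `filename`, `mimetype`, and a `createReadStream()` function, which is our understanding of the graphql-upload versions bundled with later Apollo Server 2 releases (earlier ones exposed a `stream` property instead). The `uploadFile` mutation, the `File` type, and the `openDestination` context helper are hypothetical names, and the schema is a fragment rather than a complete schema.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Assumed shape of a resolved Upload value.
interface FileUpload {
  filename: string;
  mimetype: string;
  createReadStream: () => Readable;
}

// Hypothetical context: `openDestination` returns a writable stream for the
// filesystem or a cloud storage SDK; the resolver does not care which.
interface Context {
  openDestination: (filename: string) => Writable;
}

// Schema fragment referencing the Upload scalar that Apollo Server exposes.
const typeDefs = `
  type File {
    filename: String!
    mimetype: String!
  }
  type Mutation {
    uploadFile(file: Upload!): File!
  }
`;

const resolvers = {
  Mutation: {
    async uploadFile(
      _parent: unknown,
      args: { file: Promise<FileUpload> },
      ctx: Context,
    ) {
      const { filename, mimetype, createReadStream } = await args.file;
      // Pipe the incoming bytes straight through instead of buffering the whole file.
      await pipeline(createReadStream(), ctx.openDestination(filename));
      return { filename, mimetype };
    },
  },
};
```

Passing the destination in through the context keeps the resolver identical whether the bytes end up on disk or in a bucket.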

Advantages

It’s quick, easy, and works right out of the box.

Disadvantages

Scaling file uploads this way would likely pose challenges because the operation is expensive. File uploads put a lot of stress on the GraphQL server, which we recommend using as a proxy to backend services rather than as the place where the heavy lifting happens.

Federated architecture considerations

This approach breaks down even further in a larger organization that takes advantage of the separation of concerns at the service level using Apollo Federation. In such an environment, a data stream initiated from the client must pass through Apollo Gateway, then to the federated GraphQL service, before the service can finally construct an upload stream to pipe the incoming data to a cloud provider and eventually resolve with a URL.

Approach #1 using Apollo Federation puts a lot of stress on the Apollo Gateway.

This approach has several problems.

  1. It is not an efficient use of the network. The entire stream needs to travel from the client to the Apollo Gateway to the federated service to the cloud service, so requests to perform this upload mutation could take a long time.
  2. It is an unnecessarily expensive operation that we may wish to separate from regular query and mutation traffic. Separating expensive operations from regular ones may help to maintain high throughput, since the regular traffic will be much more common. Essentially, the role of an Apollo GraphQL Gateway is to proxy requests to a backend service that handles the heavy lifting, and we want our GraphQL Gateways to resolve as much data as possible.

For a walkthrough on how to set up Multipart Uploads using GraphQL, check out the accompanying article, “☝️ GraphQL File Uploads with React Hooks, TypeScript & Amazon S3 [Tutorial]”.

Approach #2: Signed URL uploads

Several cloud services offer the ability to perform file uploads using Signed Upload URLs.

With Signed Upload URLs, we give the client a temporary URL to which they can upload the file directly, bypassing the need to pass file data through a GraphQL server entirely.
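
To make the idea concrete, here is a generic illustration of how a signed URL can work: the server signs the upload path and an expiry time with a secret, and the storage side accepts the upload only if the signature checks out and the URL has not expired. This is a sketch of the general technique, not the scheme of any particular cloud service (each defines its own URL format and signing algorithm), and every name in it (`SECRET`, `STORAGE_ORIGIN`, `createSignedUploadUrl`, `isUploadAllowed`) is hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical values for illustration only.
const SECRET = "secret-shared-by-server-and-storage";
const STORAGE_ORIGIN = "https://uploads.example.com";

function sign(path: string, expires: number): string {
  return createHmac("sha256", SECRET).update(`${path}:${expires}`).digest("hex");
}

// Server side: could be called from a GraphQL mutation that hands the URL to the client.
function createSignedUploadUrl(
  path: string,
  ttlSeconds: number,
  now: number = Date.now(),
): string {
  const expires = Math.floor(now / 1000) + ttlSeconds;
  return `${STORAGE_ORIGIN}${path}?expires=${expires}&signature=${sign(path, expires)}`;
}

// Storage side: allow the upload only for an untampered, unexpired URL.
function isUploadAllowed(signedUrl: string, now: number = Date.now()): boolean {
  const url = new URL(signedUrl);
  const expires = Number(url.searchParams.get("expires"));
  const signature = url.searchParams.get("signature") ?? "";
  const expected = sign(url.pathname, expires);
  const signatureMatches =
    signature.length === expected.length &&
    timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
  return signatureMatches && Math.floor(now / 1000) <= expires;
}

const signedUrl = createSignedUploadUrl("/avatars/42.png", 300);
console.log(signedUrl, isUploadAllowed(signedUrl));
```

The GraphQL server only ever produces a short string here; the file bytes go from the client straight to storage.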

Advantages

  1. Scales well. Since the expensive work of shuffling around file data has been offloaded to cloud services that handle this type of operation effectively, our services can continue to serve regular site traffic.
  2. Using graphql-upload, which leverages multipart/form-data in a single stream, files can be processed linearly.
  3. It’s possible that the URL may point to a POP (point of presence) edge node that is closest to the client, which can speed up the time it takes to perform an upload.

Disadvantages

Admittedly, this approach is a bit more complex to configure, and some services might not offer Signed URL Uploads (though several do).

Also, take into consideration the scenario where we want to update a user’s profile picture. How does the server get to know about the uploaded URL?

That may require either:

  a) The client to perform a subsequent mutation to update their profile picture.
  b) The server to listen for a webhook or event notifying it that a file was uploaded. This approach is preferred because it removes the server’s undesirable reliance on the client to report the upload. Depending on the cloud service, this feature may or may not be available.
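
Option (b) might look roughly like the handler below. The event payload, the `avatars/<userId>.<ext>` key convention, and the in-memory `users` map standing in for a database are all assumptions made for illustration; each cloud service defines its own notification format.

```typescript
// Hypothetical event payload sent by the storage service after an upload.
interface UploadCompletedEvent {
  objectKey: string; // assumed convention: "avatars/<userId>.<ext>"
  publicUrl: string;
}

interface User {
  id: string;
  profilePictureUrl?: string;
}

// Stand-in for a database table.
const users = new Map<string, User>([["42", { id: "42" }]]);

// The storage service notifies the server, so no follow-up call from the client is needed.
function handleUploadCompleted(event: UploadCompletedEvent): boolean {
  const match = /^avatars\/([^/.]+)\.\w+$/.exec(event.objectKey);
  if (!match) return false; // not a profile picture; ignore
  const user = users.get(match[1]);
  if (!user) return false; // unknown user; ignore
  user.profilePictureUrl = event.publicUrl;
  return true;
}

handleUploadCompleted({
  objectKey: "avatars/42.png",
  publicUrl: "https://uploads.example.com/avatars/42.png",
});
console.log(users.get("42"));
```

With option (a), the same update would instead live in an ordinary mutation that the client calls once its upload has finished.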

Another disadvantage of this approach is:

  • Revoking access to signed URLs isn’t trivial — sometimes, it’s impossible. One solution is to make the lifetime of the URL small, but that may have other negative consequences (large or slow uploads could fail).

Approach #3: Utilizing an accompanying File Upload/Serving system

For Graph Manager, our commercial product that helps you manage, validate, and secure your organization’s data graph, our infra team initially explored using Signed URLs, but there were a few factors that we didn’t like. Due to the number of steps involved between client and server, we settled on a different approach: rolling our own file upload/serving server.

Approach #3 utilizes managed infrastructure for your own file uploading system.

Advantages

It scales horizontally, and we have full control over how it works. Using one fast storage location (NFS or a cloud service), we can horizontally scale the number of application servers that handle uploads.
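
The reason this scales horizontally can be sketched in a few lines: if the upload servers keep no file state of their own and all of them read and write the same storage, any server can serve a file that was uploaded through another. The classes below are hypothetical stand-ins (an in-memory map plays the role of the NFS mount or bucket, and a naive round-robin plays the role of a load balancer), not a description of our actual system.

```typescript
// Stand-in for the single shared storage location (NFS mount, cloud bucket, ...).
class SharedStorage {
  private files = new Map<string, string>();

  put(key: string, data: string): void {
    this.files.set(key, data);
  }

  get(key: string): string | undefined {
    return this.files.get(key);
  }
}

// An upload/serving node holds no file state itself, so nodes are interchangeable.
class UploadServer {
  readonly name: string;
  private storage: SharedStorage;

  constructor(name: string, storage: SharedStorage) {
    this.name = name;
    this.storage = storage;
  }

  upload(key: string, data: string): string {
    this.storage.put(key, data);
    return `/files/${key}`;
  }

  serve(key: string): string | undefined {
    return this.storage.get(key);
  }
}

const storage = new SharedStorage();
const servers = [1, 2, 3].map((n) => new UploadServer(`node-${n}`, storage));

// Naive round-robin in place of a real load balancer.
let next = 0;
const pick = (): UploadServer => servers[next++ % servers.length];

const uploadedVia = pick();
uploadedVia.upload("report.pdf", "file bytes");
const servedVia = pick();
console.log(uploadedVia.name, servedVia.name, servedVia.serve("report.pdf"));
```

Adding capacity then means adding more `UploadServer` instances; the storage location remains the piece that has to be fast.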

Disadvantages

Because it is infrastructure that you manage yourself, it requires, like the best things in life, significantly more effort to implement and maintain.

Summary

Some say that a developer’s favorite response to a technical question is “it depends”, and in this case, when deciding on which approach to take for implementing file uploads, it really does depend on what your use case is.

Here’s what we recommend:

  1. For hobbyist and proof-of-concept projects, use Approach #1 — Multipart Upload Requests. This is handy for low-traffic, non-critical applications.
  2. (Recommended) For enterprise applications, consider Approach #2 — Signed URLs. This offloads the expensive work of handling file uploads to a cloud hosting service instead. Depending on how robust the service is, it may mean an additional round trip in order to update your backend with the URL of the uploaded file, but it comes with the amazing ability to sleep peacefully at night knowing that scaling uploads won’t be an issue.
  3. Additionally, for enterprise applications, consider Approach #3: creating your own File Upload/Serving system. While this approach gives you more control over how uploads work in your organization and lets you scale appropriately, it is a considerable amount of work. You will want to consider the operational costs of maintaining your own file hosting infrastructure, as using a cloud service may be preferable if file hosting is not a core part of your domain.

A special thank you to Emelia Smith for feedback on the initial draft of this post and Sachin Shinde for your wizardry infrastructure knowledge.

