As mentioned in a previous post, Pocket has started moving to a Federated GraphQL API using Apollo Federation. We built a handful of Implementing Services that resolve entities, and we very quickly ran into the N+1 query problem in their resolvers. Here's how we solved it.

What is this N+1 problem I speak of?

Consider the following schemas for an Articles service, and an Authors service:

Articles service schema

type Article @key(fields: "id") {
  id: ID!
  title: String
  slug: String
  body: String
}

type Query {
  get50Articles: [Article]
}

Authors service schema

scalar Url

extend type Article {
  id: ID! @external
  author: Author
}

type Author {
  name: String
  profileUrl: Url
}
Now, given the schemas defined above, a client application requests 50 articles with their authors to display on a page using the following query:

query Get50Articles {
  get50Articles {
    title
    author {
      name
      profileUrl
    }
  }
}

Here’s what happens: the gateway first fetches the articles from the Articles service, then sends the Authors service a single request containing a list of representations, one per fetched article id:

{
  "query": ...,
  "variables": {
    "_representations": [
      {
        "__typename": "Article",
        "id": "1"
      },
      {
        "__typename": "Article",
        "id": "2"
      },
      {
        "__typename": "Article",
        "id": "3"
      }
    ]
  }
}

The Authors service will then go over each representation in the list to resolve it using the following resolver:

const resolvers = {
  Article: {
    author: ({ id }) => {
      return getAuthorByArticleId(id);
    },
  },
};

I bet you already see what is going to happen here, but I’ll spell it out anyway.

Our resolver is going to sequentially make 50 separate requests to the authors datastore. That’s the N+1 problem: 1 request for the articles, plus N requests for their authors. This is bad for performance because those same 50 datastore requests happen on every request to the service, and nobody, definitely not Pocket, has time for that.
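To make the cost concrete, here’s a minimal sketch that counts datastore round trips. The datastore here is a hypothetical in-memory stand-in, not Pocket’s real one:

```typescript
// Hypothetical in-memory datastore; the counter tracks round trips.
let datastoreCalls = 0;

function getAuthorByArticleId(id: string): { name: string } {
  datastoreCalls += 1; // one round trip per article
  return { name: `Author of article ${id}` };
}

// Mirrors the resolver above: one datastore request per representation.
function resolveAuthorsNaively(articleIds: string[]) {
  return articleIds.map((id) => getAuthorByArticleId(id));
}

const ids = Array.from({ length: 50 }, (_, i) => String(i + 1));
resolveAuthorsNaively(ids);
console.log(datastoreCalls); // 50 round trips for a single page of articles
```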

The Solution

Use a DataLoader to batch requests to the datastore.

DataLoader will coalesce all individual loads which occur within a single frame of execution (a single tick of the event loop) and then call your batch function with all requested keys.

Let’s update our resolver to instead use a dataloader.

import DataLoader from 'dataloader';

const authorLoader = new DataLoader((articleIds: string[]) => batchGetAuthors(articleIds));

const resolvers = {
  Article: {
    author: ({ id }) => {
      return authorLoader.load(id);
    },
  },
};

Now, instead of calling our datastore once per article id, which would take 50 calls, we can pass all the ids to a batch function and make a single call!
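One DataLoader requirement worth calling out: the batch function must return results in the same order as the keys it received, and a datastore `WHERE id IN (...)` query makes no such ordering guarantee. A sketch of the re-ordering step, with a hypothetical `fetchAuthorRowsByArticleIds` standing in for the real batch query:

```typescript
interface Author {
  articleId: string;
  name: string;
}

// Hypothetical single batch query; the datastore may return rows in any order.
async function fetchAuthorRowsByArticleIds(articleIds: string[]): Promise<Author[]> {
  const rows = articleIds.map((id) => ({ articleId: id, name: `Author ${id}` }));
  return rows.reverse(); // simulate out-of-order results
}

// Batch function shaped the way DataLoader expects: results aligned with keys.
async function batchGetAuthors(articleIds: string[]): Promise<(Author | undefined)[]> {
  const rows = await fetchAuthorRowsByArticleIds(articleIds);
  const byArticleId = new Map(rows.map((row) => [row.articleId, row] as [string, Author]));
  // Missing keys become undefined rather than shifting the other results.
  return articleIds.map((id) => byArticleId.get(id));
}
```

Re-keying through a `Map` like this keeps the batch function correct no matter how the datastore orders its rows.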

If we can’t make a single call, we can still greatly reduce the response time by firing all 50 requests concurrently with Promises!
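That fallback looks like this: the batch function maps each key to a per-id lookup and awaits them all with `Promise.all`, so the lookups overlap instead of running back to back. The per-id fetch here is hypothetical:

```typescript
interface Author {
  articleId: string;
  name: string;
}

// Hypothetical per-id lookup: no batch endpoint available.
async function getAuthorByArticleId(id: string): Promise<Author> {
  return { articleId: id, name: `Author ${id}` };
}

// All lookups are in flight at once, so total latency is roughly one
// round trip instead of fifty sequential ones.
async function batchGetAuthorsConcurrently(articleIds: string[]): Promise<Author[]> {
  return Promise.all(articleIds.map((id) => getAuthorByArticleId(id)));
}
```

`Promise.all` also preserves input order, which conveniently satisfies DataLoader’s ordering contract for free.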

The DataLoader also memoizes the results for a given key

DataLoader provides a memoization cache for all loads which occur in a single request to your application. After .load() is called once with a given key, the resulting value is cached to eliminate redundant loads.

That is to say, if the input to the dataloader’s load function matches a previous call, the cached output of that call is returned. This reduces the number of keys the batch function has to resolve, which in turn reduces the operation time. DataLoader achieves this using an in-memory cache. Check out the documentation for more.
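The memoization is easy to picture as a per-request `Map` in front of the batch function. This simplified sketch is not DataLoader itself, just an illustration of the caching behaviour it provides:

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

// Simplified illustration of DataLoader-style memoization:
// each distinct key reaches the batch function at most once.
class MemoizingLoader<K, V> {
  private cache = new Map<K, Promise<V>>();
  private batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached) return cached; // repeated key: reuse the earlier promise
    const promise = this.batchFn([key]).then((values) => values[0]);
    this.cache.set(key, promise);
    return promise;
  }
}

let batchCalls = 0;
const loader = new MemoizingLoader<string, string>(async (keys) => {
  batchCalls += 1;
  return keys.map((k) => `Author ${k}`);
});

loader.load('1');
loader.load('1'); // memoized: no second batch call
console.log(batchCalls); // 1
```

Caching the promise (rather than the resolved value) means even concurrent loads for the same key share a single fetch.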

Much better! We like this, and we want this!

Taking The Solution A Step Further

Now don’t get me wrong: batching combined with an in-memory cache is great, until you have a distributed service, which describes pretty much every service we run here at Pocket. In that case, we don’t want the cache sitting on a single node; we want all the nodes in our distributed service to have access to it.

This presents the need for a distributed cache like Redis or Memcached. It is, however, not as simple as setting a cache map on the DataLoader: we need to fetch from the cache, and write results back to it, inside our batch function.
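The shape of such a batch function is the classic cache-aside pattern: look up each key in the shared cache, batch-fetch only the misses, then write those back. A sketch with a `Map` standing in for Redis and hypothetical fetch/cache helpers (TTL handling omitted for brevity):

```typescript
interface Author {
  articleId: string;
  name: string;
}

// Stand-in for a distributed cache client (e.g. Redis); get/set are async
// to mirror the network hop a real client would make.
const sharedCache = new Map<string, Author>();

async function cacheGet(key: string): Promise<Author | undefined> {
  return sharedCache.get(key);
}

async function cacheSet(key: string, value: Author): Promise<void> {
  sharedCache.set(key, value);
}

// Hypothetical batch fetch from the datastore.
async function getAuthorsByArticleIds(ids: string[]): Promise<Author[]> {
  return ids.map((id) => ({ articleId: id, name: `Author ${id}` }));
}

async function batchGetAuthorsCached(articleIds: string[]): Promise<(Author | undefined)[]> {
  const results = new Map<string, Author>();
  const misses: string[] = [];

  // 1. Look up every key in the shared cache.
  for (const id of articleIds) {
    const hit = await cacheGet(`author-article-${id}`);
    if (hit) results.set(id, hit);
    else misses.push(id);
  }

  // 2. Batch-fetch only the misses and write them back to the cache.
  if (misses.length > 0) {
    for (const author of await getAuthorsByArticleIds(misses)) {
      results.set(author.articleId, author);
      await cacheSet(`author-article-${author.articleId}`, author);
    }
  }

  // 3. Return results aligned with the input key order.
  return articleIds.map((id) => results.get(id));
}
```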

We have created a handy little utility package for this exact purpose. Below is what our batch function could look like using the package:

import { batchCacheFn, LoaderCacheInterface } from '@pocket-tools/apollo-utils';

async function batchGetAuthors(articleIds: string[]) {
  return batchCacheFn<string, Author>({
    values: articleIds,
    valueKeyFn: (articleId: string) => articleId, // same return value as returnTypeKeyFn
    callback: (articleIds: string[]) => getAuthorsByArticleIds(articleIds), // batch fetching
    cache: new Cache(), // this needs to implement the LoaderCacheInterface
    cacheKeyPrefix: 'author-article-', // optional
    returnTypeKeyFn: (author: Author) => author.articleId, // same return value as valueKeyFn
    maxAge: 300, // maximum cache age
  });
}

And just like that, we are batch fetching authors from the database and caching the results for subsequent requests!

I hope you found this helpful :)

~ kelvin, backend team

Tagged with: #graphql, #apollo, #federation, #dataloader, #caching, #apollo-utils