DataLoader and the N+1 Problem — Batching Database Queries in Node.js

Introduction

The N+1 query problem is one of the most common performance issues in Node.js applications. When fetching a user and their posts, you might execute one query for the user, then one query per post—turning a simple operation into hundreds of queries. DataLoader solves this elegantly through intelligent batching and caching within a single request lifecycle.

Understanding the N+1 Problem

The N+1 problem manifests differently in REST and GraphQL contexts, but the root cause is identical: missing or inefficient batching.

// PROBLEMATIC: N+1 queries
async function getUserWithPosts(userId: string) {
  // Query 1: Fetch user
  const user = await db.user.findUnique({ where: { id: userId } });

  // Query N: Fetch each post individually
  const posts = await Promise.all(
    user.postIds.map(id => db.post.findUnique({ where: { id } }))
  );

  return { ...user, posts };
}

// For 1 user with 100 posts: 101 database round-trips

With DataLoader, we batch these queries intelligently:

import DataLoader from 'dataloader';
import { db } from './db';

// DataLoader batch functions have this shape (simplified from the library's types):
type BatchLoadFn<K, V> = (keys: readonly K[]) => Promise<(V | Error)[]>;

const postLoader = new DataLoader<string, Post>(
  async (postIds: readonly string[]) => {
    // Single query for ALL posts, regardless of input size
    const posts = await db.post.findMany({
      where: { id: { in: [...postIds] } },
    });

    // Return results in same order as input keys
    return postIds.map(
      id => posts.find(p => p.id === id) || new Error(`Post ${id} not found`)
    );
  },
  { cache: true } // Within-request deduplication (true is the default)
);

async function getUserWithPosts(userId: string) {
  const user = await db.user.findUnique({ where: { id: userId } });
  // Still looks like individual loads, but batches automatically
  const posts = await Promise.all(
    user.postIds.map(id => postLoader.load(id))
  );
  return { ...user, posts };
}

Per-Request DataLoader Instances

The critical mistake: creating DataLoader as a module-level singleton. Its cache would then persist for the life of the process, serving stale data and leaking one user's results into another's requests. Caching must be scoped to a single request.

// ❌ WRONG: Singleton persists cache across requests
export const globalPostLoader = new DataLoader(/* ... */);

// ✓ CORRECT: Fresh instance per request
async function handleRequest(req: Request) {
  // Create in middleware or request handler
  const dataLoaders = {
    post: new DataLoader(async (ids: readonly string[]) => {
      const posts = await db.post.findMany({ where: { id: { in: [...ids] } } });
      // Re-align with input key order (findMany returns rows in any order)
      return ids.map(id => posts.find(p => p.id === id) ?? new Error(`Post ${id} not found`));
    }),
    author: new DataLoader(async (ids: readonly string[]) => {
      const users = await db.user.findMany({ where: { id: { in: [...ids] } } });
      return ids.map(id => users.find(u => u.id === id) ?? new Error(`User ${id} not found`));
    }),
  };

  // Pass to resolvers via context
  const result = await resolver(args, { dataloaders: dataLoaders });
  return result;
}

Implementing Batch Functions for Prisma

Prisma and DataLoader work beautifully together. Here's a production pattern:

interface DataloaderContext {
  user: DataLoader<string, User>;
  post: DataLoader<string, Post>;
  comment: DataLoader<string, Comment>;
}

function createDataloaders(): DataloaderContext {
  return {
    user: new DataLoader(
      async (userIds: readonly string[]) => {
        const users = await db.user.findMany({
          where: { id: { in: [...userIds] } },
        });
        // Critical: maintain input order; surface missing rows as Errors
        // rather than asserting they exist
        return userIds.map(
          id => users.find(u => u.id === id) ?? new Error(`User ${id} not found`)
        );
      },
      { cache: true }
    ),

    post: new DataLoader(
      async (postIds: readonly string[]) => {
        const posts = await db.post.findMany({
          where: { id: { in: [...postIds] } },
          include: { author: true, tags: true },
        });
        return postIds.map(
          id => posts.find(p => p.id === id) ?? new Error(`Post ${id} not found`)
        );
      },
      { cache: true }
    ),

    comment: new DataLoader(
      async (commentIds: readonly string[]) => {
        const comments = await db.comment.findMany({
          where: { id: { in: [...commentIds] } },
        });
        // findMany does not preserve key order; re-align before returning
        return commentIds.map(
          id => comments.find(c => c.id === id) ?? new Error(`Comment ${id} not found`)
        );
      },
      { cache: true }
    ),
  };
}
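Each loader above repeats the order-restoring step, and Array.prototype.find inside a map is O(n²) for large batches. A small helper (hypothetical, not part of DataLoader or Prisma) does the re-alignment once, in O(n), via a Map:

```typescript
// Hypothetical helper: re-align rows with DataLoader's input keys in O(n).
// Missing rows become Errors so the corresponding load() rejects instead
// of silently resolving to undefined.
function orderByKey<V extends { id: string }>(
  keys: readonly string[],
  rows: V[],
  label = 'Record'
): (V | Error)[] {
  const byId = new Map(rows.map(row => [row.id, row]));
  return keys.map(id => byId.get(id) ?? new Error(`${label} ${id} not found`));
}

// Usage inside a batch function:
// const users = await db.user.findMany({ where: { id: { in: [...userIds] } } });
// return orderByKey(userIds, users, 'User');
```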

Cache Behavior and Deduplication

DataLoader's cache lives on the instance, so it is request-scoped exactly when instances are created per request. The cache: true option (the default) enables within-request deduplication:

const loader = new DataLoader(
  async (ids: readonly string[]) => {
    console.log(`Batch loading: ${ids.join(', ')}`);
    const items = await db.item.findMany({ where: { id: { in: [...ids] } } });
    return ids.map(id => items.find(i => i.id === id) ?? new Error(`Item ${id} not found`));
  },
  { cache: true }
);

// Same request: loads issued in the same event-loop tick form one batch
const [a, b] = await Promise.all([
  loader.load('user-1'),
  loader.load('user-2'),
]);
// Output: "Batch loading: user-1, user-2" (single batch)

await loader.load('user-1'); // Cache hit: no new batch, no new query

// Note: sequentially awaited loads dispatch separate batches, because
// DataLoader only coalesces calls collected within a single tick

With cache: false, per-tick batching still happens, but nothing is deduplicated: repeated loads of the same key each reach the batch function, and a key can appear more than once in a single batch.

Custom Batch Key Normalization

When batch keys need transformation or normalization:

const userByEmailLoader = new DataLoader<string, User | null>(
  async (emails: readonly string[]) => {
    // Normalize emails before querying
    const normalizedEmails = emails.map(e => e.toLowerCase().trim());
    const users = await db.user.findMany({
      where: { email: { in: normalizedEmails } },
    });

    // Return in input order
    return emails.map(email => {
      const normalized = email.toLowerCase().trim();
      return users.find(u => u.email === normalized) || null;
    });
  },
  {
    cache: true,
    // Without cacheKeyFn, 'User@Example.com' and 'user@example.com' would
    // be cached under different keys and never deduplicate
    cacheKeyFn: email => email.toLowerCase().trim(),
  }
);

// Works despite case differences
await userByEmailLoader.load('User@Example.com');
await userByEmailLoader.load('user@example.com'); // Cache hit

Priming the Cache

Pre-populate the cache with known data to avoid redundant queries:

function createUserLoader(context: RequestContext): DataLoader<string, User> {
  const loader = new DataLoader(
    async (userIds: readonly string[]) => {
      const users = await db.user.findMany({ where: { id: { in: [...userIds] } } });
      // Re-align results with input key order
      return userIds.map(id => users.find(u => u.id === id) ?? new Error(`User ${id} not found`));
    },
    { cache: true }
  );

  // Prime with data from previous query
  if (context.currentUser) {
    loader.prime(context.currentUser.id, context.currentUser);
  }

  // Prime with preloaded batch
  context.preloadedUsers?.forEach(user => {
    loader.prime(user.id, user);
  });

  return loader;
}

Composing DataLoaders for Nested Relationships

Complex nested relationships benefit from composition:

class DataloaderFactory {
  private userLoader: DataLoader<string, User>;
  private postLoader: DataLoader<string, Post>;
  private commentsByPostLoader: DataLoader<string, Comment[]>;

  constructor() {
    this.userLoader = new DataLoader(async (ids: readonly string[]) => {
      const users = await db.user.findMany({ where: { id: { in: [...ids] } } });
      return ids.map(id => users.find(u => u.id === id) ?? new Error(`User ${id} not found`));
    }, { cache: true });

    this.postLoader = new DataLoader(async (ids: readonly string[]) => {
      const posts = await db.post.findMany({
        where: { id: { in: [...ids] } },
      });
      return ids.map(id => posts.find(p => p.id === id) ?? new Error(`Post ${id} not found`));
    }, { cache: true });

    this.commentsByPostLoader = new DataLoader(async (postIds: readonly string[]) => {
      const comments = await db.comment.findMany({
        where: { postId: { in: [...postIds] } },
      });

      // One array per input key; a post with no comments gets []
      return postIds.map(id =>
        comments.filter(c => c.postId === id)
      );
    }, { cache: true });
  }

  async resolvePost(id: string) {
    const post = await this.postLoader.load(id);
    // author and comments are independent: load them in parallel so they
    // can join batches with other loads pending in the same tick
    const [author, comments] = await Promise.all([
      this.userLoader.load(post.authorId),
      this.commentsByPostLoader.load(id),
    ]);

    return { ...post, author, comments };
  }
}

Checklist

  • Identify N+1 query patterns in your resolver/handler code
  • Create per-request DataLoader instances, not singletons
  • Verify batch function returns results in input key order
  • Leave caching enabled (cache: true is the default) for request-scoped deduplication
  • Use .prime() to avoid redundant queries when data is already known
  • Monitor batch size with logging to ensure batching is working
  • Consider maxBatchSize (and a custom batchScheduleFn) if very large batches or batching delay would hurt strict latency requirements
  • Test that concurrent requests have isolated caches
  • Document which loaders are available in your context object

Conclusion

DataLoader transforms the N+1 problem from a performance catastrophe into automatic optimization. By batching queries and maintaining request-scoped caches, you achieve efficient database access without complicating your resolver code. The key is understanding that DataLoader instances must be fresh per request and that proper key ordering in batch functions is non-negotiable for correctness.