
Best Practices

FILE  44_best_practices
TOPIC  Connection · Indexes · Schema · Queries · Write Concern · Production · Anti-Patterns
LEVEL  All Levels
01
Connection Best Practices
Connection pooling, lifecycle, and driver configuration
connection
// ✅ Create ONE MongoClient per process — never per request
// The client manages a connection pool internally
let client  // module-level singleton

export async function getDb() {
  if (!client) {
    client = new MongoClient(process.env.MONGODB_URI, {
      maxPoolSize:             50,    // adjust per service (default 100)
      minPoolSize:             5,     // keep warm connections
      serverSelectionTimeoutMS: 5000, // fail fast if no server found
      retryWrites:             true,  // auto-retry on network error (default)
      retryReads:              true,
      w:                       "majority",  // default write concern
      compressors:             ["snappy", "zlib"]  // wire compression
    })
    await client.connect()
    process.on("SIGTERM", () => client.close())
    process.on("SIGINT",  () => client.close())
  }
  return client.db("myDatabase")
}

// ❌ ANTI-PATTERN: creating a new client per request
app.get("/orders", async (req, res) => {
  const client = new MongoClient(uri)  // creates new pool per request!
  await client.connect()
  const orders = await client.db("mydb").collection("orders").find().toArray()
  await client.close()                 // closes pool before response
  res.json(orders)
})

// ✅ Connection string best practices:
// Always specify authSource for non-admin users
// mongodb://appUser:pass@host:27017/mydb?authSource=mydb&replicaSet=myRS
// Use SRV for Atlas: mongodb+srv://...
// Never hardcode credentials — use env vars or secrets manager
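
Since the connection string carries credentials, it should never be logged verbatim. A minimal sketch of redacting the password before logging — maskMongoUri is a hypothetical helper, not part of the driver, and the hostnames are made up:

```javascript
// Sketch: redact the password portion of a MongoDB URI before logging it.
function maskMongoUri(uri) {
  // Replaces "user:password@" with "user:***@"; URIs without credentials pass through unchanged
  return uri.replace(/\/\/([^:/@]+):([^@]+)@/, "//$1:***@")
}

console.log(maskMongoUri("mongodb://appUser:s3cret@host:27017/mydb"))
// → mongodb://appUser:***@host:27017/mydb
```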
DANGER
Creating a new MongoClient per request is the most common and most damaging MongoDB anti-pattern in applications. Each client creates its own connection pool, so under load this exhausts the server's connection limit — Atlas caps connections per tier (500 on free and shared tiers, more on dedicated clusters), and self-managed servers are bounded by net.maxIncomingConnections and OS file-descriptor limits. Always share a single client instance across the entire application lifecycle.
02
Schema Design Best Practices
Design for your access patterns, not for normalization
schema
// ✅ Design rules summary

// 1. Embed what is ALWAYS read together
// 2. Reference what is read independently or unbounded in size
// 3. ALL arrays must be BOUNDED or in separate collection
// 4. Pre-compute aggregates (counts, totals) to avoid expensive reads
// 5. Atomic single-document writes are free — no transaction needed

// ❌ UNBOUNDED ARRAY — will hit 16MB eventually
{ userId: "U1", followers: [id1, id2, id3, ...] }

// ✅ Separate collection for 1:many relationships
db.follows.insertOne({ followerId: "U1", followeeId: "U2", followedAt: new Date() })
db.follows.createIndex({ followeeId: 1 })
db.follows.createIndex({ followerId: 1 })

// ❌ DYNAMIC FIELD NAMES — cannot index, schema bloats
{ pageViews: { "2024-01": 120, "2024-02": 205, "2024-03": 180 } }

// ✅ Array of objects — indexable, consistent
{ pageViews: [{ month: "2024-01", views: 120 }, { month: "2024-02", views: 205 }] }

// Monitor document size growth proactively:
db.orders.aggregate([
  { $project: { size: { $bsonSize: "$$ROOT" } } },
  { $group: {
    _id: null,
    maxSize: { $max: "$size" },
    avgSize: { $avg: "$size" },
    p95Size: { $percentile: { input: "$size", p: [0.95], method: "approximate" } }  // requires MongoDB 7.0+
  }}
])
// Alert when maxSize > 10MB (approaching 16MB limit)
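
Migrating the dynamic-field shape above into the indexable array shape is plain object manipulation; a sketch, where mapToArray is a hypothetical helper:

```javascript
// Sketch: convert a dynamic-field map ({ "2024-01": 120, ... }) into the
// array-of-objects shape ([{ month, views }, ...]) that can be indexed.
function mapToArray(pageViews) {
  return Object.entries(pageViews).map(([month, views]) => ({ month, views }))
}

mapToArray({ "2024-01": 120, "2024-02": 205 })
// → [{ month: "2024-01", views: 120 }, { month: "2024-02", views: 205 }]
```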

Schema Design Quick Rules

Rule | Why
No unbounded arrays | 16MB document limit; update performance degrades
No field names as data | Dynamic field names cannot be indexed
Pre-compute expensive aggregates | O(1) read vs O(N) scan on every request
Avoid monotonically increasing _id as a shard key | Monotonic _id (including ObjectId) → write hotspot in sharded clusters; use a hashed shard key or UUID
Consistent field types | Mixed types defeat indexes and make queries unpredictable
Keep embedded arrays small (e.g. under ~15 items) | Large arrays are inefficient to update and search
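
The bounded-array rule can be enforced at write time with $push + $each + $slice. A sketch — the cap of 15 mirrors the rule above, and recentOrders/newOrder are hypothetical names:

```javascript
// Sketch: $slice: -15 trims the array to its 15 newest elements on every
// push, so the embedded array can never grow unbounded.
const newOrder = { orderId: "O1", total: 42 }   // hypothetical document
const update = {
  $push: { recentOrders: { $each: [newOrder], $slice: -15 } }
}
// db.users.updateOne({ _id: userId }, update)  // requires a live connection
```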
03
Indexing Best Practices
Create precisely what queries need — not more
indexes
// ✅ Index strategy rules

// 1. Use ESR rule for compound indexes: Equality → Sort → Range
db.orders.createIndex({ customerId: 1, status: 1, createdAt: -1 })
// Query: { customerId: "C1", status: "completed" } .sort({ createdAt: -1 })

// 2. Index only fields used in queries — every index has write overhead
// 3. Verify with explain("executionStats") before deploying

// 4. Audit unused indexes regularly:
db.orders.aggregate([
  { $indexStats: {} },
  { $project: {
    name: "$name",
    ops:  "$accesses.ops",
    since: "$accesses.since"
  }},
  { $match: { ops: { $lt: 100 } } }   // low-use indexes — candidates for removal
])

// 5. Before dropping an index, HIDE it first (safe removal):
db.orders.hideIndex("idx_old_field")
// Monitor for 24–48h — no performance degradation → safe to drop
db.orders.dropIndex("idx_old_field")

// 6. Index builds are non-blocking by default since v4.2 (the old background option is deprecated):
db.orders.createIndex({ field: 1 })   // non-blocking by default

// 7. Use partial indexes to keep index small:
db.orders.createIndex(
  { customerId: 1 },
  { partialFilterExpression: { status: "pending" } }
)
// Only ~5% of orders are pending — index 95% smaller than full collection

// 8. Use TTL indexes for automatic cleanup instead of delete jobs:
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })
// Deletes session when: current_time >= document.expiresAt
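
With expireAfterSeconds: 0, each document carries its own expiry. A sketch of building such a document — sessionDoc and ttlMinutes are assumptions for illustration, not driver API:

```javascript
// Sketch: per-document TTL — the expiresAt value drives deletion,
// not the index configuration.
function sessionDoc(userId, ttlMinutes) {
  return {
    userId,
    createdAt: new Date(),
    expiresAt: new Date(Date.now() + ttlMinutes * 60 * 1000)
  }
}
// db.sessions.insertOne(sessionDoc("U1", 30))  // expires ~30 minutes from now
```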
04
Query Best Practices
Write efficient queries that scale
queries
// ✅ Always use projection — reduce network + memory overhead
db.users.findOne({ _id: userId }, { name: 1, email: 1, _id: 0 })
// ❌ Never fetch the whole document when you only need 2 fields

// ✅ Use cursor limit and skip for pagination
// ❌ AVOID large skip values — scans from start each time
db.orders.find().sort({ createdAt: -1 }).skip(10000).limit(20)  // BAD
// ✅ Range-based pagination (cursor pagination):
db.orders.find({ createdAt: { $lt: lastSeenDate } })
  .sort({ createdAt: -1 }).limit(20)

// ✅ Add maxTimeMS to all application queries
await db.collection("orders").find(filter)
  .maxTimeMS(5000)   // fail fast rather than hang
  .toArray()

// ✅ Count with countDocuments vs estimated
db.orders.countDocuments({ status: "pending" })  // accurate, uses index
db.orders.estimatedDocumentCount()               // O(1), uses metadata, no filter

// ✅ Use $exists sparingly — { $exists: false } cannot be resolved from an index efficiently
// ❌ Pointless: db.orders.find({ _id: { $exists: true } }) — _id always exists
// ✅ Anchor regexes at the start so they can use an index:
db.products.find({ name: /^coffee/ })   // ✅ anchored — uses index prefix
db.products.find({ name: /coffee/ })    // ❌ not anchored — COLLSCAN

// ✅ Use aggregation $match as early as possible to reduce pipeline data:
db.orders.aggregate([
  { $match: { status: "completed", year: 2024 } },   // filter first
  { $lookup: {...} },                                 // then join (less data)
  { $group: {...} }
])
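
The range-based pagination above breaks down when many documents share the same createdAt. A sketch that adds _id as a tiebreaker — nextPageFilter is a hypothetical helper, and it assumes the query sorts by { createdAt: -1, _id: -1 }:

```javascript
// Sketch: build the next-page filter for cursor (range-based) pagination,
// using _id to break ties between documents with equal createdAt.
function nextPageFilter(lastDoc) {
  if (!lastDoc) return {}          // first page: no filter
  return {
    $or: [
      { createdAt: { $lt: lastDoc.createdAt } },
      { createdAt: lastDoc.createdAt, _id: { $lt: lastDoc._id } }
    ]
  }
}
```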

Query Anti-Patterns

Anti-Pattern | Problem | Fix
No projection | Fetches entire document; wastes bandwidth/memory | Always specify the fields needed
Large skip() pagination | Scans from document 0 on every page; O(N) per page | Use range-based (cursor) pagination
$where / $function | Runs JavaScript per document; cannot use indexes; slow | Use native operators; partial index
Regex without ^ anchor | Cannot use index; COLLSCAN on every query | Anchor with ^ or use Atlas Search
countDocuments() without filter on a large collection | Full collection scan for exact count | Use estimatedDocumentCount() or maintain a count field
$lookup on unindexed field | Full scan of foreign collection per document | Index the foreignField
Fetching all docs to filter in app | Moves filtering to the application layer; O(N) transfer | Push filter predicates into the MongoDB query
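
When the anchored-regex search term comes from user input, anchoring alone is not enough — regex metacharacters must be escaped first. A sketch; escapeRegex and prefixRegex are hypothetical helpers (the driver has no built-in for this):

```javascript
// Sketch: escape regex metacharacters, then anchor at the start so the
// resulting query can use an index prefix.
function escapeRegex(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}
function prefixRegex(input) {
  return new RegExp("^" + escapeRegex(input))
}
// db.products.find({ name: prefixRegex(userInput) })
```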
05
Write Safety
Write concern, idempotency, and safe patterns
writes
// ✅ Write concern recommendations by use case:
// Financial data, user accounts:  w: "majority", j: true
// Normal application writes:      w: "majority" (default in Atlas)
// Analytics/metrics inserts:      w: 1 (speed over durability)
// Fire-and-forget logging:        w: 0

// ✅ Use findOneAndUpdate for read-modify-write atomicity
// ❌ NEVER: find → modify in app → update (race condition!)
const result = await db.collection("inventory").findOneAndUpdate(
  { _id: productId, qty: { $gte: requestedQty } },   // atomic check + update
  { $inc: { qty: -requestedQty } },
  { returnDocument: "after" }
)
if (!result) throw new Error("Insufficient inventory")   // driver v6+ returns the doc (or null); v4/v5 return { value }

// ✅ Idempotent upsert with $setOnInsert
await db.collection("events").updateOne(
  { idempotencyKey: eventId },           // unique key prevents duplicates
  {
    $setOnInsert: {                        // only set on first insert
      idempotencyKey: eventId,
      payload:        eventData,
      processedAt:    new Date()
    }
  },
  { upsert: true }
)
// Safe to call multiple times — second call is a no-op

// ✅ Use $inc for atomic counters (never read-modify-write):
db.posts.updateOne({ _id: postId }, { $inc: { viewCount: 1 } })
// ❌ NEVER: const doc = await find(); doc.viewCount++; await replaceOne(doc)

// ✅ Batch writes instead of individual writes:
// ❌ 1000 insertOne() calls = 1000 round trips
// ✅ 1 insertMany([...1000 docs]) = 1 round trip
await db.collection("logs").insertMany(logBatch, { ordered: false })
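
To bound memory per call when the batch itself is very large, the workload can be split client-side; a sketch (chunk is a hypothetical helper, and the 1000-doc batch size is an assumption to tune per document size):

```javascript
// Sketch: split a large write workload into fixed-size batches.
function chunk(docs, size = 1000) {
  const batches = []
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size))
  }
  return batches
}
// for (const batch of chunk(logBatch)) {
//   await db.collection("logs").insertMany(batch, { ordered: false })
// }
```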
06
Security Checklist
Minimum security baseline for any MongoDB deployment
security
// ✅ SECURITY CHECKLIST — verify before production launch

// 1. Authentication enabled
//    security.authorization: enabled in mongod.conf
//    OR use Atlas (auth always on)

// 2. Principle of least privilege
//    App user: readWrite on its database ONLY
//    No app user has root, clusterAdmin, or userAdminAnyDatabase

// 3. TLS enabled for all connections
//    net.tls.mode: requireTLS in mongod.conf
//    Connection strings include: ?tls=true or use mongodb+srv://

// 4. Network binding restricted
//    net.bindIp: 127.0.0.1,10.0.1.5  (NOT 0.0.0.0)
//    Firewall: restrict MongoDB port (27017) to app servers only

// 5. Intra-cluster authentication
//    Keyfile OR x.509 between replica set members

// 6. Credentials in environment variables
const uri = process.env.MONGODB_URI  // ✅
const uri = "mongodb://user:password@host"  // ❌ hardcoded

// 7. Disable server-side JavaScript if not needed
//    security.javascriptEnabled: false

// 8. Schema validation for sensitive collections
db.createCollection("users", {
  validator: { $jsonSchema: { required: ["email", "passwordHash"] } }
})

// 9. Audit logging for compliance (Enterprise/Atlas)
//    Log authenticate, authCheck, dropCollection at minimum

// 10. Rotate credentials regularly
//     Human users: every 90 days
//     Service accounts: on team member departure
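
Item 6 can be enforced at startup with a fail-fast guard; a sketch, where requireEnv is a hypothetical helper:

```javascript
// Sketch: crash at boot — not at first query — if a secret is missing.
function requireEnv(name) {
  const value = process.env[name]
  if (!value) throw new Error(`Missing required environment variable: ${name}`)
  return value
}
// const uri = requireEnv("MONGODB_URI")
```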
07
Production Checklist
Complete pre-launch verification checklist
production

Infrastructure

// ✓ Replica set deployed (minimum 3 nodes — never standalone in production)
// ✓ Replica set has ODD number of voting members (3, 5, 7)
// ✓ WiredTiger cache sized deliberately (default: 50% of (RAM − 1 GB), min 256 MB)
//   storage.wiredTiger.engineConfig.cacheSizeGB: 8  (for a 16GB RAM server)
// ✓ Dedicated data volume — not sharing disk with OS or logs
// ✓ XFS filesystem (recommended for MongoDB on Linux)
// ✓ noatime mount option on data partition
// ✓ Transparent Huge Pages disabled (THP causes latency spikes)
//   echo never > /sys/kernel/mm/transparent_hugepage/enabled
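
The cache-size guidance above follows MongoDB's documented default of 50% of (RAM − 1 GB) with a 256 MB floor; as a quick arithmetic sketch (defaultCacheGB is a hypothetical helper):

```javascript
// Sketch: WiredTiger's default internal cache size.
function defaultCacheGB(ramGB) {
  return Math.max((ramGB - 1) * 0.5, 0.25)   // 0.25 GB = 256 MB floor
}
defaultCacheGB(16)   // → 7.5
```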

Application

// ✓ Single MongoClient instance per process (not per request)
// ✓ All queries have maxTimeMS set
// ✓ retryWrites: true and retryReads: true on MongoClient
// ✓ Connection string uses replicaSet=name (for replica set awareness)
// ✓ Write concern is w: "majority" for critical data
// ✓ All queries verified with explain() — no COLLSCAN in hot paths
// ✓ All arrays are bounded
// ✓ Schema validation on critical collections
// ✓ Sensitive fields not returned in default projections (passwords, PII)

Operations

// ✓ Database profiler at level 1 (slowms: 100) watching system.profile
// ✓ Alerting configured:
//   Connection count > 80% of maxConnections
//   Replication lag > 30 seconds
//   WiredTiger cache used > 90%
//   Disk usage > 80%
//   P99 query time > 500ms
// ✓ Backup configured and tested:
//   Schedule matches RPO requirements
//   Restore tested within last 30 days
//   Backups stored in different region
// ✓ Oplog sized to cover longest maintenance window
// ✓ Index maintenance plan: review $indexStats monthly

// ✓ Final verification commands:
rs.status()              // all members HEALTHY, lag near 0
db.serverStatus().connections  // current << available
sh.status()              // (sharded only) chunks balanced
db.adminCommand({ getParameter: 1, authenticationMechanisms: 1 })
// should include SCRAM-SHA-256 or MONGODB-X509
TIP
The single most impactful performance improvement for most applications is ensuring every hot query path has a covering index verified with explain(). The second most impactful is ensuring the MongoClient is a singleton. Fix these two issues and 80% of MongoDB performance problems disappear.