
Indexes
Introduction

FILE  23_indexes_intro
TOPIC  COLLSCAN vs IXSCAN · B-Tree · Selectivity · Write Penalty · Management
LEVEL  Foundation
01
What is an Index
A sorted data structure that maps field values to document locations
concept

An index in MongoDB is a supplementary data structure that stores a sorted, compact representation of a collection's field values alongside pointers to the full documents on disk. Its sole purpose is to allow MongoDB to locate documents without scanning the entire collection.

Without an index, MongoDB performs a COLLSCAN (Collection Scan) — it reads every document. With an index, MongoDB performs an IXSCAN (Index Scan) — it traverses a B-Tree to find relevant pointers, then fetches only those documents.

Aspect       | COLLSCAN (No Index) | IXSCAN (With Index)
Method       | Read every document | B-Tree traversal + fetch
Complexity   | O(N) — linear       | O(log N) — logarithmic
5M documents | 5,000,000 reads     | ~23 B-Tree comparisons
CPU & Disk   | High                | Minimal
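The difference is easy to observe in mongosh by forcing each plan on the same query (a hypothetical users collection with an index on email is assumed; $natural forces a collection scan):

```javascript
// Force a COLLSCAN with the $natural hint
db.users.find({ email: "user@example.com" }).hint({ $natural: 1 }).explain("executionStats")
// winningPlan.stage: "COLLSCAN" — totalDocsExamined equals the collection size

// Let the optimizer use the email index
db.users.find({ email: "user@example.com" }).explain("executionStats")
// IXSCAN under a FETCH stage — totalDocsExamined ≈ nReturned
```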

The _id Index

Every MongoDB collection has one index that is created automatically and can never be dropped: the unique ascending index on _id. Any query filtering by _id uses this index for free.

// The _id index exists on every collection — you never need to create it
db.users.getIndexes()
// Always returns at minimum: [ { v: 2, key: { _id: 1 }, name: "_id_" } ] — implicitly unique
NOTE
Creating an index does not change any documents. It creates and maintains a separate, smaller data structure alongside your collection. The original documents are unmodified.
02
How Indexes Work Internally
B-Tree structure · WiredTiger cache · query optimizer
internals

MongoDB's WiredTiger storage engine implements indexes as B-Trees: self-balancing, sorted tree structures in which each node stores multiple keys and child pointers.

B-Tree Properties

  • Each node stores sorted keys and pointers to children or to documents on disk
  • Traversal from root to target requires at most O(log N) comparisons
  • The tree rebalances itself as inserts, updates, and deletes occur — this maintenance is the source of the write penalty
  • For a 5M-document collection: log₂(5,000,000) ≈ 23 steps to find any key
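The log₂ figure above is easy to verify. A sketch in plain JavaScript (binary comparisons are a worst-case bound; real B-Tree nodes hold many keys per node, so the actual tree depth is far smaller):

```javascript
// Worst-case number of binary comparisons to locate one key among n sorted keys
function comparisons(n) {
  return Math.ceil(Math.log2(n));
}

console.log(comparisons(5_000_000)); // 23
console.log(comparisons(5_000));     // 13 — a 1000x smaller collection saves only 10 steps
```

The second line is the point of logarithmic growth: shrinking the collection a thousandfold barely changes the comparison count.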

The WiredTiger Cache (Working Set)

For indexes to deliver their O(log N) benefit, they must reside in RAM — specifically the WiredTiger Cache. When an index is too large to fit in cache, MongoDB reads index pages from disk on each query, destroying performance.

// Monitor WiredTiger cache usage
db.serverStatus().wiredTiger.cache

// View index sizes in bytes for a collection
db.collection.stats().indexSizes

// Rule of thumb: all indexes + hot document data (working set)
// should fit within the WiredTiger Cache allocation
// Default: 50% of (RAM - 1GB), min 256MB
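The default cache formula above can be sketched as a quick calculation (plain JavaScript, not a MongoDB API):

```javascript
// Default WiredTiger cache size in GB: the larger of 50% of (RAM - 1 GB) and 256 MB
function defaultCacheGB(ramGB) {
  return Math.max((ramGB - 1) * 0.5, 0.25);
}

console.log(defaultCacheGB(16)); // 7.5  — on a 16 GB host
console.log(defaultCacheGB(1));  // 0.25 — the 256 MB floor applies
```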

The Query Optimizer

When MongoDB receives a query, the query optimizer evaluates all candidate indexes, runs each candidate briefly in a trial ("race"), selects the winner, and caches that plan. The cached plan is used for all subsequent identical queries until it is invalidated.

Event                                   | Effect on Cached Plan
Index created or dropped                | All cached plans cleared
Data distribution changes significantly | Plan re-evaluated at next execution
db.collection.getPlanCache().clear()    | Manual invalidation
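The plan cache can be inspected and cleared per collection in mongosh (shown against a hypothetical users collection):

```javascript
// List cached plans for the collection (mongosh 4.4+)
db.users.getPlanCache().list()

// Clear all cached plans for this collection — forces re-evaluation on the next query
db.users.getPlanCache().clear()
```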
03
COLLSCAN vs IXSCAN
Identifying and confirming index usage with explain()
diagnosis

The fundamental diagnostic question is: "Is this query using an index?" The answer is always found in explain().

// Run explain to see the execution plan
db.users.find({ email: "user@example.com" }).explain("executionStats")

// Key fields to check:
// winningPlan.stage        — "IXSCAN" ✅  or  "COLLSCAN" ❌
// winningPlan.indexName    — which index is being used
// totalKeysExamined        — index keys scanned
// totalDocsExamined        — documents fetched from disk
// nReturned                — documents returned to client
// executionTimeMillis      — total query time

Reading the Results

Pattern                           | Meaning                       | Action
COLLSCAN                          | No index in use — full scan   | Create an index on the filter field(s)
IXSCAN, docsExamined ≈ nReturned  | Optimal index usage           | None needed
IXSCAN, docsExamined >> nReturned | Index used but over-scanning  | Consider a more selective or compound index
docsExamined: 0                   | Covered query — no disk fetch | Maximum efficiency achieved

Covered Queries

A covered query is one where every field in the filter AND every field in the projection exists in the index, and _id is excluded from the projection (unless _id is itself part of the index). MongoDB returns results entirely from the index — zero document fetches from disk.

// Index covers both filter and projection fields
db.users.createIndex({ email: 1, name: 1 })

db.users.find(
  { email: "user@example.com" },  // filter field: in index
  { name: 1, _id: 0 }             // project field: in index; _id excluded
)
// explain() totalDocsExamined: 0 — fully covered
TIP
To confirm a covered query: run .explain("executionStats") and look for totalDocsExamined: 0. This is the highest level of query efficiency available in MongoDB — the data never left the index.
04
Index Selectivity
High cardinality = useful index; low cardinality = possibly ignored
selectivity

Index selectivity is the degree to which an index narrows down the candidate document set. A highly selective index quickly filters to a small result; a low-selectivity index barely reduces the scan count.

High vs Low Selectivity

Field         | Cardinality              | Selectivity | Index useful?
email         | One per user — very high | Very high   | Yes — always
userId (UUID) | Unique — very high       | Very high   | Yes — always
category      | 20–100 categories        | Medium      | Often useful
status        | 3–5 values               | Low         | Combine with another field
isActive      | Only true/false          | Very low    | Often ignored by optimizer
WARN
An index on a boolean field like isActive may be ignored by the query optimizer if 90%+ of documents have isActive: true. The optimizer calculates that a COLLSCAN is comparably cheap because the index scan plus document fetch overhead exceeds the marginal savings. Always combine low-selectivity fields with high-selectivity fields in compound indexes.
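A rough selectivity estimate for a field can be computed directly in mongosh (hypothetical users collection; distinct() materialises all values client-side, so use it on reasonably sized collections):

```javascript
// Ratio of distinct values to documents — closer to 1 means more selective
const total = db.users.countDocuments()
const distinctValues = db.users.distinct("status").length
print(`selectivity of status: ${(distinctValues / total).toFixed(4)}`)
// email would approach 1.0; a boolean field approaches 0.0
```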

Improving Selectivity with Compound Indexes

// Instead of a useless standalone boolean index:
db.users.createIndex({ isActive: 1 })   // low selectivity — often ignored

// Combine with a high-cardinality field:
db.users.createIndex({ isActive: 1, createdAt: -1 })
// isActive narrows the candidates first (possibly only slightly), then createdAt provides a precise range scan
05
The Write Penalty
Every index adds overhead to every write — cumulative cost
tradeoff

Every index must be updated on every write operation — insert, update, delete, and replace — and each of those updates is a separate B-Tree modification. A collection with 10 indexes incurs 10 B-Tree updates per insert. This is the fundamental read/write tradeoff of indexing.

// A single insert must update ALL indexes on the collection
// collection with 6 indexes → 6 B-Tree updates per insert
db.orders.insertOne({...})   // updates: _id index + 5 custom indexes

// Identify unused indexes to drop — use $indexStats in production
db.orders.aggregate([{ $indexStats: {} }])
// Fields per index: name, key, accesses.ops (access count), accesses.since (when counting began)
// Indexes with accesses.ops: 0 are candidates for removal
DANGER
On write-heavy collections (event logging, telemetry, high-frequency updates), excess indexes directly degrade write throughput. Before adding an index, confirm the query it serves is actually slow and frequently executed. A slow query that runs once per day is a worse candidate for indexing than a slightly-slower query that runs 10,000 times per minute.

Hidden Indexes — Safe Removal Workflow

// Step 1: Hide the index — maintained on writes but invisible to query planner
db.collection.hideIndex("index_name")

// Step 2: Monitor query performance for 24-48 hours
// If queries stay fast → index wasn't being used → safe to drop
// If queries slow down → index WAS needed → unhide immediately
db.collection.unhideIndex("index_name")

// Step 3: If confirmed unused, drop it
db.collection.dropIndex("index_name")
TIP
Always use the hidden index workflow before dropping an index in production. It acts as a safe dry run — the index is still maintained (so re-enabling it is instant), but the query planner doesn't use it, revealing its impact on query performance.
06
Management Commands
Create · list · drop · hide · stats
management
// Create a single index
db.collection.createIndex({ field: 1 }, { name: "idx_field" })

// Create multiple indexes in one round-trip
db.collection.createIndexes([
  { key: { field1: 1 },        name: "idx_field1" },
  { key: { field2: -1 },       name: "idx_field2" },
  { key: { email: 1 }, unique: true, name: "idx_email_unique" }
])

// List all indexes on a collection
db.collection.getIndexes()

// Drop a specific index by name
db.collection.dropIndex("index_name")

// Drop a specific index by key specification
db.collection.dropIndex({ field: 1 })

// Drop all indexes (except _id)
db.collection.dropIndexes()

// Hide/unhide — query planner skips it, but it stays maintained
db.collection.hideIndex("index_name")
db.collection.unhideIndex("index_name")

// View index sizes in bytes
db.collection.stats().indexSizes

// Monitor index access counts in production
db.collection.aggregate([{ $indexStats: {} }])

createIndex Options Reference

Option                  | Type    | Description
name                    | String  | Custom name. Auto-generated if omitted (verbose for compound indexes — always name them)
unique                  | Boolean | Reject duplicate values
sparse                  | Boolean | Only index docs where the field exists
expireAfterSeconds      | Number  | TTL index — auto-delete docs after N seconds
partialFilterExpression | Object  | Only index docs matching the filter
hidden                  | Boolean | Create hidden (invisible to query planner)
collation               | Object  | Locale-aware string comparison rules
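A few of these options in use (all collection and field names here are hypothetical):

```javascript
// TTL index: documents are removed ~N seconds after their createdAt value
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Partial index: only index orders that are still open — smaller tree, cheaper writes
db.orders.createIndex(
  { customerId: 1 },
  { partialFilterExpression: { status: "open" } }
)

// Unique + sparse: enforce uniqueness only on documents that have the field
db.users.createIndex({ phone: 1 }, { unique: true, sparse: true })
```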
07
Index Limits & Rules
Hard limits, null handling, and universal principles
reference

Hard Limits

Limit                              | Value
Maximum indexes per collection     | 64
Maximum fields in a compound index | 32
Maximum index key size             | 1024 bytes before MongoDB 4.2; limit removed in 4.2+
Text indexes per collection        | 1 (all text fields combined into one)

Missing Field Behaviour by Index Type

Index Type         | Document missing the indexed field
Standard (default) | Indexed as null
Sparse             | Excluded from the index entirely
Partial            | Excluded if it doesn't satisfy partialFilterExpression
TTL                | Document never expires
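The standard-index row above has a query-side consequence worth knowing (hypothetical collection and field):

```javascript
// With a standard index on middleName, a null-equality query matches both
// documents where middleName is null AND documents missing the field entirely
db.users.find({ middleName: null })

// A sparse index on the same field omits the missing-field documents, so the
// planner will not use it to answer { middleName: null } completely
db.users.createIndex({ middleName: 1 }, { sparse: true })
```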

Universal Index Principles

  • An index is only used if the indexed field appears in the query's filter (or sort)
  • For compound indexes, the query must reference a left-prefix of the index fields
  • Low-selectivity standalone indexes may be ignored by the optimizer — combine with high-cardinality fields
  • Indexes deliver their full benefit only when they fit in the WiredTiger cache (RAM)
  • Always verify index usage with .explain("executionStats") after creating an index
  • Use $indexStats in production to identify and remove unused indexes
  • Direction (1 vs -1) is irrelevant for single-field indexes; critical for compound indexes when sorting
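The left-prefix rule in practice (hypothetical compound index; confirm each case with explain() as shown earlier):

```javascript
db.orders.createIndex({ customerId: 1, status: 1, createdAt: -1 })

// Uses the index — these filters are left-prefixes of the key pattern:
db.orders.find({ customerId: 42 })
db.orders.find({ customerId: 42, status: "open" })

// Cannot use it — these skip the leading field:
db.orders.find({ status: "open" })
db.orders.find({ createdAt: { $gte: ISODate("2024-01-01") } })
```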