
Indexes
Introduction

FILE  23_indexes_intro
TOPIC  COLLSCAN vs IXSCAN · B-Tree · Selectivity · Write Penalty · Management
LEVEL  Foundation
01
What is an Index
A sorted data structure that maps field values to document locations
concept

An index in MongoDB is a supplementary data structure that stores a sorted, compact representation of a collection's field values alongside pointers to the full documents on disk. Its sole purpose is to allow MongoDB to locate documents without scanning the entire collection.

Without an index, MongoDB performs a COLLSCAN (Collection Scan) — it reads every document. With an index, MongoDB performs an IXSCAN (Index Scan) — it traverses a B-Tree to find relevant pointers, then fetches only those documents.

Aspect       | COLLSCAN (No Index) | IXSCAN (With Index)
Method       | Read every document | B-Tree traversal + fetch
Complexity   | O(N) — linear       | O(log N) — logarithmic
5M documents | 5,000,000 reads     | ~23 B-Tree comparisons
CPU & Disk   | High                | Minimal
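The difference is easy to observe in mongosh by forcing each plan on the same query (a hypothetical users collection with an index on email is assumed; $natural forces a collection scan):

```javascript
// Force a COLLSCAN with the $natural hint
db.users.find({ email: "user@example.com" }).hint({ $natural: 1 }).explain("executionStats")
// winningPlan.stage: "COLLSCAN" — totalDocsExamined equals the collection size

// Let the optimizer use the email index
db.users.find({ email: "user@example.com" }).explain("executionStats")
// IXSCAN under a FETCH stage — totalDocsExamined ≈ nReturned
```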

The _id Index

Every MongoDB collection has one index that is created automatically and can never be dropped: the unique ascending index on _id. Any query filtering by _id uses this index for free.

// The _id index exists on every collection — you never need to create it
db.users.getIndexes()
// Always returns at minimum: [ { v: 2, key: { _id: 1 }, name: "_id_" } ] — implicitly unique
NOTE
Creating an index does not change any documents. It creates and maintains a separate, smaller data structure alongside your collection. The original documents are unmodified.
02
How Indexes Work Internally
B-Tree structure · WiredTiger cache · query optimizer
internals

MongoDB's WiredTiger storage engine implements indexes as B-Trees: self-balancing, sorted tree structures in which each node stores multiple keys and child pointers.

B-Tree Properties

  • Each node stores sorted keys and pointers to children or to documents on disk
  • Traversal from root to target requires at most O(log N) comparisons
  • The tree rebalances itself as inserts, updates, and deletes occur — this maintenance is the source of the write penalty
  • For a 5M-document collection: log₂(5,000,000) ≈ 23 steps to find any key
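The log₂ figure above is easy to verify. A sketch in plain JavaScript (binary comparisons are a worst-case bound; real B-Tree nodes hold many keys per node, so the actual tree depth is far smaller):

```javascript
// Worst-case number of binary comparisons to locate one key among n sorted keys
function comparisons(n) {
  return Math.ceil(Math.log2(n));
}

console.log(comparisons(5_000_000)); // 23
console.log(comparisons(5_000));     // 13 — a 1000x smaller collection saves only 10 steps
```

The second line is the point of logarithmic growth: shrinking the collection a thousandfold barely changes the comparison count.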

The WiredTiger Cache (Working Set)

For indexes to deliver their O(log N) benefit, they must reside in RAM — specifically the WiredTiger Cache. When an index is too large to fit in cache, MongoDB reads index pages from disk on each query, destroying performance.

// Monitor WiredTiger cache usage
db.serverStatus().wiredTiger.cache

// View index sizes in bytes for a collection
db.collection.stats().indexSizes

// Rule of thumb: all indexes + hot document data (working set)
// should fit within the WiredTiger Cache allocation
// Default: 50% of (RAM - 1GB), min 256MB
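The default cache formula above can be sketched as a quick calculation (plain JavaScript, not a MongoDB API):

```javascript
// Default WiredTiger cache size in GB: the larger of 50% of (RAM - 1 GB) and 256 MB
function defaultCacheGB(ramGB) {
  return Math.max((ramGB - 1) * 0.5, 0.25);
}

console.log(defaultCacheGB(16)); // 7.5  — on a 16 GB host
console.log(defaultCacheGB(1));  // 0.25 — the 256 MB floor applies
```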

The Query Optimizer

When MongoDB receives a query, the query optimizer evaluates all candidate indexes, runs each candidate briefly in a trial ("race"), selects the winner, and caches that plan. The cached plan is used for all subsequent identical queries until it is invalidated.

Event                                   | Effect on Cached Plan
Index created or dropped                | All cached plans cleared
Data distribution changes significantly | Plan re-evaluated at next execution
db.collection.getPlanCache().clear()    | Manual invalidation
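The plan cache can be inspected and cleared per collection in mongosh (shown against a hypothetical users collection):

```javascript
// List cached plans for the collection (mongosh 4.4+)
db.users.getPlanCache().list()

// Clear all cached plans for this collection — forces re-evaluation on the next query
db.users.getPlanCache().clear()
```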
03
COLLSCAN vs IXSCAN
Identifying and confirming index usage with explain()
diagnosis

The fundamental diagnostic question is: "Is this query using an index?" The answer is always found in explain().

// Run explain to see the execution plan
db.users.find({ email: "user@example.com" }).explain("executionStats")

// Key fields to check:
// winningPlan.stage        — "IXSCAN" ✅  or  "COLLSCAN" ❌
// winningPlan.indexName    — which index is being used
// totalKeysExamined        — index keys scanned
// totalDocsExamined        — documents fetched from disk
// nReturned                — documents returned to client
// executionTimeMillis      — total query time

Reading the Results

Pattern                           | Meaning                       | Action
COLLSCAN                          | No index in use — full scan   | Create an index on the filter field(s)
IXSCAN, docsExamined ≈ nReturned  | Optimal index usage           | None needed
IXSCAN, docsExamined >> nReturned | Index used but over-scanning  | Consider a more selective or compound index
docsExamined: 0                   | Covered query — no disk fetch | Maximum efficiency achieved

Covered Queries

A covered query is one where every field in the filter AND every field in the projection exists in the index, and _id is excluded from the projection (unless _id is itself part of the index). MongoDB returns results entirely from the index — zero document fetches from disk.

// Index covers both filter and projection fields
db.users.createIndex({ email: 1, name: 1 })

db.users.find(
  { email: "user@example.com" },  // filter field: in index
  { name: 1, _id: 0 }             // project field: in index; _id excluded
)
// explain() totalDocsExamined: 0 — fully covered
TIP
To confirm a covered query: run .explain("executionStats") and look for totalDocsExamined: 0. This is the highest level of query efficiency available in MongoDB — the data never left the index.
04
Index Selectivity
High cardinality = useful index; low cardinality = possibly ignored
selectivity

Index selectivity is the degree to which an index narrows down the candidate document set. A highly selective index quickly filters to a small result; a low-selectivity index barely reduces the scan count.

High vs Low Selectivity

Field         | Cardinality              | Selectivity | Index useful?
email         | One per user — very high | Very high   | Yes — always
userId (UUID) | Unique — very high       | Very high   | Yes — always
category      | 20–100 categories        | Medium      | Often useful
status        | 3–5 values               | Low         | Combine with another field
isActive      | Only true/false          | Very low    | Often ignored by optimizer
WARN
An index on a boolean field like isActive may be ignored by the query optimizer if 90%+ of documents have isActive: true. The optimizer calculates that a COLLSCAN is comparably cheap because the index scan plus document fetch overhead exceeds the marginal savings. Always combine low-selectivity fields with high-selectivity fields in compound indexes.
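A rough selectivity estimate for a field can be computed directly in mongosh (hypothetical users collection; distinct() materialises all values client-side, so use it on reasonably sized collections):

```javascript
// Ratio of distinct values to documents — closer to 1 means more selective
const total = db.users.countDocuments()
const distinctValues = db.users.distinct("status").length
print(`selectivity of status: ${(distinctValues / total).toFixed(4)}`)
// email would approach 1.0; a boolean field approaches 0.0
```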

Improving Selectivity with Compound Indexes

// Instead of a useless standalone boolean index:
db.users.createIndex({ isActive: 1 })   // low selectivity — often ignored

// Combine with a high-cardinality field:
db.users.createIndex({ isActive: 1, createdAt: -1 })
// isActive narrows the candidates first (possibly only slightly), then createdAt provides a precise range scan
05
The Write Penalty
Every index adds overhead to every write — cumulative cost
tradeoff

Every index must be updated on every write operation — insert, update, delete, and replace — and each of those updates is a separate B-Tree modification. A collection with 10 indexes incurs 10 B-Tree updates per insert. This is the fundamental read/write tradeoff of indexing.

// A single insert must update ALL indexes on the collection
// collection with 6 indexes → 6 B-Tree updates per insert
db.orders.insertOne({...})   // updates: _id index + 5 custom indexes

// Identify unused indexes to drop — use $indexStats in production
db.orders.aggregate([{ $indexStats: {} }])
// Fields per index: name, key, accesses.ops (access count), accesses.since (when counting began)
// Indexes with accesses.ops: 0 are candidates for removal
DANGER
On write-heavy collections (event logging, telemetry, high-frequency updates), excess indexes directly degrade write throughput. Before adding an index, confirm the query it serves is actually slow and frequently executed. A slow query that runs once per day is a worse candidate for indexing than a slightly-slower query that runs 10,000 times per minute.

Hidden Indexes — Safe Removal Workflow

// Step 1: Hide the index — maintained on writes but invisible to query planner
db.collection.hideIndex("index_name")

// Step 2: Monitor query performance for 24-48 hours
// If queries stay fast → index wasn't being used → safe to drop
// If queries slow down → index WAS needed → unhide immediately
db.collection.unhideIndex("index_name")

// Step 3: If confirmed unused, drop it
db.collection.dropIndex("index_name")
TIP
Always use the hidden index workflow before dropping an index in production. It acts as a safe dry run — the index is still maintained (so re-enabling it is instant), but the query planner doesn't use it, revealing its impact on query performance.
06
Management Commands
Create · list · drop · hide · stats
management
// Create a single index
db.collection.createIndex({ field: 1 }, { name: "idx_field" })

// Create multiple indexes in one round-trip
db.collection.createIndexes([
  { key: { field1: 1 },        name: "idx_field1" },
  { key: { field2: -1 },       name: "idx_field2" },
  { key: { email: 1 }, unique: true, name: "idx_email_unique" }
])

// List all indexes on a collection
db.collection.getIndexes()

// Drop a specific index by name
db.collection.dropIndex("index_name")

// Drop a specific index by key specification
db.collection.dropIndex({ field: 1 })

// Drop all indexes (except _id)
db.collection.dropIndexes()

// Hide/unhide — query planner skips it, but it stays maintained
db.collection.hideIndex("index_name")
db.collection.unhideIndex("index_name")

// View index sizes in bytes
db.collection.stats().indexSizes

// Monitor index access counts in production
db.collection.aggregate([{ $indexStats: {} }])

createIndex Options Reference

Option                  | Type    | Description
name                    | String  | Custom name. Auto-generated if omitted (verbose for compound indexes — always name them)
unique                  | Boolean | Reject duplicate values
sparse                  | Boolean | Only index docs where the field exists
expireAfterSeconds      | Number  | TTL index — auto-delete docs after N seconds
partialFilterExpression | Object  | Only index docs matching the filter
hidden                  | Boolean | Create hidden (invisible to query planner)
collation               | Object  | Locale-aware string comparison rules
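A few of these options in use (all collection and field names here are hypothetical):

```javascript
// TTL index: documents are removed ~N seconds after their createdAt value
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Partial index: only index orders that are still open — smaller tree, cheaper writes
db.orders.createIndex(
  { customerId: 1 },
  { partialFilterExpression: { status: "open" } }
)

// Unique + sparse: enforce uniqueness only on documents that have the field
db.users.createIndex({ phone: 1 }, { unique: true, sparse: true })
```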
07
Index Limits & Rules
Hard limits, null handling, and universal principles
reference

Hard Limits

Limit                              | Value
Maximum indexes per collection     | 64
Maximum fields in a compound index | 32
Maximum index key size             | 1024 bytes before MongoDB 4.2; limit removed in 4.2+
Text indexes per collection        | 1 (all text fields combined into one)

Missing Field Behaviour by Index Type

Index Type         | Document missing the indexed field
Standard (default) | Indexed as null
Sparse             | Excluded from the index entirely
Partial            | Excluded if it doesn't satisfy partialFilterExpression
TTL                | Document never expires
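The standard-index row above has a query-side consequence worth knowing (hypothetical collection and field):

```javascript
// With a standard index on middleName, a null-equality query matches both
// documents where middleName is null AND documents missing the field entirely
db.users.find({ middleName: null })

// A sparse index on the same field omits the missing-field documents, so the
// planner will not use it to answer { middleName: null } completely
db.users.createIndex({ middleName: 1 }, { sparse: true })
```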

Universal Index Principles

  • An index is only used if the indexed field appears in the query's filter (or sort)
  • For compound indexes, the query must reference a left-prefix of the index fields
  • Low-selectivity standalone indexes may be ignored by the optimizer — combine with high-cardinality fields
  • Indexes deliver their full benefit only when they fit in the WiredTiger cache (RAM)
  • Always verify index usage with .explain("executionStats") after creating an index
  • Use $indexStats in production to identify and remove unused indexes
  • Direction (1 vs -1) is irrelevant for single-field indexes; critical for compound indexes when sorting
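The left-prefix rule in practice (hypothetical compound index; confirm each case with explain() as shown earlier):

```javascript
db.orders.createIndex({ customerId: 1, status: 1, createdAt: -1 })

// Uses the index — these filters are left-prefixes of the key pattern:
db.orders.find({ customerId: 42 })
db.orders.find({ customerId: 42, status: "open" })

// Cannot use it — these skip the leading field:
db.orders.find({ status: "open" })
db.orders.find({ createdAt: { $gte: ISODate("2024-01-01") } })
```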