Index
Types

FILE  24_index_types
TOPIC  Single Field · Multikey · Text · Hashed · TTL · Type Reference
LEVEL  Foundation
01
Single Field Index
Foundational index — one field, any direction
basic

A single field index is the most common index type. It creates a B-Tree on one field of a document (top-level or nested). Direction (1 ascending, -1 descending) is functionally irrelevant for single field indexes — MongoDB traverses the B-Tree in either direction equally.

// Ascending index on email (direction irrelevant for single field)
db.users.createIndex({ email: 1 }, { name: "idx_email" })

// Index on a nested embedded field
db.users.createIndex({ "address.city": 1 }, { name: "idx_city" })

// Unique email constraint
db.users.createIndex({ email: 1 }, { unique: true, name: "idx_email_unique" })

What Single Field Indexes Provide

  • Equality — { email: "x@y.com" } → IXSCAN
  • Range — { age: { $gte: 18, $lt: 65 } } → IXSCAN
  • Sort elimination — sorting by the indexed field requires no in-memory sort
  • Covered queries — if the projection is limited to indexed fields, the query is answered from the index alone
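
To make the covered-query bullet concrete, here is a minimal sketch assuming the idx_email index from above (the query values are hypothetical):

// Covered query: filter and projection both touch only the indexed field,
// so MongoDB answers from the index alone — explain() shows no FETCH stage
db.users.find(
  { email: "x@y.com" },
  { email: 1, _id: 0 }   // _id must be excluded: it is not in this index
)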
NOTE
MongoDB automatically creates a single field index on _id for every collection. It is unique, ascending, and cannot be dropped or modified.
02
Multikey Index (Array Fields)
One index entry per array element — automatic detection
arrays

When you create an index on a field that contains an array, MongoDB automatically creates a multikey index — storing one separate index entry per array element. This enables efficient queries that search within array fields.

// MongoDB detects the array and builds multikey automatically
db.products.createIndex({ tags: 1 })
// Document: { _id: 1, tags: ["mongodb", "nosql", "database"] }
// Generates 3 index entries: "mongodb"→doc, "nosql"→doc, "database"→doc

// Index on array of embedded documents
db.orders.createIndex({ "items.productId": 1 })

// Query uses the multikey index just like a regular index
db.products.find({ tags: "mongodb" })   // IXSCAN on multikey
db.products.find({ tags: { $in: ["mongodb", "nosql"] } })  // also IXSCAN

Multikey Constraints

DANGER
A compound index cannot contain more than one array field. With both tags and categories holding arrays, createIndex({ tags: 1, categories: 1 }) triggers "cannot index parallel arrays" — at index build time if such a document already exists, or on the first insert of one otherwise. Create separate single-field indexes for each array field instead.
  • Max array fields in compound index — 1; two array fields = error
  • Covered queries — not supported; a document fetch is always required
  • Shard key — a multikey index cannot be used as a shard key
  • Write overhead — each array element = one B-Tree entry; large arrays = many writes
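
A minimal sketch of the separate-indexes workaround for the parallel-arrays restriction, assuming hypothetical products documents where both tags and categories are arrays:

// Index each array field separately instead of one compound index
db.products.createIndex({ tags: 1 })
db.products.createIndex({ categories: 1 })

// A query on both fields still gets an IXSCAN on one multikey index;
// the remaining predicate is applied as a filter afterwards
db.products.find({ tags: "mongodb", categories: "databases" })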

Array Size and Index Performance

// A document with a 1000-element array generates 1000 index entries
// Average array size directly impacts index size and write latency
// Monitor: db.collection.stats().indexSizes

// For very large arrays consider $elemMatch queries and partial indexes
// instead of indexing the entire array field
03
Text Index
Full-text keyword search with stemming and relevance scoring
search

A text index tokenizes string fields into indexed terms, applies language-specific stemming, and filters stop words. It enables full-text keyword and phrase search via the $text operator.

// Create a text index on multiple fields with weights
db.articles.createIndex(
  { title: "text", body: "text", tags: "text" },
  {
    weights:          { title: 10, tags: 5, body: 1 },
    default_language: "english",
    name:             "idx_article_text"
  }
)
// title matches are weighted 10x over body matches in relevance scoring

Querying a Text Index

// Keyword search — "mongodb" OR "indexing" (OR logic between words)
db.articles.find({ $text: { $search: "mongodb indexing" } })

// Exact phrase (double-quoted within the string)
db.articles.find({ $text: { $search: "\"aggregation pipeline\"" } })

// Exclude a term
db.articles.find({ $text: { $search: "mongodb -nosql" } })

// Sort by relevance score
db.articles.find(
  { $text: { $search: "indexing" } },
  { score: { $meta: "textScore" }, title: 1, _id: 0 }
).sort({ score: { $meta: "textScore" } })

Text Search Behavior

  • Stemming — "running" matches "run", "runs", "runner"
  • Case sensitivity — case-insensitive by default
  • Diacritics — "café" matches "cafe"
  • Stop words — "the", "is", "and" ignored automatically
  • Word logic — space-separated words are OR'd; wrap a phrase in quotes to require the exact sequence
WARN
Only one text index per collection is allowed, so all text-searchable fields must be combined into that single index — creating a second text index on the same collection fails. Also: in an aggregation pipeline, $text is only valid inside a $match that is the very first stage.

Text Index Limitations

  • Cannot use range queries or standard sort operations
  • Significantly larger storage footprint than B-Tree indexes
  • Text indexes slow down writes more than normal indexes (many term entries per document)
  • For production search at scale, consider Atlas Search (Lucene-based) instead
04
Hashed Index
Even shard distribution — no range queries
sharding

A hashed index stores the hash of a field's value rather than the value itself. Its primary use is as a shard key to ensure even data distribution across shards, preventing hotspots caused by monotonically increasing fields (ObjectIds, timestamps).

// Create a hashed index
db.users.createIndex({ userId: "hashed" }, { name: "idx_userid_hashed" })

// Use as shard key for even distribution
sh.shardCollection("mydb.users", { userId: "hashed" })
// Documents are distributed based on hash(userId), not userId value
// Result: even spread across shards even for sequential IDs
DANGER
Hashed indexes cannot be used for range queries ($gt, $lt, $gte, $lte) or sort operations because hashing destroys sorted order. A range query on a hashed-indexed field falls back to a COLLSCAN. Only use hashed indexes for shard keys on equality-accessed fields.

Hashed Index Edge Cases

  • Floating-point values — truncated to a 64-bit integer before hashing, so 2.3 and 2.9 hash identically
  • Range query on hashed field — index not used; falls back to COLLSCAN
  • Sort on hashed field — index not used; in-memory sort required
  • Compound with another hashed field — not supported (at most one hashed field per index)

Hashed vs Range Shard Key

// Range shard key — sequential inserts all go to the same shard (hotspot)
sh.shardCollection("db.events", { timestamp: 1 })
// ↑ New documents always land on the "last" shard — uneven load

// Hashed shard key — sequential inserts distributed evenly
sh.shardCollection("db.events", { timestamp: "hashed" })
// ↑ Inserts are spread across shards by hash(timestamp) — balanced writes
05
TTL Index
Auto-delete documents after a time period — up to 60s lag
expiry

A TTL (Time To Live) index is a single-field index on a BSON Date field that instructs MongoDB to automatically delete documents after a specified number of seconds has elapsed past the stored date value.

// Delete sessions 1 hour after createdAt
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600, name: "idx_session_ttl" }
)

// Delete OTP tokens 5 minutes after generation
db.otpTokens.createIndex(
  { generatedAt: 1 },
  { expireAfterSeconds: 300, name: "idx_otp_ttl" }
)

// Delete at an exact stored datetime
// expireAfterSeconds: 0 → delete when current time reaches the stored date
db.jobs.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0, name: "idx_job_expiry" }
)
// { expiresAt: ISODate("2024-12-31T23:59:59Z") }
// Document is deleted at/after that exact timestamp

How TTL Deletion Works

A background thread called TTLMonitor runs approximately every 60 seconds and removes documents where the date field value + expireAfterSeconds <= current time.

WARN
TTL deletion is not instantaneous. There is a lag of up to 60+ seconds between a document becoming eligible for expiry and its actual removal. Do not rely on TTL for time-sensitive enforcement in application logic — check expiry in your code as well.
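
One practical note: the expiry window can be changed in place with the collMod command instead of dropping and rebuilding the index. A sketch, assuming the sessions TTL index created above:

// Extend the session TTL from 1 hour to 2 hours without an index rebuild
db.runCommand({
  collMod: "sessions",
  index: {
    keyPattern: { createdAt: 1 },
    expireAfterSeconds: 7200
  }
})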

TTL Edge Cases

  • Field contains an array of dates — document expires based on the earliest date in the array
  • Field missing from document — document never expires
  • Field is not a BSON Date (e.g., a string timestamp) — document never expires
  • Replica set secondary — TTL deletion runs only on the primary; secondaries replicate the deletes
  • Capped collection — TTL indexes are not supported
  • Heavy write load — the TTLMonitor can fall behind; lag may exceed 60 seconds
06
Wildcard Index
Index all fields in dynamic/unknown schemas
dynamic

A wildcard index indexes every field in a document (or every field under a sub-path). It is designed for collections with highly variable or unknown schemas where queries filter on arbitrary fields.

// Index ALL fields in every document
db.collection.createIndex({ "$**": 1 })

// Index all fields under a specific sub-document path
db.products.createIndex({ "metadata.$**": 1 })

// Queries on any field under metadata now use the index:
db.products.find({ "metadata.color": "red" })
db.products.find({ "metadata.size.width": { $gt: 10 } })
WARN
Wildcard indexes are large and expensive to maintain — every field in every document generates an index entry. Use them only when queries are truly unpredictable; for known query patterns, a targeted compound index is almost always more efficient.

Wildcard Index Constraints

  • Cannot be used as a shard key
  • Cannot be compound (only one field spec)
  • Cannot support covered queries
  • Significantly larger than targeted indexes
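
One way to tame the size problem is to scope the index with the wildcardProjection option, which limits indexing to selected paths. A sketch — the metadata and specs field names are hypothetical:

// Index only the metadata and specs subtrees, not every field
db.products.createIndex(
  { "$**": 1 },
  { wildcardProjection: { metadata: 1, specs: 1 } }
)
// Note: wildcardProjection is only valid with a { "$**": 1 } key pattern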
07
Index Type Reference
All types at a glance
reference
  • Single Field — basic filtering and sorting · Range queries: Yes · Direction irrelevant for a single field
  • Compound — multi-field filter + sort · Range queries: Yes (prefix) · Field order is critical; max 32 fields
  • Multikey — indexing array fields · Range queries: Yes · Only one array field per compound index
  • Text — full-text keyword search · Range queries: No · One per collection; stemming + stop words
  • Hashed — shard key for even distribution · Range queries: No · No range/sort; floats truncated before hashing
  • TTL — auto-expiry of documents · Range queries: Yes (a normal B-Tree) · Field must be a BSON Date; ~60s deletion lag
  • Sparse — optional fields · Range queries: Yes · Docs missing the field are excluded; { field: null } queries can't use it
  • Partial — index a document subset · Range queries: Yes · Query must satisfy the partialFilterExpression
  • Wildcard — dynamic/unknown schemas · Range queries: Yes · Large; no shard key; no covered queries