
Backup
& Restore

FILE  37_backup_restore
TOPIC  mongodump · mongorestore · Snapshots · oplog · Atlas Backup · Strategy
LEVEL  Intermediate/Advanced
01
Backup Overview
Three backup approaches — logical, filesystem, cloud
overview
Logical — mongodump
  Consistency: per-collection (point-in-time with --oplog)
  Speed: slow (reads every document)
  Pros: portable BSON; selective restore; no storage-format lock-in
  Cons: slow on large datasets; not consistent across collections by default

Filesystem snapshot — LVM/EBS snapshot
  Consistency: consistent if journal is on the same volume
  Speed: fast (block-level)
  Pros: near-instant; works at TB scale; exact copy
  Cons: platform-specific; entire volume only; requires journal on the same volume

Cloud managed — Atlas Backup
  Consistency: fully consistent; point-in-time
  Speed: continuous (automatic)
  Pros: no operational overhead; PITR; snapshot scheduling
  Cons: Atlas only; costs extra
WARN
Never copy raw data files while mongod is running without first locking or using a consistent snapshot mechanism. WiredTiger data files may be in a partially-written state. Always use mongodump, a proper filesystem snapshot with journal on same volume, or MongoDB Cloud Backup for any backup you intend to restore from.

Required Roles for Backup Operations

// Minimum roles for mongodump user:
db.grantRolesToUser("backupUser", [
  { role: "backup",  db: "admin" },    // read all data + list collections
])

// Minimum roles for mongorestore user:
db.grantRolesToUser("restoreUser", [
  { role: "restore", db: "admin" },    // write all data + create collections
])
02
mongodump
Logical backup — exports BSON + JSON metadata
dump
// Basic dump — entire cluster to ./dump/ directory
mongodump --uri "mongodb://user:pass@host:27017/?authSource=admin"

// Dump specific database
mongodump --uri "..." --db shopDb --out /backups/2024-03-01/

// Dump specific collection
mongodump --uri "..." --db shopDb --collection orders --out /backups/

// Dump with query filter (partial backup)
mongodump --uri "..." --db shopDb --collection orders \
  --query '{ "status": "completed", "year": 2024 }' \
  --out /backups/completed-orders/

// Compressed dump (gzip) — reduces size significantly
mongodump --uri "..." --gzip --out /backups/compressed/

// Archive mode — single file instead of directory tree
mongodump --uri "..." --gzip --archive=/backups/full-2024-03-01.gz

// Dump to stdout (pipe directly to S3)
mongodump --uri "..." --archive | gzip | aws s3 cp - s3://my-bucket/backup.gz

// Consistent backup with --oplog (captures oplog during dump for PITR)
// Run against PRIMARY of a replica set:
mongodump --uri "..." --oplog --out /backups/consistent/
// Writes oplog.bson at the root of the output directory, containing oplog entries applied during the backup
// Use mongorestore --oplogReplay to apply these entries after restore

// Exclude collections from the dump (mongodump has no field-level projection)
mongodump --uri "..." --db shopDb \
  --excludeCollection=sessions \
  --excludeCollectionsWithPrefix=tmp_ \
  --out /backups/partial/
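The archive and gzip options above combine naturally into a scheduled job. A minimal sketch of a nightly wrapper with dated archives and retention pruning; the paths, the 14-day retention, and the stubbed-out mongodump call are assumptions:

```shell
#!/bin/bash
# Nightly mongodump wrapper: dated gzip archive + retention pruning (sketch)
BACKUP_DIR="${BACKUP_DIR:-/tmp/backups-nightly}"   # placeholder path
RETENTION_DAYS=14
mkdir -p "$BACKUP_DIR"
ARCHIVE="$BACKUP_DIR/full-$(date -u +%Y%m%dT%H%M%S).gz"

# In production this line would be:
#   mongodump --uri "$MONGO_URI" --oplog --gzip --archive="$ARCHIVE"
# Stubbed here so the naming/pruning logic runs standalone:
printf 'stub' | gzip > "$ARCHIVE"

# Delete archives older than the retention window
find "$BACKUP_DIR" -name 'full-*.gz' -mtime +"$RETENTION_DAYS" -delete
echo "wrote $ARCHIVE"
```

Cron this and the newest archive is always named by its UTC start time, which makes the freshness checks in the verification section easy to script.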

mongodump Limitations

WARN
mongodump is not consistent across collections by default — it dumps each collection sequentially, so data may have changed between the orders dump and the customers dump. Use --oplog on a replica set to achieve point-in-time consistency. Do NOT use mongodump for sharded clusters without proper coordination — use mongodump on individual shard replica sets, or use Atlas Backup/filesystem snapshots instead.
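One more pitfall with the pipe-to-S3 pattern shown above: by default a shell pipeline's exit status is that of the last command, so a failed mongodump can still look like a successful upload. A runnable illustration (bash; `false` stands in for the failing dump):

```shell
#!/bin/bash
# Default behaviour: the early failure is masked by gzip's success
false | gzip > /dev/null
echo "without pipefail: $?"    # prints 0: failure hidden

# With pipefail, the pipeline reports the first failure
set -o pipefail
false | gzip > /dev/null
echo "with pipefail: $?"       # prints 1: a mongodump failure would surface
```

Any backup script built on pipes should set pipefail (or check `PIPESTATUS`) before trusting the exit code.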
03
mongorestore
Restore from mongodump output
restore
// Restore entire dump directory
mongorestore --uri "mongodb://user:pass@host:27017/?authSource=admin" \
  /backups/2024-03-01/

// Restore to different database name (rename on restore)
mongorestore --uri "..." \
  --nsFrom "shopDb.*" --nsTo "shopDb_restored.*" \
  /backups/2024-03-01/dump/shopDb/

// Drop collections before restore (ensures clean state)
mongorestore --uri "..." --drop /backups/2024-03-01/

// Restore from compressed archive
mongorestore --uri "..." --gzip --archive=/backups/full-2024-03-01.gz

// Restore from stdin (from S3)
aws s3 cp s3://my-bucket/backup.gz - | mongorestore --uri "..." --gzip --archive

// Restore specific collection only
mongorestore --uri "..." --nsInclude "shopDb.orders" /backups/dump/

// Exclude collections during restore
mongorestore --uri "..." --nsExclude "shopDb.sessions" /backups/dump/

// Replay oplog for point-in-time consistency
// (use with dumps created using --oplog)
mongorestore --uri "..." --oplogReplay /backups/consistent/

// Replay oplog up to a specific timestamp (partial roll-forward)
// --oplogLimit takes "seconds:ordinal" (Unix timestamp + ordinal);
// entries at or after this timestamp are NOT applied
mongorestore --uri "..." --oplogReplay \
  --oplogLimit "1709337600:1" \
  /backups/consistent/

// Control parallelism (default 4; increase for faster restore)
mongorestore --uri "..." --numParallelCollections 8 /backups/dump/

// Bypass document validation (if schema validation is applied)
mongorestore --uri "..." --bypassDocumentValidation /backups/dump/
TIP
Restore to a non-production cluster first (a staging restore) to verify data integrity before touching production. Use --nsFrom/--nsTo to restore under a different database name for verification without overwriting live data, and run mongorestore --dryRun first to validate an archive without writing anything.
04
Oplog-Based Point-in-Time
Use the oplog to replay writes between a snapshot and a point in time
PITR

Point-in-Time Recovery (PITR) uses a base snapshot plus the replica set oplog to replay operations up to an exact timestamp. This enables recovery to any second within the oplog window — critical for "recover to just before the accidental delete" scenarios.

// PITR Workflow:
// 1. Take base backup with --oplog (captures oplog during dump)
mongodump --uri "..." --oplog --out /backups/base-20240301T0000/

// 2. Separately, continuously backup the oplog
//    (or collect separate oplog archive at intervals)
mongodump --uri "..." --db local --collection oplog.rs \
  --query '{ "ts": { "$gt": { "$timestamp": { "t": 1709251200, "i": 1 } } } }' \
  --out /backups/oplog-incremental/

// 3. Restore base backup
mongorestore --uri "..." --drop --oplogReplay /backups/base-20240301T0000/

// 4. Apply incremental oplog up to target timestamp
//    Target: 1709337600 = 2024-03-02 00:00:00 UTC (just before the bad operation)
mongorestore --uri "..." --oplogReplay \
  --oplogLimit "1709337600:1" \
  /backups/oplog-incremental/

// Find the oplog entry for the bad operation (to pick your target timestamp):
use local
db.oplog.rs.find({ op: "c", "o.drop": { $exists: true } })
  .sort({ ts: -1 }).limit(5)
// Look at the 'ts' field: Timestamp(1709337595, 1)
// --oplogLimit excludes entries at or after the given timestamp,
// so "1709337595:1" stops just before the drop
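To turn a human-readable UTC recovery target into the seconds:ordinal form that --oplogLimit expects, GNU date can do the conversion (a sketch; on macOS/BSD use `date -j -u -f` instead of `-d`):

```shell
# Convert a UTC wall-clock target to --oplogLimit's "seconds:ordinal" form
TARGET_UTC="2024-03-02 00:00:00"          # recovery target (placeholder)
SECS=$(date -u -d "$TARGET_UTC" +%s)      # GNU date
echo "--oplogLimit ${SECS}:1"             # prints: --oplogLimit 1709337600:1
```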
NOTE
The oplog window — how far back you can replay — is limited by the oplog size on the replica set. A default oplog window of a few days is typical; size it to cover your longest expected maintenance window plus recovery time. Use rs.printReplicationInfo() to check the current window.
05
Filesystem Snapshots
Block-level backup for large datasets
snapshot

Filesystem snapshots (LVM, AWS EBS, Azure Managed Disk) copy the block device at a point in time. For MongoDB with WiredTiger, the journal and data files must be on the same volume for a consistent snapshot — WiredTiger uses the journal to bring data to a consistent state on recovery.

// Verify journal and data files live on the same volume (required for
// snapshot consistency). WiredTiger keeps the journal under <dbPath>/journal
// by default; if that directory is symlinked or mounted separately, it MUST
// be on the same snapshot volume as dbPath

// Option A: flush and lock (safer, brief pause)
// Connect to a SECONDARY (to avoid pausing the primary)
use admin
db.fsyncLock()              // flush writes to disk + lock further writes
// NOW take the filesystem snapshot (EBS, LVM, etc.)
db.fsyncUnlock()            // resume writes (run immediately after snapshot initiated)

// Option B: snapshot secondary (preferred — no production impact)
// 1. Identify a secondary that is caught up
rs.printSecondaryReplicationInfo()  // confirm lag is near 0
// 2. Take filesystem snapshot of that secondary's data volume
// 3. No lock needed — WiredTiger journal provides consistency

// AWS EBS snapshot (run from EC2 or AWS CLI after fsync+lock):
// aws ec2 create-snapshot --volume-id vol-xxx --description "mongo-backup-$(date +%Y%m%d)"

// Restore from snapshot:
// 1. Stop mongod on target server
// 2. Restore data volume from snapshot
// 3. Start mongod — WiredTiger replays journal to reach consistent state
// 4. mongod joins replica set and syncs missing oplog entries automatically
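The same-volume requirement can be checked mechanically by comparing device IDs of the data and journal directories (a sketch; the real path would be your dbPath, and temp directories are used here so it runs standalone):

```shell
#!/bin/bash
# Equal device IDs mean one block-level snapshot captures data + journal together
DBPATH="${DBPATH:-/tmp/demo-dbpath}"      # placeholder for the real dbPath
mkdir -p "$DBPATH/journal"
dev_data=$(stat -c %d "$DBPATH")          # GNU stat; on macOS use: stat -f %d
dev_journal=$(stat -c %d "$DBPATH/journal")
if [ "$dev_data" = "$dev_journal" ]; then
  echo "OK: dbPath and journal share a volume (snapshot-consistent)"
else
  echo "WARN: journal on a different volume; snapshot will NOT be consistent"
fi
```

Running this as a pre-flight step in the snapshot job turns a silent consistency hazard into a hard failure.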
TIP
Always snapshot a secondary, not the primary. A secondary snapshot causes no production disruption. Once restored, the reinstated node will sync missing oplog entries from the replica set automatically if the oplog window covers the gap. This is the recommended approach for TB-scale databases.
06
Atlas Backup
Continuous cloud backup with point-in-time restore
atlas

MongoDB Atlas provides two backup tiers: Cloud Backup (snapshot-based) and Continuous Cloud Backup (point-in-time). Both are managed, requiring no operational setup.

Cloud Backup
  Mechanism: scheduled snapshots (hourly/daily/weekly/monthly)
  Recovery granularity: nearest scheduled snapshot
  Retention: configurable, up to 12 months
  Restore targets: new cluster, same cluster, or download
  Cost: based on storage used

Continuous Cloud Backup
  Mechanism: continuous oplog tailing on top of base snapshots
  Recovery granularity: any second within the retention window
  Retention: configurable (oplog window)
  Restore targets: new cluster, same cluster, or a specific collection
  Cost: higher (includes oplog storage)
// Atlas backup is configured in the Atlas UI / Atlas CLI — not mongosh

// Atlas CLI: list snapshots
atlas backups snapshots list MyCluster --projectId <id>

// Trigger an on-demand snapshot:
atlas backups snapshots create MyCluster --desc "pre-migration snapshot"

// Create restore job from snapshot to new cluster:
atlas backups restores start automated \
  --clusterName MyCluster \
  --snapshotId <snapshotId> \
  --targetClusterName MyClusterRestored \
  --targetProjectId <targetProjectId>

// Point-in-time restore (continuous backup only):
atlas backups restores start pointInTime \
  --clusterName MyCluster \
  --pointInTimeUTCSeconds 1709337600 \
  --targetClusterName MyClusterPITR

// Download a snapshot archive (to restore into a local mongod):
atlas backups restores start download \
  --clusterName MyCluster \
  --snapshotId <snapshotId>
07
Backup Strategy
Recovery objectives, testing, and production checklist
strategy

RPO & RTO

RPO (Recovery Point Objective)
  Definition: max acceptable data loss — how old can the backup be?
  Impacts: backup frequency; whether PITR is required

RTO (Recovery Time Objective)
  Definition: max acceptable downtime — how fast must restore complete?
  Impacts: backup format (logical vs snapshot); need for a pre-warmed standby
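The RPO check reduces to simple arithmetic that is worth wiring into monitoring (a sketch; the interval and RPO values are placeholders):

```shell
#!/bin/bash
# Worst-case data loss for snapshot-only backup = the full backup interval
BACKUP_INTERVAL_MIN=60   # hourly snapshots (placeholder)
RPO_MIN=30               # business requirement (placeholder)
if [ "$BACKUP_INTERVAL_MIN" -gt "$RPO_MIN" ]; then
  echo "RPO violated: worst case ${BACKUP_INTERVAL_MIN}m loss exceeds ${RPO_MIN}m target (need PITR or a shorter interval)"
else
  echo "RPO met: worst case ${BACKUP_INTERVAL_MIN}m loss within ${RPO_MIN}m target"
fi
```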

Backup Selection by Use Case

Small dev/staging database
  → mongodump with --gzip to S3, daily

Production < 100 GB, self-managed
  → mongodump --oplog to S3, plus a weekly secondary snapshot

Production > 100 GB, self-managed
  → hourly EBS/LVM snapshot of a secondary, plus mongodump for logical portability

Atlas cluster
  → Atlas Cloud Backup, with Continuous mode for PITR

Compliance (HIPAA/SOC 2)
  → Atlas Continuous Backup, plus audit log backup and encryption at rest
// Backup verification checklist (automate this as a scheduled job):
// 1. Restore most recent backup to a staging/verify cluster
// 2. Run document count verification on critical collections
db.orders.countDocuments()
db.users.countDocuments()
// 3. Run spot checks on specific documents
// 4. Check oplog replay integrity (if PITR backup)
// 5. Record restore duration → validates RTO is achievable
// 6. FAIL LOUDLY if restore fails — backups that haven't been tested don't exist

// Production backup checklist:
// ✓ Backup frequency meets RPO (PITR or sub-hourly snapshots if RPO < 1 hour)
// ✓ Backups stored offsite / in different cloud region
// ✓ Backup encryption at rest enabled
// ✓ Backups tested monthly via restore-and-verify
// ✓ Oplog window >= backup interval + expected recovery time
// ✓ Backup user has minimum required roles (backup/restore only)
// ✓ Restore runbook documented and accessible during incident
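Part of step 1 in the verification checklist can run before any restore: basic archive-level sanity checks (a sketch; the archive path is a placeholder and a dummy archive is created so it runs standalone):

```shell
#!/bin/bash
# Pre-restore sanity checks on a gzip backup archive
ARCHIVE="${ARCHIVE:-/tmp/verify-demo/backup.gz}"   # placeholder path
mkdir -p "$(dirname "$ARCHIVE")"
printf 'demo' | gzip > "$ARCHIVE"                  # stand-in for a real dump

[ -s "$ARCHIVE" ]  || { echo "FAIL: archive missing or empty"; exit 1; }
gzip -t "$ARCHIVE" || { echo "FAIL: gzip integrity check failed"; exit 1; }
# Freshness guard: archive must be newer than 24h (RPO sanity)
find "$(dirname "$ARCHIVE")" -name "$(basename "$ARCHIVE")" -mmin -1440 | grep -q . \
                   || { echo "FAIL: archive older than 24h"; exit 1; }
echo "OK: archive passes basic integrity checks"
```

These checks catch truncated uploads and stale backups cheaply; they complement, not replace, the full restore-and-count drill.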
DANGER
An untested backup is not a backup. Many teams discover their backups are corrupted, incomplete, or too slow to restore only during an actual incident. Schedule a monthly restoration drill: restore to a temporary cluster, verify document counts, then discard. This is the only way to know your backup works.