Handling duplicate records in MongoDB depends on whether you want to prevent them in the first place or remove them after they exist. Here are some approaches:
1. Preventing Duplicates Using a Unique Index (Best Approach)
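A unique index makes MongoDB reject any insert or update that would create a second document with the same indexed value; the operation fails with a duplicate key error (code 11000). Creating the index fails if the collection already contains duplicates, so remove those first (see section 3).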
Example: Prevent duplicate emails in a users collection
db.users.createIndex({ email: 1 }, { unique: true })
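Once the index exists, application code usually needs to tell a duplicate key rejection apart from other failures. A minimal sketch follows; `isDuplicateKeyError` is a hypothetical helper, not part of MongoDB, and it assumes the error object exposes a numeric `code` property, as errors from the Node.js driver and mongosh do.

```javascript
// Hypothetical helper: true when an error looks like MongoDB's
// duplicate key error (E11000).
function isDuplicateKeyError(err) {
  return Boolean(err) && err.code === 11000;
}

// Stand-in error object for illustration; a real one would come from
// a failed insertOne() or updateOne() call.
const sampleError = { code: 11000, message: "E11000 duplicate key error" };
console.log(isDuplicateKeyError(sampleError)); // true
```

In a catch block, a check like this lets you treat "already exists" as an expected outcome and rethrow everything else.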
2. Finding Duplicate Records
To identify duplicates, use $group to count occurrences.
Example: Find duplicate emails
db.users.aggregate([
  { $group: { _id: "$email", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
])
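To make the logic of the pipeline concrete, here is a rough in-memory equivalent in plain JavaScript. It is only an illustration for small arrays; `findDuplicates` and the sample data are hypothetical, and on a real collection the aggregation is the right tool.

```javascript
// Hypothetical in-memory equivalent of the $group + $match pipeline:
// count documents per field value and keep values seen more than once.
function findDuplicates(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([value, count]) => ({ _id: value, count }));
}

const sample = [
  { email: "a@example.com" },
  { email: "b@example.com" },
  { email: "a@example.com" },
];
console.log(findDuplicates(sample, "email"));
// [ { _id: 'a@example.com', count: 2 } ]
```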
3. Removing Duplicates
Method 1: Delete All Duplicates Except One
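db.users.aggregate([
  // Collect the _id of every document that shares an email
  { $group: { _id: "$email", ids: { $push: "$_id" }, count: { $sum: 1 } } },
  // Keep only emails that occur more than once
  { $match: { count: { $gt: 1 } } }
]).forEach(doc => {
  // Drop the first _id from the list so that document is kept
  // (which document comes first is not guaranteed)
  doc.ids.shift();
  db.users.deleteMany({ _id: { $in: doc.ids } }); // Delete the rest
});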
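Because the deletion is irreversible, it can help to compute the list of `_id` values first and review it before deleting anything. A minimal sketch, assuming the grouped results have the `{ _id, ids, count }` shape produced by the pipeline above; `planDeletions` is a hypothetical helper.

```javascript
// Hypothetical helper: given the grouped results ({ _id, ids, count }),
// return every _id that would be deleted, keeping the first id per group.
// Uses slice() so the input arrays are not modified.
function planDeletions(groups) {
  return groups.flatMap((group) => group.ids.slice(1));
}

const groups = [
  { _id: "a@example.com", ids: [1, 2, 3], count: 3 },
  { _id: "b@example.com", ids: [4, 5], count: 2 },
];
console.log(planDeletions(groups)); // [ 2, 3, 5 ]
```

After review, the resulting array could be passed to a single deleteMany call with an $in filter on _id.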
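Method 2: Merge Duplicates Using $addToSet
If you are merging data rather than deleting it, $addToSet collects only distinct values per group. Every document has a different _id, so exclude _id before grouping; otherwise no two documents compare equal and nothing is collapsed. Note that this pipeline only returns results and does not modify the collection.
db.users.aggregate([
  { $project: { _id: 0 } },
  { $group: { _id: "$email", uniqueDocs: { $addToSet: "$$ROOT" } } }
]);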
4. Handling Duplicates in Bulk Insert
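When inserting many documents into a collection that has a unique index, pass { ordered: false } so that MongoDB continues past duplicate key errors and inserts the remaining documents. The call still throws a bulk write error that lists the rejected documents, so catch it if duplicates are expected.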
db.users.insertMany([...], { ordered: false })
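When the unordered insert throws, you typically want to ignore the duplicate key failures but still notice anything else that went wrong. The sketch below separates the two; `splitWriteErrors` is a hypothetical helper, and it assumes the caught error carries a `writeErrors` value whose entries expose `code` and `index`, which is how the Node.js driver reports bulk write errors. Check the shape in your driver version before relying on it.

```javascript
// Hypothetical helper: split the write errors from a failed
// insertMany(..., { ordered: false }) into duplicates and everything else.
// Assumes each write error exposes `code` and `index`.
function splitWriteErrors(err) {
  const raw = err && err.writeErrors ? err.writeErrors : [];
  // Some driver versions may report a single error instead of an array.
  const writeErrors = Array.isArray(raw) ? raw : [raw];
  const duplicates = writeErrors.filter((e) => e.code === 11000);
  const others = writeErrors.filter((e) => e.code !== 11000);
  return { duplicates, others };
}

// Stand-in error object for illustration only.
const sampleBulkError = {
  writeErrors: [
    { code: 11000, index: 1 }, // duplicate key
    { code: 121, index: 3 },   // document failed validation
  ],
};
const { duplicates, others } = splitWriteErrors(sampleBulkError);
console.log(duplicates.map((e) => e.index)); // [ 1 ]
console.log(others.length); // 1
```

If `others` is non-empty, rethrowing the original error keeps real failures from being silently swallowed.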