Indexing and query optimization

  1. Indexes let MongoDB find matching documents without scanning the whole collection.
  2. Indexes that use more than one key are called compound indexes.
  3. The order of keys in a compound index matters.
  4. A query where one term demands an exact match and another specifies a range is best served by a compound index where the exact-match key comes first and the range key comes second.

For example:

For the query below, manufacturer (the exact-match term) should be the first key in the index and pricing (the range term) the second.

db.products.find({
  "manufacturer": "Acme",
  "pricing": { $lt: 7500 }
})
// Exact-match key first, range key second.
db.products.createIndex({ manufacturer: 1, pricing: 1 })
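To see why key order matters, here is a rough sketch in plain JavaScript (runnable in Node, no MongoDB needed). It sorts a handful of made-up product documents the way a compound index would order its entries, then measures how far apart the matching entries sit. The sample data and the matchSpan helper are hypothetical, and the "span" is only a stand-in for the work a bounded index scan would do, not MongoDB's actual keysExamined figure.

```javascript
// Hypothetical sample documents, not taken from any real collection.
const products = [
  { _id: 1, manufacturer: "Acme", pricing: 9000 },
  { _id: 2, manufacturer: "Zenith", pricing: 1200 },
  { _id: 3, manufacturer: "Acme", pricing: 4999 },
  { _id: 4, manufacturer: "Bolt", pricing: 7000 },
  { _id: 5, manufacturer: "Acme", pricing: 7400 },
];

// Sort the documents by the given key order (as a compound index would),
// then report how many match and how wide the stretch of entries is
// between the first and the last match.
function matchSpan(docs, keys, matches) {
  const sorted = [...docs].sort((a, b) => {
    for (const k of keys) {
      if (a[k] < b[k]) return -1;
      if (a[k] > b[k]) return 1;
    }
    return 0;
  });
  const hits = [];
  sorted.forEach((doc, pos) => {
    if (matches(doc)) hits.push(pos);
  });
  const span = hits.length ? hits[hits.length - 1] - hits[0] + 1 : 0;
  return { matched: hits.length, span };
}

// Same predicate as the find() query: exact match plus a range.
const query = (d) => d.manufacturer === "Acme" && d.pricing < 7500;

const equalityFirst = matchSpan(products, ["manufacturer", "pricing"], query);
const rangeFirst = matchSpan(products, ["pricing", "manufacturer"], query);

console.log(equalityFirst); // { matched: 2, span: 2 }
console.log(rangeFirst);    // { matched: 2, span: 3 }
```

With the exact-match key first, the two matching entries sit next to each other, so a single tight scan covers them. With the range key first, a non-matching entry falls between them and the scan has to step over it.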
  1. Every extra index requires more RAM to maintain.
  2. A covering index is one where the entire query can be satisfied from reading only the index, making queries very fast.
  3. To create a unique index, specify the unique option:
db.users.createIndex({username: 1}, {unique: true})
  1. If you need a unique index on a collection, it’s usually best to create the index before inserting any data. If you create the index in advance, you guarantee the uniqueness constraint from the start.

Create Index

// Hashed index. Multikey hashed indexes aren’t allowed (the field can’t hold an array),
// and a hashed index can’t be combined with the unique option.
db.users.createIndex({ recipe_name: "hashed" })

db.users.createIndex(
  { username: 1 },
  {
    // Build without blocking other operations (ignored since MongoDB 4.2).
    background: true,
    // In a sparse index, only those documents having some value for the indexed key will appear.
    sparse: true,
    // Best created before inserting any data, so the uniqueness constraint holds from the start.
    unique: true
  }
)
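The sparse and unique options are easier to picture with a toy model. The sketch below is plain JavaScript (runnable in Node) and uses a hypothetical SparseUniqueIndex class, not a MongoDB API: documents without the indexed field get no entry, and a repeated value is rejected.

```javascript
// Hypothetical in-memory stand-in for a sparse, unique single-field index.
class SparseUniqueIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // indexed value -> document _id
  }

  // Returns true if the document was indexed, false if it was skipped.
  insert(doc) {
    // Sparse: a document that lacks the field gets no index entry at all.
    if (!(this.field in doc)) return false;
    const value = doc[this.field];
    // Unique: a second document with the same value is rejected.
    if (this.entries.has(value)) {
      throw new Error(`duplicate key: ${this.field} = ${JSON.stringify(value)}`);
    }
    this.entries.set(value, doc._id);
    return true;
  }
}

const idx = new SparseUniqueIndex("username");
idx.insert({ _id: 1, username: "kbanker" }); // indexed
idx.insert({ _id: 2 });                      // skipped: no username
idx.insert({ _id: 3 });                      // skipped again, no conflict

let rejected = false;
try {
  idx.insert({ _id: 4, username: "kbanker" }); // same value as _id 1
} catch (err) {
  rejected = true;
}

console.log(idx.entries.size, rejected); // 1 true
```

The two documents without a username do not collide with each other because neither appears in the index. That is the practical reason to pair sparse with unique when the field is optional.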

Get Indexes

db.users.getIndexes()

Drop Index

db.users.dropIndex("")

Get in-progress operations

db.currentOp()

Examining slow queries

db.collection.find(<query>).explain("executionStats")

Replication

Replication provides data protection, high availability, and disaster recovery.

Oplog

The oplog is a capped collection that records recent write operations in order, so that they can be replayed on other nodes. Secondaries stay in sync by applying entries from the primary's oplog.
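As a rough illustration of what "capped" means for replication, the following plain JavaScript sketch (runnable in Node) models the oplog as a fixed-length list. The CappedOplog class and its method names are hypothetical, and the cap here is a number of entries, whereas the real oplog is capped by size.

```javascript
// Hypothetical model of a capped oplog. Not a MongoDB API.
class CappedOplog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
    this.nextTs = 1; // simple counter standing in for an oplog timestamp
  }

  // Record an operation; once the cap is reached, the oldest entry is dropped.
  append(op) {
    this.entries.push({ ts: this.nextTs++, op });
    if (this.entries.length > this.maxEntries) this.entries.shift();
  }

  // Entries a secondary still has to apply, given the last timestamp it
  // applied. Returns null when the entries it needs have already been
  // dropped, meaning it can no longer catch up from the oplog alone.
  entriesAfter(lastAppliedTs) {
    const oldest = this.entries.length ? this.entries[0].ts : this.nextTs;
    if (lastAppliedTs < oldest - 1) return null;
    return this.entries.filter((e) => e.ts > lastAppliedTs);
  }
}

const oplog = new CappedOplog(3);
for (let i = 1; i <= 5; i++) oplog.append({ insert: i });

// The oplog now holds timestamps 3, 4 and 5.
const caughtUp = oplog.entriesAfter(3);  // entries with ts 4 and 5
const tooStale = oplog.entriesAfter(1);  // null: ts 2 is gone

console.log(caughtUp.map((e) => e.ts), tooStale); // [ 4, 5 ] null
```

This is why oplog size matters: a secondary that stays offline longer than the window the oplog covers cannot resume from where it left off.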

mongod --replSet myapp --dbpath ~/node1 --port 40000
mongod --replSet myapp --dbpath ~/node2 --port 40001
mongod --replSet myapp --dbpath ~/arbiter --port 40002
mongo --port 40000

rs.initiate()

rs.add("iron.local:40001")
rs.add("iron.local:40002", {arbiterOnly: true})

db.isMaster()

rs.status()

Get the current replication information

db.getReplicationInfo()

Change the default oplog size

mongod --replSet myapp --oplogSize 1024

Sharding

What is Sharding in MongoDB?

Sharding in MongoDB partitions your database into smaller pieces so that no single machine has to store all the data or handle the entire load.

The components of a sharded cluster:

  • Config-Server: Deployed as a replica-set, config-servers track state about which servers contain what parts of a sharded collection.

  • Mongos/Router server: These servers are individual instances that do not store data locally. Instead, they query individual shards using cached state from the config-servers, as needed.

  • Shard-Server: These are the MongoDB instances that actually store collection data. Older MongoDB versions allow a shard to be a standalone instance, but shards should be deployed as replica-sets in production (this is required since MongoDB 3.6).
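How the three components cooperate can be sketched in plain JavaScript (runnable in Node). The chunk map below is a made-up example of the metadata a config server tracks, and the routing helpers are hypothetical. They only illustrate how a router can pick target shards from cached ranges.

```javascript
// Hypothetical chunk map: each chunk covers shard-key values in [min, max)
// and lives on exactly one shard.
const chunkMap = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Exact match on the shard key: exactly one shard owns the value.
function shardForKey(chunks, key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Range on the shard key (lo and hi inclusive): every shard whose chunk
// overlaps the range has to be asked.
function shardsForRange(chunks, lo, hi) {
  return chunks.filter((c) => c.min <= hi && c.max > lo).map((c) => c.shard);
}

// Query without the shard key: the router cannot narrow the targets down.
function allShards(chunks) {
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(shardForKey(chunkMap, 1000));         // shardB
console.log(shardsForRange(chunkMap, 900, 1200)); // [ 'shardA', 'shardB' ]
console.log(allShards(chunkMap).length);          // 3
```

Queries that include the shard key can be sent to one or a few shards. Queries that omit it are broadcast to every shard, which is one reason the choice of shard key matters.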

Deployment of a Sharded Cluster

$ mongod --configsvr --dbpath /data/config --port 27018

Commands

// Add Shards to the Cluster
sh.addShard( "<replSetName>/s1-mongo1.example.net:27017")

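// Enable sharding for the database
sh.enableSharding("<database>")

// Shard the collection; <direction> is 1 for a ranged key or "hashed" for a hashed key
sh.shardCollection("<database>.<collection>", { "<key>" : "<direction>" } )

// Enable balancing for the collection and check whether the balancer is running
sh.enableBalancing("<database>.<collection>")
sh.isBalancerRunning()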

// Shard distribution
db.collection.getShardDistribution()

// Shard Status
sh.status()
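The difference between a ranged and a hashed shard key shows up most clearly with keys that only ever increase. The plain JavaScript sketch below (runnable in Node) compares the two under stated assumptions: three shards, fixed range boundaries at 1000 and 2000, and FNV-1a as a stand-in hash. MongoDB uses its own hash function, and all names here are hypothetical.

```javascript
const SHARDS = 3;

// FNV-1a, 32-bit. Used here only as a simple, dependency-free stand-in hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Ranged key: assumed chunk boundaries at 1000 and 2000.
function rangedShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// Hashed key: the shard is chosen from the hash of the key, not the key itself.
function hashedShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

// Count how many keys land on each shard under the given placement rule.
function distribute(keys, pickShard) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[pickShard(key)]++;
  return counts;
}

// 1000 new, monotonically increasing keys (3000..3999).
const newKeys = Array.from({ length: 1000 }, (_, i) => 3000 + i);

const ranged = distribute(newKeys, rangedShard);
const hashed = distribute(newKeys, hashedShard);

console.log(ranged); // [ 0, 0, 1000 ]
console.log(hashed); // counts spread across all three shards
```

With the ranged key, every new insert lands on the shard that owns the top range until the balancer splits and moves chunks. With the hashed key, the same inserts are spread out, at the cost of range queries on that key having to touch every shard.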

Remember

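  1. Sharding a collection is hard to undo, and older MongoDB versions cannot un-shard a collection at all. Be sure you want to proceed before doing so.
  2. Choose the shard key carefully: before MongoDB 5.0, which added resharding, you cannot change the shard key of a sharded collection.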
  3. Backing up a sharded cluster is more complicated than a non-sharded cluster.

Text search

  • First, you define the indexes needed for text searching.
  • Then, you use text search in both basic queries and the aggregation framework.

Defining text search indexes

db.books.createIndex(
  {
    // Specify fields to be text-indexed.
    title: "text",
    shortDescription: "text",
    longDescription: "text",
    authors: "text",
    categories: "text"
  },
  {
    // Optionally specify weights for each field.
    weights: {
      title: 10,
      shortDescription: 1,
      longDescription:1,
      authors: 1,
      categories: 5
    },
    // User-defined index name.
    name : "books_text_index"
  }
);
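To get a feel for what the weights do, here is a deliberately simplified scoring sketch in plain JavaScript (runnable in Node). It counts exact word matches per field and multiplies by the field weight. MongoDB's real textScore also involves stemming and normalization, so treat the weightedScore helper and the sample documents as hypothetical.

```javascript
// Same weights as in the index definition above.
const weights = {
  title: 10,
  shortDescription: 1,
  longDescription: 1,
  authors: 1,
  categories: 5,
};

// Simplified score: (occurrences of the term in a field) * (field weight),
// summed over all weighted fields. Not MongoDB's actual formula.
function weightedScore(doc, term, fieldWeights) {
  const needle = term.toLowerCase();
  let score = 0;
  for (const [field, weight] of Object.entries(fieldWeights)) {
    const value = doc[field];
    if (value === undefined) continue;
    const text = Array.isArray(value) ? value.join(" ") : String(value);
    const words = text.toLowerCase().split(/\W+/).filter(Boolean);
    score += weight * words.filter((w) => w === needle).length;
  }
  return score;
}

// Hypothetical documents.
const titleHit = {
  title: "MongoDB in Action",
  shortDescription: "A guide",
  categories: ["Databases"],
};
const bodyHit = {
  title: "Databases Today",
  longDescription: "Covers MongoDB and more MongoDB",
  categories: ["MongoDB"],
};

console.log(weightedScore(titleHit, "mongodb", weights)); // 10
console.log(weightedScore(bodyHit, "mongodb", weights));  // 7
```

A single match in the title outranks two matches in the long description plus one in the categories, which is the effect the weights are meant to have.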
// Text search in a basic query, sorted by relevance score.
db.products.find(
  { $text: { $search: "gardens" } },
  { _id: 0, name: 1, description: 1, tags: 1, score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } }).pretty()
db.books.aggregate(
  [
    { $match: { $text: { $search: "mongodb in action" } } },
    { $project: { title: 1, score: { $meta: "textScore" } } },
    { $sort: { score: { $meta: "textScore" } } }
  ]
)
db.books.aggregate(
  [
    { $match: { $text: { $search: "mongodb in action" } } },
    { $project: {
        title: 1,
        score: { $meta: "textScore" },
        // Calculate multiplier: 3.0 if longDescription doesn’t exist.
        multiplier: { $cond: ["$longDescription", 1.0, 3.0] } }
    },
    {
      $project: {
        _id: 0,
        title: 1,
        score: 1,
        multiplier: 1,
        // Calculate adjusted score: score * multiplier.
        adjScore: { $multiply: ["$score", "$multiplier"] }
      }
    },
    { $sort: { adjScore: -1 } }
  ]
)
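The arithmetic in the last pipeline can be checked by hand. The plain JavaScript sketch below (runnable in Node) applies the same multiplier rule and sort to two made-up results. The adjust helper and the scores are hypothetical and only mirror the $cond, $multiply and $sort stages.

```javascript
// Mirrors the two $project stages: pick a multiplier, then multiply.
function adjust(doc) {
  // In $cond, a missing or null field evaluates to false, so it gets 3.0.
  const multiplier = doc.longDescription == null ? 3.0 : 1.0;
  return {
    title: doc.title,
    score: doc.score,
    multiplier,
    adjScore: doc.score * multiplier,
  };
}

// Hypothetical search results with made-up text scores.
const results = [
  { title: "With description", longDescription: "Some text", score: 2.0 },
  { title: "Without description", score: 1.0 },
];

// Mirrors { $sort: { adjScore: -1 } }: highest adjusted score first.
const ranked = results.map(adjust).sort((a, b) => b.adjScore - a.adjScore);

console.log(ranked.map((r) => [r.title, r.adjScore]));
// [ [ 'Without description', 3 ], [ 'With description', 2 ] ]
```

The document without a longDescription starts with the lower raw score but ends up ranked first, because its score is tripled while the other is left unchanged.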


db.books.stats()