Examples of Designing Secondary Indexes

These examples illustrate the concepts behind designing your secondary indexes. Although the examples focus on query patterns and does not account for sizing, storage, and updates, you should always weigh the benefits of indexes against these other requirements.

Assume that you have a MapR-DB JSON table with the following customer data:

{
   "_id": "10000",
   "FullName": {
      "LastName": "Smith",
      "FirstName": "John"
   },
   "Address": {
      "Street": "123 SE 22nd St.",
      "City": "Oakland",
      "State": "CA",
      "Zipcode": "94601-1001"
   },
   "Gender": "M",
   "AccountBalance": 999.99,
   "Email": "john.smith@company.com",
   "Phones": [
      {"Type": "Home", "Number": "555-555-1234"},
      {"Type": "Mobile", "Number": "555-555-5678"},
      {"Type": "Work", "Number": "555-555-9012"}
   ],
   "Hobbies": ["Baseball", "Cooking", "Reading"],
   "DateOfBirth": "10/1/1985"
}     

The following table contains fields in the document that are candidates for indexing based on the sample queries:

Query # Query Candidate Fields for Indexing

1

Find all customers who were born in the 1970s.
  • DateOfBirth

2

Find all customers who have an account balance greater than $10K. Order the information in descending order of balance.
  • AccountBalance

3

List customers who live in California, ordering the list by LastName, FirstName.

  • Address.State

  • FullName.LastName

  • FullName.FirstName

4

Find the ids and emails of customers who live in a specific zip code.
  • Address.Zip

5

Find customers who live in a specific set of states and have an account balance less than a specific value.
  • Address.State

  • AccountBalance

6

Find male customers with the last name starting with the letter "S".
  • Gender

  • FullName.LastName

The following table contains indexes you can create to optimize the queries listed in the previous table and the rationale for doing so:
Index Rationale
Simple index on DateOfBirth

Optimizes the range condition on DateOfBirth in query 1.

You need not create a hashed index, because it is unlikely that the order of DateOfBirth correlates with the insert order of new data.

Simple index on AccountBalance, specified as a descending key
  • Optimizes the range condition on AccountBalance in query 2

  • Descending order of key meets the ordering criteria in query 2

  • Also optimizes the range condition on AccountBalance in query 5 in combination with the index on Address.State

Composite index on:
  • Address.State
  • FullName.LastName
  • FullName.FirstName

  • Optimizes both the equality condition on Address.State and ordering in query 3

  • Inclusion of the name fields in the index meets query 3 ordering

  • Also optimizes the IN condition in query 5 when used in combination with the index on AccountBalance

Simple index with:
  • Indexed field on Address.Zip
  • Included fields on:
    • Id
    • Email
  • Optimizes the equality condition on Address.Zip in query 4

  • Adding the included fields avoids reading the JSON table in query 4

Composite index on:
  • Gender
  • FullName.LastName
  • Optimizes equality condition on Gender and pattern matching condition on FullName.LastName for query 6

  • Specifying Gender as the leading key in combination with FullName.LastName results in more selective index lookups for query 6