Examples of Designing Secondary Indexes

These examples illustrate the concepts behind designing your secondary indexes. Although the examples focus on query patterns and does not account for sizing, storage, and updates, you should always weigh the benefits of indexes against these other requirements.

Assume that you have a MapR Database JSON table with the following customer data:

{
   "_id": "10000",
   "FullName": {
      "LastName": "Smith",
      "FirstName": "John"
   },
   "Address": {
      "Street": "123 SE 22nd St.",
      "City": "Oakland",
      "State": "CA",
      "Zipcode": "94601-1001"
   },
   "Gender": "M",
   "AccountBalance": 999.99,
   "Email": "john.smith@company.com",
   "Phones": [
      {"Type": "Home", "Number": "555-555-1234"},
      {"Type": "Mobile", "Number": "555-555-5678"},
      {"Type": "Work", "Number": "555-555-9012"}
   ],
   "Hobbies": ["Baseball", "Cooking", "Reading"],
   "DateOfBirth": "10/1/1985"
}     

The following table contains fields in the document that are candidates for indexing based on the sample queries:

Query # Query Candidate Fields for Indexing

1

Find all customers who were born in the 1970s.
  • DateOfBirth

2

Find all customers who have an account balance greater than $10K. Order the information in descending order of balance.
  • AccountBalance

3

List customers who live in California, ordering the list by LastName, FirstName.

  • Address.State

  • FullName.LastName

  • FullName.FirstName

4

Find the ids and emails of customers who live in a specific zip code.
  • Address.Zip

5

Find customers who live in a specific set of states and have an account balance less than a specific value.
  • Address.State

  • AccountBalance

6

Find male customers with the last name starting with the letter "S".
  • Gender

  • FullName.LastName

7 Find all customers who have "Reading" as a hobby.
  • Hobbies[]
8 Find all customers who have a mobile phone number with a prefix of "650".
  • Phones[].Type
  • Phones[].Number
The following table contains indexes you can create to optimize the queries listed in the previous table and the rationale for doing so:
Index Rationale
Simple index on DateOfBirth

Optimizes the range condition on DateOfBirth in query 1.

You need not create a hashed index, because it is unlikely that the order of DateOfBirth correlates with the insert order of new data.

Simple index on AccountBalance, specified as a descending key
  • Optimizes the range condition on AccountBalance in query 2

  • Descending order of key meets the ordering criteria in query 2

  • Also optimizes the range condition on AccountBalance in query 5 in combination with the index on Address.State

Composite index on:
  • Address.State
  • FullName.LastName
  • FullName.FirstName

  • Optimizes both the equality condition on Address.State and ordering in query 3

  • Inclusion of the name fields in the index meets query 3 ordering

  • Also optimizes the IN condition in query 5 when used in combination with the index on AccountBalance

Simple index with:
  • Indexed field on Address.Zip
  • Included fields on:
    • Id
    • Email
  • Optimizes the equality condition on Address.Zip in query 4

  • Adding the included fields avoids reading the JSON table in query 4

Composite index on:
  • Gender
  • FullName.LastName
  • Optimizes equality condition on Gender and pattern matching condition on FullName.LastName for query 6

  • Specifying Gender as the leading key in combination with FullName.LastName results in more selective index lookups for query 6

Simple index on Hobbies[] Optimizes the equality condition on array elements of Hobbies in query 7:
{"$eq":{"Hobbies[]":"Reading"}}
Composite index on:
  • Phones[].Type
  • Phones[].Number
Optimizes the following two conditions in query 8:
  • Equality condition on the Type subfield in nested documents in the Phones array
  • Pattern matching condition on the Number subfield in nested documents in the Phones array

Example with Multiple Container Field Paths

The following example references documents that store the high and low temperatures for each day in a week. They use an array to store the data, where each element in the array corresponds to a day in the week. For each day of the week, there is a two element array of nested documents. The nested documents indicate whether the temperature corresponds to the high or low for that day. Typically, the outermost array has seven elements, one for each day of the week. But in cases where data is unavailable, the document has only the days available.

{
   "_id": "001",
   "temps": [{"hiLo": [{"type": "hi", "temp": 61}, {"type": "lo", "temp": 49}], "dow": "Sun"},
             {"hiLo": [{"type": "hi", "temp": 74}, {"type": "lo", "temp": 51}], "dow": "Mon"},
             {"hiLo": [{"type": "hi", "temp": 75}, {"type": "lo", "temp": 51}], "dow": "Tue"},
             {"hiLo": [{"type": "hi", "temp": 74}, {"type": "lo", "temp": 52}], "dow": "Wed"},
             {"hiLo": [{"type": "hi", "temp": 78}, {"type": "lo", "temp": 54}], "dow": "Thu"},
             {"hiLo": [{"type": "hi", "temp": 75}, {"type": "lo", "temp": 53}], "dow": "Fri"},
             {"hiLo": [{"type": "hi", "temp": 75}, {"type": "lo", "temp": 54}], "dow": "Sat"}],
   "weekOf": "4/29/2018"
}
{
   "_id": "002",
   "temps": {"hiLo": [{"type": "hi","temp": 81}, {"type": "lo", "temp": 60}], "dow":"Sat"},
   "weekOf": "5/12/2018"
}
{
   "_id": "003",
   "temps": [{"hiLo": [{"type": "hi", "temp": 80}, {"type": "lo", "temp": 55}], "dow": "Sun"},
             {"hiLo": [{"type": "hi", "temp": 78}, {"type": "lo", "temp": 54}], "dow": "Mon"},
             {"hiLo": [{"type": "hi", "temp": 79}, {"type": "lo", "temp": 54}], "dow": "Tue"},
             {"hiLo": [{"type": "hi", "temp": 77}, {"type": "lo", "temp": 53}], "dow": "Wed"},
             {"hiLo": [{"type": "hi", "temp": 79}, {"type": "lo", "temp": 54}], "dow": "Thu"},
             {"hiLo": [{"type": "hi", "temp": 77}, {"type": "lo", "temp": 54}], "dow": "Fri"},
             {"hiLo": [{"type": "hi", "temp": 78}, {"type": "lo", "temp": 54}], "dow": "Sat"}],
   "weekOf": "5/13/2018"
}

Suppose you frequently run the following queries:

Query Description Documents Returned
find /apps/hiLoTemps
    --f weekOf
    --c '{"$eq":{"temps[].hiLo[].temp":60}}'
Find weeks where any day has either a high or low temperature of 60. 002
find /apps/hiLoTemps
    --f weekOf,temps[].hiLo[].type,temps[].hiLo[].temp
    --c 
        '{
            "$elementAnd":{
                "temps[].hiLo[]":[
                    {"$eq":{"type":"hi"}},
                    {"$ge":{"temp":80}}
                ]
            }
        }'
Find weeks and the high/low temperatures for all days on those weeks where any day of the week has a high temperature of at least 80. 002, 003

To optimize these queries, you should define an index with the following fields:

  • Indexed fields: temps[].hiLo[].temp, temps[].hiLo[].type
  • Included fields: weekOf, temps[].hiLo

By defining the composite index with temps[].hiLo[].temp as the first indexed field, the index can optimize both queries.

By adding weekOf as an included field, the index is a covering index for the first query. By adding temps[].hiLo, the index becomes a covering index for the second query as well. Note that you must add this included field even though the subfields are also indexed fields. This is due to how indexes with container field paths store data. See Covering Indexes and Container Field Paths for more details.