How Partitions Are Chosen for Messages

Because the number of partitions in a topic can change over time, producers regularly refresh the information that they have about the topics that they know of. This interval is controlled by the metadata.max.age.ms configuration parameter.

Partitions of a topic are identified by their index number. For example, if a topic has four partitions, their IDs are 0, 1, 2, and 3.

There are three different ways of choosing a partition for a message:
  • If the producer specifies a partition ID or if the StreamsPartitioner interface specifies one, the MapR Streams server publishes the message to the partition specified.
  • If the producer does not specify a partition ID but provides a key, the MapR Streams server hashes the key and sends the message to the partition that corresponds to the hash.
  • If neither a partition ID nor a key is specified, the MapR Streams server sends messages in a sticky round-robin fashion. For any given topic, the server randomly chooses an initial partition. For example, suppose that for topic traffic_sensors, the server chooses the partition with the ID 1. The server accumulates enough messages for an RPC of optimal size and sends the batch of messages to partition 1. The server then does the same with partition 2, and so on, eventually returning to partition 1.