Configuration Parameters

This topic describes configuration parameters that are either specific to MapR-ES or supported from Apache Kafka.

Table 1. AdminClient configuration parameters specific to MapR-ES
Parameter Description
streams.admin.default.stream This parameter, when set during creation of the AdminClient instance, ensures that the specified stream is using the the AdminClient instance for all administrative operations.

Syntax:

/mapr/<cluster name>/<volume name>/<stream name>
streams.rpc.timeout.ms Specifies the length of time in milliseconds to wait for a response from the MapR-ES server if soft mount is configured (fs.mapr.hardmount is set to false). Default: 120000 Minimum: 30000
Note: Applicable as of MapR 6.0.1, is used instead of fs.mapr.rpc.timeout

Table 2. Consumer configuration parameters specific to MapR-ES
Parameter Description
streams.consumer.buffer.memory Specifies how much memory to use for caching pre-fetched messages. Messages that are in subscribed topics and partitions are pre-fetched and cached to improve performance. Default 64MB
streams.consumer.default.stream Specifies the path and name of the stream that the consumer subscribes to if, when subscribing to a topic, the consumer does not specify a stream.

This default value is also used for the KafkaConsumer.listTopics() method.

streams.rpc.timeout.ms Specifies the length of time in milliseconds to wait for a response from the MapR-ES server if a soft mount is configured (fs.mapr.hardmount is set to false). Default: 305000 Minimum: 300000
Table 3. Consumer configuration parameters supported from Apache Kafka
Parameter Description
auto.commit.interval.ms The frequency in milliseconds that the offsets are committed. Default: 1000ms
auto.offset.reset Specifies what MapR-ES should do when there is no initial offset, such as when a consumer starts reading from a partition. Default: latest
earliest
Reset the offset to the offset of the earliest message in the partition.
latest
Reset the offset to the offset of the latest message in the partition.
none
Throws a NoOffsetForPartitionException exception when the consumer next polls for messages in its subscription and no offset exists. The consumer must unsubscribe from the partition before polling functions correctly. Any other value throws an error to the consumer.
enable.auto.commit If true, periodically commits the highest offsets of the messages fetched by the consumer in all of the partitions for the topics that the consumer is subscribed to. Default: true
fetch.min.bytes The minimum amount of data the server should return for a fetch request. If insufficient data is available, the server will wait for this minimum amount of data to accumulate before answering the request.

This minimum applies to the totality of what a consumer has subscribed to.

Works in conjunction with the timeout interval that is specified in the poll function. If the minimum number of bytes is not reached by the time that the interval expires, the poll returns with nothing.

For example, suppose the value is set to 6 bytes and the timeout on a poll is set to 100ms. If there are 5 bytes available and no further bytes come in before the 100ms expire, the poll returns with nothing. Default: 1 byte

fetch.max.bytes
The maximum amount of data the server should return for a fetch request. If the first record batch in the first non-empty partition of the fetch is larger than this configuration, the record batch is still returned to ensure that the consumer can make progress.
Note: This parameter is new as of MapR 6.0.1.
fetch.max.wait.ms The maximum amount of time the MapR-ES server will block before answering the fetch request if there isn't sufficient data to satisfy the requirement given by fetch.min.bytes.
group.id A string 2457 up to bytes long that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group ID, multiple consumer processes indicate that they are all part of the same consumer group. Putting consumers into groups provides benefits that are described in Consumer Groups.

It is possible for a single consumer to be in a group.

key.deserializer The class that implements the Deserializer interface for deserializing keys.
max.poll.records Places an upper bound on the number of records returned from each call.
Note: This parameter is new as of MapR 6.0.1.
max.partition.fetch.bytes The number of bytes of message data to attempt to fetch for each partition in each poll request. These bytes will be read into memory for each partition, so this parameter helps control the memory that the consumer uses. Default: 64KB

The size of the poll request must be at least as large as the maximum message size that the server allows or else it is possible for producers to send messages that are larger than the consumer can fetch.

If the first record batch in the first non-empty partition of the fetch is larger than this configuration, the record batch is still returned to ensure that the consumer can make progress.
Note: This is a behavior change as of MapR 6.0.1.
value.deserializer The name of the appropriate deserialization class in the org.apache.kafka.common.serialization package or a class that implements the Deserializer interface for deserializing values.

Table 4. Producer configuration parameters specific to MapR-ES
Parameter Description
streams.buffer.max.time.ms Messages are buffered in the producer for at most the specified time. A thread will flush all the messages that have been buffered for more than the time specified. Default: 3 * 1000 msec create default stream
streams.parallel.flushers.per.partition If enabled, producer may have multiple parallel send requests to the server for each topic partition. If this setting is set to true, it is possible for messages to be sent out of order. Default: true create default stream
streams.partitioner.class The class that implements the StreamsPartitioner interface. This interface lets you write custom algorithms for determining which topic and partition to use for messages that match specific criteria. Use this configuration parameter only for producers that are written in Java. create default stream
streams.producer.default.stream Specifies the stream that the producer will use by default if the producer does not provide the name of a stream when specifying a topic to write to.
Syntax:
/mapr/<cluster name>/<volume name>/<stream name>
create default stream
fs.mapr.hardmount Specifies whether to use a hard mount or a soft mount for connections to the MapR Streams server.

The default is to use a hard mount and the value is true.

If a value for this parameter is set in the core-site.xml file, the value in that file is ignored.

create default stream
fs.mapr.rpc.timeout Specifies the length of time in seconds to wait for a response from the MapR-ES server if the configuration parameter fs.mapr.hardmount is set to false. Default: 300. Minimum value: 30.
Note: Applicable to MapR 6.0.0 and earlier. As of MapR 6.0.1, use streams.mapr.timeout.ms.

If a soft mount is used, the time expires while a producer waits for a response from the MapR-ES server, and the producer used the KafkaProducer.send(ProducerRecord<K,V> record, Callback callback) method, the callback is invoked with the error EAGAIN, which means "Resource temporarily unavailable."

create default stream
streams.rpc.timeout.ms Specifies the length of time in milliseconds to wait for a response from the MapR-ES server if soft mount is configured (fs.mapr.hardmount is set to false). Default: 30000 Minimum: 30000
Table 5. Producer configuration parameters supported from Apache Kafka
Parameter Description
buffer.memory The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are generated faster than they can be delivered to the server the producer will block. Default: 33554432
client.id Producers can tag records with a client ID that identifies the producer. Consumers can then be aware of which producer sent a message or set of messages. Apache Drill or other analytic tools querying messages can include this ID in the filters for their queries. Default: No client ID.
key.serializer The name of the appropriate serialization class in the org.apache.kafka.common.serialization package or a class that implements the Serializer interface for serializing keys.
metadata.max.age.ms The producer generally refreshes the topic metadata from the server when there is a failure. It will also poll for this data regularly. Default: 300 * 1000 msec
value.serializer The class that implements the Serializer interface for serializing values.