Queue Management with Hive on Tez

Queue management depends on which YARN scheduler the cluster uses.

By default, a MapR cluster uses the Fair Scheduler, and Hive on Tez runs each query in a queue named after a user. If the query is submitted through the Hive CLI, the real user name is used.

If a query is submitted through a HiveServer2 client, such as Beeline, the queue name depends on the HiveServer2 impersonation setting, which is controlled by the hive.server2.enable.doAs property. The queue name is either the real user name or the name of the user that runs the HiveServer2 process.
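
As an illustration, the impersonation setting could be placed in hive-site.xml roughly as follows. This is a sketch only: the value shown (false) is an example choice, not a recommendation. With it, queries submitted through HiveServer2 run as the user that owns the HiveServer2 process, so that user's name is the queue name.

```xml
<!-- hive-site.xml: illustrative fragment, example value only -->
<property>
   <name>hive.server2.enable.doAs</name>
   <!-- true: run queries as the connecting (real) user;
        false: run queries as the HiveServer2 process user -->
   <value>false</value>
</property>
```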

With the Capacity Scheduler, the queues used by Hive queries submitted through the CLI or Beeline are configured in the capacity-scheduler.xml file.
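
A minimal sketch of such a capacity-scheduler.xml is shown here to make the idea concrete. The queue name hive_queue, the user name alice, and the capacity percentages are hypothetical values chosen for illustration. The property names are the standard YARN Capacity Scheduler ones, but verify them against the documentation for your Hadoop version.

```xml
<!-- capacity-scheduler.xml: illustrative fragment, hypothetical queue and user names -->
<configuration>
   <!-- Child queues of the root queue -->
   <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>default,hive_queue</value>
   </property>
   <!-- Capacities of sibling queues are percentages that add up to 100 -->
   <property>
      <name>yarn.scheduler.capacity.root.default.capacity</name>
      <value>60</value>
   </property>
   <property>
      <name>yarn.scheduler.capacity.root.hive_queue.capacity</name>
      <value>40</value>
   </property>
   <!-- Optional: map the hypothetical user alice to hive_queue -->
   <property>
      <name>yarn.scheduler.capacity.queue-mappings</name>
      <value>u:alice:hive_queue</value>
   </property>
</configuration>
```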

You can also change the queue with the Tez property tez.queue.name=<queue_name>, which runs a query in a specific queue. Set it before each query with the Hive command set tez.queue.name=<queue_name>, or define it in the hive-site.xml file:

<property>
   <name>tez.queue.name</name>
   <value>my_queue</value>
</property> 

Tez starts a session and keeps it alive to run successive queries, and it allows settings, such as the queue name, to be changed within the session. However, an Application Master (AM) is bound to the YARN queue in which it was started, so you cannot change the queue of an AM that is already running.

Queue management with HiveServer2

HiveServer2 provides built-in functionality for setting up and managing a pool of Tez sessions by initializing default queues. This approach applies only when queries are submitted through HiveServer2 clients, such as Beeline, and HiveServer2 runs without impersonation (hive.server2.enable.doAs=false).

If you set hive.server2.enable.doAs=true, a new AM is started alongside the existing AM of a default queue. In that case, the default-queue sessions are not used and are closed at the end of their lifetime. You can specify the following properties in the hive-site.xml file to enable default queues:
  • To generally enable default queues:
    hive.server2.tez.initialize.default.sessions=true
  • To specify the names of the default queues, each of which starts at least one session with its own AM:
    hive.server2.tez.default.queues=test_TEZ1,test_TEZ2
  • To define the number of sessions, and therefore AMs, per default queue. For example, specifying two default queues and two sessions per default queue causes four AMs to start:
    hive.server2.tez.sessions.per.default.queue=2
  • To define the lifetime of the pooled sessions. With this property set, HiveServer2 creates a pool of sessions on startup. Once a session's lifetime has ended, the session is re-established when a query is submitted through HiveServer2:
    hive.server2.tez.session.lifetime=5m
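
Taken together, the properties above might appear in hive-site.xml as in the following sketch. The values are the example values used in this section, and impersonation is disabled because the session pool applies only in that mode. With two default queues and two sessions per queue, four AMs would be expected at startup.

```xml
<!-- hive-site.xml: illustrative fragment combining the example values from this section -->
<property>
   <name>hive.server2.enable.doAs</name>
   <value>false</value>
</property>
<property>
   <name>hive.server2.tez.initialize.default.sessions</name>
   <value>true</value>
</property>
<property>
   <name>hive.server2.tez.default.queues</name>
   <value>test_TEZ1,test_TEZ2</value>
</property>
<property>
   <!-- 2 queues x 2 sessions per queue = 4 AMs -->
   <name>hive.server2.tez.sessions.per.default.queue</name>
   <value>2</value>
</property>
<property>
   <name>hive.server2.tez.session.lifetime</name>
   <value>5m</value>
</property>
```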

If the Capacity Scheduler is in use, the default queue names are chosen from the queues defined in the scheduler settings. You can still use the tez.queue.name=<queue_name> property to run queries in custom queues.