Admission control enforces limits on concurrent SQL queries and statements that run in an Impala cluster with heavy workloads. When admission control is enabled, Impala queues SQL queries and statements that it would otherwise cancel or re-run due to insufficient resources or performance bottlenecks.
You can configure admission control options that Impala uses to enforce limits on the following behaviors:
- Number of concurrent queries that run in the cluster
- Query queue size
- Number of delayed queries that the queue can hold
- Amount of memory that queries use
- Time that queries can exist in queue before Impala cancels them
Admission control is embedded within each impalad daemon and communicates through the statestore service. The impalad daemon determines if a query runs immediately or if the query is queued. The queue can include queries submitted through multiple Impala nodes.
Impala executes queries in the order that they are submitted to allow for dependent statements, such as a CREATE TABLE statement followed by an INSERT INTO statement. Impala may not execute queries submitted through different nodes in the order that the queries were received. When Impala must execute a sequence of statements in order, you should submit the statements on the same Impala node within a single session. When statement order is not important, you can set up your table structures and submit statements through different Impala nodes.
If a sudden flow of requests causes more queries to run concurrently than expected, the overall Impala memory limit and the Linux cgroups mechanism serve as hard limits to prevent over allocation of memory. When queries hit these limits, Impala cancels the queries.
Configuring Admission Control
You can use a combination of Impala start-up options, and optionally edit configuration settings in
fair-scheduler.xml, to configure admission control. Admission control uses the Fair Scheduler configuration settings to determine how to map users and groups to different resource pools.
Fair Scheduler Settings
You can configure settings in
fair-scheduler.xml that admission control can use to determine how to map users and groups to different resource pools. The
<aclSubmitApps> tag in
fair-scheduler.xml contains users and groups that can submit Impala statements to a corresponding Impala resource pool. The user and group lists are separated by a space, as shown in the following example:
<aclSubmitApps> tag is empty for a pool, no one can submit directly to that pool, however child pools can have their own
<aclSubmitApps> values to allow users and groups to submit to the child pools.
Impala does not use the
disks values, however you must specify them to satisfy YARN requirements for the file content. For more information about Fair Scheduler configuration settings, refer to the Apache wiki.
fair-scheduler.xml shows examples of the
<aclSubmitApps> tag and also shows the
Impala Start-up Options
To configure admission control, modify Impala’s start-up options. You can modify the start-up options in
For more information about how to modify Impala start-up options, refer to Additional Impala Configuration Options.
The following table lists the admission control start-up options that you can configure:
The maximum number of requests allowed in the queue. Impala rejects additional requests when the queue reaches this limit. This a “soft” limit that applies cluster-wide. Each Impala node decides independently whether to run queries immediately or to queue them. Impala allows for the overall number of queued queries to be slightly higher that the limit during times of heavy load. A negative value or 0 indicates that requests are always rejected once the maximum concurrent requests are executing. Ignored if fair_scheduler_config_path is set.
The maximum number of concurrent requests allowed to run before incoming requests are queued. This a “soft” limit that applies cluster-wide. Each Impala node decides independently whether to run queries immediately or to queue them. The overall number of concurrent queries might be slightly higher during times of heavy load. A negative value indicates no limit. Ignored if fair_scheduler_config_path is set.
"" (empty string)
The maximum amount of memory that all outstanding requests in this pool can use before new requests to this pool are queued. Specified in bytes, megabytes, or gigabytes by a number followed by the suffix b (optional), m, or g, either upper- or lowercase. You can specify floating-point values for megabytes and gigabytes, to represent fractional numbers such as 1.5. You can also specify it as a percentage of the physical memory by specifying the suffix %. 0 or no setting indicates no limit. Defaults to bytes if no unit is given. This is a soft limit applied cluster-wide. Each Impala node makes independent decisions to run queries immediately or queue them, so the overall memory used by concurrent queries might be slightly higher during times of heavy load. Ignored if fair_scheduler_config_path is set.
Note: Impala relies on the statistics produced by the COMPUTE STATS statement to estimate memory usage for each query.
Turns off the admission control feature entirely, regardless of other configuration option settings.
Disables all per-pool limits on the maximum number of running requests.
Disables all per-pool memory limits.
"" (empty string)
Path to the Fair Scheduler allocation file,
The maximum amount of time (in milliseconds) that a request waits in queue to be executed before timing out.
Admission Control with Clients
Admission control works with JDBC and ODBC client interfaces, however you may experience the following scenarios due to limits enforced by this feature:
- The API call blocks SQL statements in the query queue instead of running them immediately. Query execution begins when the statement moves out of the query queue, at which time the client program can request the results, which may also block until they become available.
- If a SQL statement is canceled due to prolonged queue time or because it exceeded the memory limit during execution, an error occurs and the client program receives an error message.
You cannot set the following options from JDBC or ODBC applications:
- You must set the REQUEST_POOL option for a session through the impala-shell interpreter, or through the Impala start-up options if you want the setting to apply cluster-wide.
- You must set the MEM_LIMIT query option through the impala-shell interpreter. It cannot be used directly through JDBC or ODBC applications.
Admission Control Guidelines
The admission control system is not aware of other Hadoop workloads, such as MapReduce jobs.
The following table lists some admission control guidelines to follow:
Examine query profile output
Examine the profile output for a query to see how admission control works for the query. The profile output provides details about the admission decision, such as whether the query was queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools. In impala-shell, you can also specify which resource pool to direct queries to by setting the REQUEST_POOL query option.
You can run the PROFILE statement in the impala-shell right after you run the query to see the query output, or you can review the Impala log file.
You cannot use admission control with Hue deployed
Unclosed Hue queries accumulate and exceed the queue size limit. To use admission control, you must explicitly enable it by specifying --disable_admission_control=false in the impalad command-line options safety valve field.
Set the MEM_LIMIT query option to override the query estimated memory usage
When a query cannot run due to high estimated memory usage, set the MEM_LIMIT query option in the impala-shell and issue the query through the shell in the same session to override the estimate. Impala treats the MEM_LIMIT value as the estimated amount of memory and overrides the estimate that Impala would generate based on table and column statistics. This value is used only for making admission control decisions, and is not pre-allocated by the query.
Increase memory if needed when inserting into Parquet tables
Admission control affects query statements, as well as INSERT and CTAS. Inserting into a Parquet tables is memory intensive because 1GB of data is buffered before writing out each Parquet data block. When inserting into a partitioned Parquet table, Impala redistributes the data among the nodes to reduce memory consumption. You may need to temporarily increase the memory dedicated to Impala during the insert operation, or break up the load operation into several INSERT statements, or both.
Limits on queued queries affect subsequent statements in the same session
If Impala queues a query due to a limit on concurrent queries or memory usage, subsequent statements in the same session are also queued to ensure that the statements are processed in the correct order.
Reuse classifications and hierarchy developed for use with Sentry security
If you set up different resource pools for different users and groups, consider reusing any classifications and hierarchy you developed for use with Sentry security. See Enabling Sentry Authorization for Impala for details. For details about all the Fair Scheduler configuration settings, see the Apache wiki, in particular the tags such as <queue> and <aclSubmitApps> to map users and groups to particular resource pools (queues).
Use the COMPUTE STATS statement for large tables involved in join queries
Although COMPUTE STATS is an important statement to help optimize query performance, it is especially important when admission control is enabled. Admission control relies on COMPUTE STATS to generate accurate memory usage estimates for complex queries.