One of the most common use cases for MapR customers is to run Spark applications on YARN. There are two deployment modes that can be used to launch Spark applications on YARN: cluster and client modes.
In cluster mode, the Spark driver runs inside an application master process, which is managed by YARN on the cluster, and the client can go away after initiating the application.
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. Executors and application masters run inside "containers."
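The two modes are selected with the `--deploy-mode` flag on `spark-submit`. As a minimal sketch (the application class and JAR path below are placeholders, not from the guide):

```shell
# Cluster mode: the driver runs inside the YARN application master,
# so the submitting client can exit once the application is accepted.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar

# Client mode: the driver runs in the local client process, and the
# application master only negotiates resources from YARN.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  /path/to/my-app.jar
```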
When jobs are submitted in YARN mode and technical issues arise, the first place to look is the driver and executor logs. This blog aims to show you which logs to examine when typical issues arise.
The Spark Application UI can be used to view job status and progress; in many scenarios, however, you need to capture additional, optional logging. The Spark Troubleshooting Guide documents the DEBUG configurations that can be passed to a job to produce a richer set of logs and pinpoint where an issue lies.
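For example (a sketch, not the guide's exact steps — the application ID and file paths are placeholders): aggregated driver and executor logs can be pulled with the `yarn logs` command once the application finishes, and verbosity can be raised for a single run by shipping a custom `log4j.properties` to both driver and executors.

```shell
# Fetch aggregated driver/executor logs for a finished application.
# Requires YARN log aggregation to be enabled on the cluster.
yarn logs -applicationId application_1234567890123_0001

# Raise driver and executor log verbosity for one run by pointing
# both JVMs at a custom log4j.properties (paths are placeholders).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j-debug.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-debug.properties" \
  --files /path/to/log4j-debug.properties \
  --class com.example.MyApp /path/to/my-app.jar
```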
The guide is organized into the following sections:
**Spark Troubleshooting Hub**

| # | Article |
| --- | --- |
| 1.0 | Spark Troubleshooting Hub: Start here when debugging Spark issues |
| 1.1 | Running Spark: How do I add verbose logs for Spark Driver and Executor? |
| 1.2 | How to connect to Spark Thrift Server? |
| 1.3 | How to collect GC statistics for Spark (Garbage Collection) |
| 2.1 | Profiling Spark: How to collect a heap dump using the jmap utility |
| 2.2 | Profiling Spark: How to collect jstack for a known executor or driver process |
| 2.3 | Profiling Spark: How to collect jstat data |
| 3.1 | Tuning Spark: Estimating memory and CPU utilization for Spark jobs |
| 3.2 | Tuning Spark: How do I tune the Spark History Server for large event logs? |
| | **Debugging Spark Applications** |
| 4.1 | Debugging Spark Applications: How to pass log4j.properties to executor and driver |
| 4.2 | Debugging Spark Applications: How to add MapR file client debug from a Spark application |
| | **Spark Memory Management** |
| 5.1 | Memory Management: How do I troubleshoot typical out-of-memory (OOM) issues on the Spark Driver? |
| 5.2 | Memory Management: How to troubleshoot out-of-memory (OOM) issues on the Spark Executor |
| 6.1 | Spark SQL: How do I print the schema of a DataFrame? |
| 6.2 | Spark SQL: How do I generate the Physical, Logical, and Optimized Logical Plans? |
| 6.3 | Spark SQL: Examples of commonly used Spark SQL tuning properties |
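To illustrate the kind of material covered, the log4j.properties technique in item 4.1 typically starts from a file along these lines (a minimal sketch based on Spark's default log4j 1.x template, with the root level raised to DEBUG; not the guide's exact content):

```properties
# Send everything at DEBUG and above to the console appender.
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

This file is then shipped to the driver and executors at submit time, as the corresponding guide article describes.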
For the committed MapR Support team, Knowledge Articles are a byproduct of solving real customer issues. Given the complexity of the technical challenges in this area, and given the wide latitude of configurability users have in their setups, these articles focus on helping users resolve technical issues on their own. The guidance on which log files apply to which types of issues is of particularly high value. The guide covers the most common issues; it is not an exhaustive compilation. These tips and tricks were verified to be helpful across several customer cases before being published to the Support Portal.