Spark 2.1.0-1707 Release Notes

This section provides reference information, including new features, patches, and known issues for Spark 2.1.0-1707.

The notes below relate specifically to the MapR Distribution for Apache Hadoop. You may also be interested in the open-source Spark 2.1.0 Release Notes.

Spark Version: 2.1.0
Release Date: August 2017
MapR Version Interoperability: See MEP Components and OS Support.
Source on GitHub: https://github.com/mapr/spark
GitHub Release Tag: 2.1.0-mapr-1707
Maven Artifacts: http://repository.mapr.com/maven/
Package Names: See Package Names for MapR Expansion Packs (MEPs).
Note:
  • Full support of MapR Streams is available only on MapR 5.2 and later clusters.
  • Spark 2.1 can connect to Hive Metastore 2.1, but Spark does not support Hive features added after Hive 1.2.
  • Spark Standalone and Spark on YARN can only run on clusters in MRv2 (YARN) mode. They are not supported on clusters in MRv1 (classic) mode.

Hive Support

This version of Spark supports integration with Hive. However, note the following exceptions:

New in This Release

Spark 2.1.0-1707 introduces the following enhancement:

Patches

This MapR release includes the following patches, added since the previous MapR Spark 2.1.0 release. For details, refer to the commit log for this project on GitHub.

GitHub Commit Date (YYYY/MM/DD) Comment
40dca4e 2017/07/28 [MAPR-28441] Fix Spark Streaming's handling of zero offsets from Kafka 0.9.
99daf6b 2017/06/30 [SPARK-19182][DSTREAM] Optimize the lock in StreamingJobProgressListener to not block UI when generating streaming jobs.
8e3b9ed 2017/06/27 Revert earlier fix for MAPR-25770.
52b06b9 2017/06/23 [MAPR-27845] Fix the manner in which Spark determines Hive’s security configuration.
6917681 2017/06/14 [MAPR-27840] Fix wrong type casting when importing data from Oracle.
17f311e 2017/06/06 [SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities.
e10e660 2017/05/30 Revert [SPARK-16736][CORE][SQL] to avoid superfluous filesystem calls.
d8f8657 2017/05/29 [SPARK-18949][SQL][BACKPORT-2.1] Add recoverPartitions API to Catalog interface.
7733a1c 2017/05/29 [SPARK-19459][SQL][BRANCH-2.1] Support nested char and varchar fields in ORC.
f31976e 2017/05/22 [MAPR-27519] Improve performance of calculating web UI counters for Kafka-streaming.
6a3b683 2017/05/22 [SPARK-19276][CORE] Expose FetchFailure exceptions hidden by user exceptions.
333371c 2017/05/22 [SPARK-19597][CORE] Add a test case for task deserialization errors.
377e2ea 2017/05/22 [SPARK-17931] Eliminate unnecessary task serialization.
0788b14 2017/05/22 [SPARK-18662] Move resource managers to their own sub-directories.
fb5fca1 2017/05/22 Fix Mesos build breakage for Scala 2.10.
e05a1e9 2017/05/22 [SPARK-17062][MESOS] Add conf option to Mesos dispatcher.
e48f72e 2017/05/22 [SPARK-18836][CORE] Improve performance by serializing a single copy of the task metrics in the DAGScheduler.
807ba4d 2017/05/22 [SPARK-18761][CORE] Introduce a "task reaper" to oversee killing of tasks in executors.
54413bd 2017/05/22 [SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin that can lead to NPE.
c919bdb 2017/05/22 [SPARK-19893][SQL] Avoid running DataFrame set operations on map types.
bd90640 2017/05/22 [SPARK-18863][SQL] Return an error if a subquery's output contains non-aggregate expressions without GROUP BY.
b898e28 2017/05/22 [SPARK-20280][CORE] Fix FileStatusCache Weigher to avoid integer overflow.
7c3b1b2 2017/05/22 [SPARK-19748][SQL] Fix refresh of an InMemoryFileIndex with FileStatusCache. Correct the order of operations.
58f2250 2017/05/22 [SQL] Improve the readability of partition handling code.
b3430f7 2017/05/22 [SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider.
285be99 2017/05/19 [SPARK-20043][ML] Fix the DecisionTreeModel so the ImpurityCalculator builder handles uppercase impurity type Gini.
9c60a4d 2017/05/19 [SPARK-20125][SQL] Fix conversion of an Option type to a DataSet, when the Option contains a map type.
6663ca6 2017/05/19 [SPARK-18717][SQL] Fix code generation when mapping to an immutable Scala Map.
d602458 2017/05/19 [SPARK-20086][SQL] Fix CollapseWindow so it does not collapse dependent adjacent windows.
4755b36 2017/05/19 [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles to avoid failures when called on executors.
dac1c4a 2017/05/19 [SPARK-19237][SPARKR][CORE] Fix spark-submit on Windows to handle the case where Java is not installed.
f48a43a 2017/05/19 [SPARK-20017][SQL] Fix the str_to_map and explode functions to avoid NPEs. Change the nullability of the StringToMap function from false to true.
6e5245d 2017/05/19 [SPARK-19980][SQL][BACKPORT-2.1] Fix DataSet transformations on POJOs to preserve nulls. Add NULL checks in the Bean serializer.
bbd0c4d 2017/05/19 [SPARK-19872][PYTHON] Fix UnicodeDecodeError in PySpark when reading from a text file with repartition. Use the correct deserializer for RDD construction for coalesce and repartition.
20579df 2017/05/19 [SPARK-19887][SQL] Fix handling of dynamic partition keys when persisting tables.
ff91608 2017/05/19 [SPARK-19611][SQL] Fix breakages for Hive tables backed by case sensitive data files. Introduce configurable table schema inference.
e9984d0 2017/05/19 [SPARK-19082][SQL] Fix config option ignoreCorruptFiles for Parquet files.
014e909 2017/05/19 [SPARK-19857][YARN] Correct calculation of next credential update time.
15fd019 2017/05/19 [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BACKPORT-2.1][SQL] Backport cache related fixes from Spark 2.2 to Spark 2.1.
c689b5c 2017/05/19 [SPARK-18703][SPARK-18675][SQL][BACKPORT-2.1] Fix CTAS for Hive serde table so it works for all Hive versions. Drop staging directories and data files that were not dropped until JVM termination.
4cf5e41 2017/05/19 [SPARK-14772][PYTHON][ML] Fix Python ML Params.copy method to match Scala implementation.
f8d3846 2017/05/19 [SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column.
5aa4a2d 2017/05/19 [SPARK-19500][SQL] Fix failure in radix sort when attempting to spill the aggregated hash map.
20806a8 2017/05/19 [SPARK-19399][SPARKR][BACKPORT-2.1] Fix tests broken by the introduction of R coalesce API for DataFrame and Column.
a4bedf1 2017/05/19 [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column.
fc9e7b0 2017/05/19 [SPARK-18788][SPARKR] Add getNumPartitions API to SparkR.
642e7bb 2017/05/19 [SPARK-18335][SPARKR] Extend createDataFrame to support a numPartitions parameter.
4ec94f5 2017/05/19 [SPARK-19342][SPARKR] Fix collect method for timestamp columns so it does not incorrectly convert them to numeric.
d639208 2017/05/19 [SPARK-19543] Fix from_json when the input row is empty.
5948815 2017/05/19 [SPARK-19509][SQL] Fix Grouping Sets to handle nullable grouping columns.
f446906 2017/05/19 [SPARK-19472][SQL] Fix parser error when trying to resolve a nested CASE WHEN statement with parentheses. The statement was mistaken for a function call.
889860a 2017/05/19 [SPARK-19406][SQL] Fix function to_json to respect user-provided options.
2370cdf 2017/05/19 [SPARK-19396][DOC] Support case-insensitive JDBC options.
242b33c 2017/05/19 [SPARK-19324][SPARKR] Fix SparkR so it does not remove Spark JVM stdout output.
823d5e8 2017/05/19 [SPARK-19338][SQL] Include UDF names in explain output.
768a10b 2017/05/19 [SPARK-19231][SPARKR] Add error handling for download and untar of Spark releases.
d1f9ed5 2017/05/19 [SPARK-19129][SQL] Disallow ALTER TABLE drop partition with an empty partition value.
98d8d9c 2017/05/19 [SPARK-19180][SQL] Fix incorrect offset in OffHeapColumn.
79ff854 2017/05/19 [SPARK-19092][SQL][BACKPORT-2.1] Fix save() API in the DataFrameWriter to avoid a scan of all the saved files.
c44f274 2017/05/19 [SPARK-19130][SPARKR] Support setting columns to implicit literal values in SparkR.
148167b 2017/05/16 [MAPR-26414] Fix Spark History Server memory leak.
1a9b364 2017/05/15 Update dependencies after ECO-1703 release.
3554f31 2017/05/04 [SPARK-33] Fix streaming example.
3ae224b 2017/04/28 [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked collections.namedtuple. Port cloudpickle changes needed for PySpark to work with Python 3.6.0.
4584170 2017/04/28 [SPARK-19146][CORE] Drop more elements when stageData.taskData.size > retainedTasks.
a259c8e 2017/04/28 [MAPR-26287] Remove unnecessary code from hadoop-version-picker.sh.
c0c94e5 2017/04/28 [MAPR-26414] Fix Spark History Server memory leak.
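
To look up an individual patch from the table above, a row can be split into its three columns and the abbreviated commit hash appended to the repository URL. The following is a minimal sketch, not part of the MapR tooling. It assumes GitHub's usual /commit/<hash> URL layout for the repository listed under Source on GitHub, and the names Patch, parse_row, commit_url, and issue_ids are hypothetical helpers introduced only for this illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Repository from "Source on GitHub" above. The /commit/<hash> path is an
# assumption based on GitHub's usual URL layout.
REPO_URL = "https://github.com/mapr/spark"


class Patch(NamedTuple):
    commit: str   # 7-character abbreviated commit hash
    date: str     # date as printed in the table, YYYY/MM/DD
    comment: str  # free-text description, usually prefixed with issue IDs


# One table row: abbreviated hash, date, then the rest of the line.
ROW = re.compile(r"^([0-9a-f]{7})\s+(\d{4}/\d{2}/\d{2})\s+(.+)$")

# Issue keys such as SPARK-19182 or MAPR-28441, with or without brackets.
ISSUE = re.compile(r"\b(?:SPARK|MAPR)-\d+\b")


def parse_row(line: str) -> Optional[Patch]:
    """Return a Patch for a table row, or None for headings and other text."""
    match = ROW.match(line.strip())
    return Patch(*match.groups()) if match else None


def commit_url(patch: Patch) -> str:
    """Build the (assumed) GitHub URL for the commit behind a patch."""
    return f"{REPO_URL}/commit/{patch.commit}"


def issue_ids(patch: Patch) -> List[str]:
    """List the issue keys mentioned in the patch comment, in order."""
    return ISSUE.findall(patch.comment)


row = parse_row("148167b 2017/05/16 [MAPR-26414] Fix Spark History Server memory leak.")
print(commit_url(row))
print(issue_ids(row))
```

Lines that are not table rows, such as section headings, return None from parse_row, so the same function can be run over the whole Patches section to collect every entry.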

Known Issues

Resolved Issues

None.