MLlib is a machine learning library that runs on top of Apache Spark.

Machine learning is a branch of artificial intelligence in which systems learn from data rather than from explicitly programmed rules, typically improving as more data is processed. It underlies many technologies that are part of everyday life; applied examples include recommendation engines, natural language processing, and anomaly detection. Until recently, data scientists had to implement machine learning algorithms by hand and adapt them to whatever computing framework they were using, which required a significant amount of work. Now, with Spark and MLlib, data scientists can write jobs that call a number of predefined algorithms to build these kinds of applications.

For the data scientists reading this, below are some of the machine learning algorithms exposed by MLlib.


Linear Regression:
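
To make the idea concrete, here is a minimal sketch of what a linear regression computes, written in plain Python rather than against the MLlib API. The function name `fit_line` and the sample data are hypothetical and chosen only for illustration; the sketch fits a single-feature line with the closed-form least-squares formulas and does not show how MLlib itself is implemented or called.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the x/y co-deviation.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data lying exactly on the line y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```

On a cluster, the same fit would be distributed across many machines, which is the work a library such as MLlib takes off the data scientist's hands.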


Collaborative Filtering:

Gradient Descent Primitive:
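
Gradient descent is an optimization building block: starting from an initial guess, it repeatedly steps against the gradient of a loss function until it settles near a minimum. The following is a minimal sketch in plain Python, not MLlib's own primitive; `gradient_descent`, `bowl_gradient`, the step size, and the iteration count are all assumptions made for illustration.

```python
def gradient_descent(grad, start, step_size=0.1, iterations=100):
    """Repeatedly step against the gradient, starting from `start`."""
    point = list(start)
    for _ in range(iterations):
        g = grad(point)
        point = [p - step_size * gi for p, gi in zip(point, g)]
    return point


def bowl_gradient(p):
    # Gradient of the hypothetical loss f(x, y) = (x - 3)^2 + (y + 1)^2,
    # whose minimum is at (3, -1).
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]


minimum = gradient_descent(bowl_gradient, [0.0, 0.0])
print(minimum)
```

A fixed step size keeps the sketch short; practical implementations usually adjust the step size over time and compute the gradient from batches of training data.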

