GraphX is a graph library that runs on top of Apache Spark. Developers can use the languages and tools they already know from Spark to implement new types of algorithms that require modeling relationships between objects.
Graph processing is the backbone for many real-world applications, such as:
Until recently, developers had to choose a language and library optimized either for graphs or for traditional table data. However, many use cases require access to both at the same time. For instance, a recommendation algorithm may take a social graph as one input and a table of product ratings as another. Furthermore, the developer writing that recommendation algorithm may want to take advantage of standard machine learning clustering algorithms such as k-means.
GraphX and Spark provide a comprehensive platform for solving these kinds of problems. By adding a library of graph functions (GraphX) and a library for machine learning (MLlib) to a platform that understands table data (Spark, Shark), developers can write algorithms that draw on all of this functionality at once.