I’d like to share one of the demos I’m preparing to show at the Strata Data Conference in New York during the week of September 10th. My objective with this demo is to illustrate how certain database capabilities, such as secondary indexes and integration with Apache Spark, can improve the effectiveness of Customer 360 solutions. I really like the Customer 360 use case because it speaks to many of MapR’s strengths, even beyond the database, and it can be presented in a GUI that is both visually engaging, as shown in the following screenshot, and technically deep.
The three points I’m trying to convey with this demo are:
1. MapR Database is a scalable and resilient NoSQL database that allows different attributes for different customers to be saved in the same table. This enables you to save customer insights derived by joining datasets, regardless of whether those insights relate to all or only a portion of your customer base.
This is useful because, to maintain a comprehensive view of your customers’ preferences, you might need to capture the data they expose through activity in different places, such as social media or your organization’s mobile app. However, not every customer uses social media or your mobile app. In a relational database this leads to sparsely populated columns, which can reduce performance and usability. A NoSQL database like MapR Database can store data for all customers in one table even when each customer has a different set of columns.
2. Apache Spark is a leading technology for processing large datasets. However, the speed at which you can apply analytical insights to Big Data is greatly compromised when those datasets must be moved into and out of Spark execution engines. The MapR Database connector for Spark solves this problem by enabling Spark to save and update data in MapR Database without data movement.
3. When organizations attempt to build Customer 360 solutions, they often fail to operationalize analytical insights by saving them back into CRM tables. So, the third major talking point in my demo relates to how we can take an output like churn prediction from machine learning (ML) and load it back into master CRM tables so those insights become instantly accessible to production applications. When ML processes and production applications share a common data platform like MapR, it's much easier to operationalize ML insights. This is part of what many people at MapR refer to as "the power of convergence."
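To make the first and third points a little more concrete, here is a minimal Python sketch. It is not the demo's code and it does not use the MapR Database API: plain dictionaries stand in for a table of customer documents, and the customer IDs, field names, and the helpers `null_cells_if_tabular` and `merge_predictions` are all hypothetical names chosen for illustration. It shows how documents with different attributes can sit side by side, how many empty cells a fixed-schema table would need for the same data, and how churn scores could be merged back into the master records.

```python
from copy import deepcopy

# Hypothetical customer documents keyed by ID. Each document carries only
# the attributes that exist for that customer (illustrative data).
customers = {
    "c001": {"name": "Ana", "state": "NY", "twitter_handle": "@ana"},
    "c002": {"name": "Ben", "state": "CA", "mobile_app_logins": 14},
    "c003": {"name": "Chloe", "state": "TX"},
}


def null_cells_if_tabular(docs):
    """Count the empty cells a fixed-schema table would need for these docs."""
    columns = set()
    for doc in docs.values():
        columns.update(doc)
    return sum(len(columns) - len(doc) for doc in docs.values())


def merge_predictions(docs, predictions, field="churn_risk"):
    """Return a copy of docs with each prediction merged into its document."""
    updated = deepcopy(docs)
    for customer_id, score in predictions.items():
        # Ignore predictions for IDs that are not in the master records.
        if customer_id in updated:
            updated[customer_id][field] = score
    return updated


# Hypothetical output of an ML churn model, covering only some customers.
churn_scores = {"c001": 0.82, "c003": 0.10}
scored = merge_predictions(customers, churn_scores)

print(null_cells_if_tabular(customers))
print(scored["c001"])
```

In the actual demo the write-back would go through Spark and the MapR Database connector rather than an in-memory dictionary. The sketch only illustrates the shape of the data and of the merge.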
I wrote a Zeppelin notebook that illustrates how to use Spark on MapR for clickstream analysis. The notebook walks through four steps that demonstrate how Spark SQL, Spark Streaming, and Spark ML can be used with MapR Database, MapR Event Store, and the Distributed File and Object Store.
The code excerpts below show how those four tasks were implemented.
To download the code and our Customer 360 demo, see the GitHub repository at https://github.com/mapr-demos/customer360/tree/master/clickstream