Data Science Offerings from MapR Power the AI Journey

The data science team at MapR has been dedicated to helping our customers get the most out of their platform. Over the last few years, the focus has gradually shifted from orchestrating data movement to discovering the insights that generate business value. For many customers, once they have extracted value from machine learning insights, they begin to evaluate artificial intelligence as a potential next step in the evolution of their practices.

We've had many opportunities to advise and assist MapR customers in making these transitions. Naturally, these transitions vary by industry, data sources, and use case, as well as by the stage of AI adoption each customer has reached. As a result, we've created six new data science offerings designed to serve MapR customers regardless of the maturity of their current AI adoption.

Working with the building blocks of AI and machine learning requires many experiments, combined with the ability to customize your ML environment. MapR supplies a pre-built container that makes this process easy: the Data Science Refinery. Using containers to build ML workflows may be new even to seasoned data scientists and developers. For this reason, we are offering the Data Science Refinery Accelerator, which includes a week of free training with a license purchase. The goal of this training is to ensure that your teams get the most out of the Refinery and its ability to leverage the MapR Data Platform, so they begin deriving value faster.

Sometimes taking the first step is where you get stuck. The AI/ML Hack-A-Thon is designed to define that first step in a one-day session hosted by a MapR data scientist. During a week of planning beforehand, we collaborate with your business stakeholders to identify an opportunity, the relevant data sources, and what a possible solution might look like. Types of opportunities vary by industry, but recurring themes include finding anomalies in customer behavior, prioritizing cases for review (e.g., fraudulent transactions), and evaluating new data sources (e.g., internet browsing history) for lift to current applications. Then, together with your engineers, scientists, and project managers, we lead a one-day, hands-on workshop that begins with simple coding templates, adds machine learning, and ends with a prototype that can be extended and improved.

Every business with a network wants to protect that network from threats. Many good tools are effective at identifying the behavior they were trained to find, yet we still read about successful attacks every day. For this reason, we created Cybersecurity Advanced Protection. It's designed to complement an organization's existing network security applications by focusing on logs that have already been evaluated for known threats. We have created an ensemble of analytic methods that can be tailored to specific security goals: it examines relationships between entities and trends in network traffic, and it raises alerts when stochastic events fall outside acceptable probability thresholds. A visualization tool makes it easy to evaluate anomalies and escalate them if necessary.
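One component of such an ensemble, alerting when an event falls outside an acceptable probability threshold, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how Cybersecurity Advanced Protection works: it assumes traffic is summarized as event counts per time interval (for example, connections from one host per hour) and that a Poisson distribution fitted to historical counts is an adequate baseline. The function names and the `threshold` value are hypothetical.

```python
import math
from statistics import mean


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), with terms computed in log space to avoid underflow."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0  # a zero-rate baseline never produces an event
    # P(X <= k - 1) = sum over i in [0, k) of exp(-lam) * lam**i / i!
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def flag_rare_counts(history, new_counts, threshold=1e-3):
    """Return indices of new per-interval counts that are improbably high.

    Assumption (illustrative only): counts per interval follow a Poisson
    distribution whose rate is the mean of the historical counts.
    """
    lam = mean(history)  # maximum-likelihood estimate of the Poisson rate
    return [i for i, c in enumerate(new_counts) if poisson_upper_tail(c, lam) < threshold]
```

With a historical baseline of about 10 events per hour, `flag_rare_counts([10, 12, 9, 11, 8, 10], [11, 30, 9])` flags only the burst of 30 (index 1). A real ensemble would combine several such detectors, each with its own model of "normal."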

One of the more common requests we get is to refactor an existing ML solution built on a workstation onto the MapR platform so that business decisions can be made at scale. To meet this demand, we created the ML Deployment offering. Many of our customers are limited by their legacy modeling environments, either in their use of the right tool or in their access to large amounts of data. A recent example involved some very complex manufacturing data: the customer's scientists built very good models but were training them in R on local workstations and were severely limited by the size of their training data and their ability to publish models quickly. We helped them run the same algorithms on the MapR Data Platform (with some big data adjustments to the features, but leaving the algorithms intact) and removed those limits. The goal of this offering is to remove the limitations of your ML environment by scaling feature engineering and ML algorithms on the MapR Data Platform and delivering those capabilities to the production environment for implementation.

Many machine learning solutions degrade soon after implementation. Performance data often becomes available only after a considerable amount of time has passed, so you need to look beyond measured performance to assess how a model is doing now. Consider customer churn: if you predict that a customer will not churn within 60 days, you have to wait 60 days to find out whether you were right (although you'll find out sooner if you were wrong). If your model is degrading (as most models do), you cannot wait 60 days to decide whether it's time for a new solution. One elegant way to monitor changes in your input data that might affect your model is to run a clustering algorithm on your training data. By measuring the mean squared error (MSE) of each newly scored batch against those clusters, you can create an early-warning system that serves as a proxy for actual performance alerts. To monitor a deployed machine learning solution, we created the ML Model Maintenance offering, based on the scoring architecture outlined in the Machine Learning Logistics ebook. Using the MapR ecosystem, we collect, store, and visualize the model's payload to keep you alerted to trends that may be impacting performance.
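The clustering-based early warning described above can be sketched as follows. This is a minimal illustration, not the ML Model Maintenance implementation: it assumes numeric feature vectors, uses plain k-means (Lloyd's algorithm) written with NumPy, takes "MSE" to mean the mean squared distance from each record to its nearest training centroid, and raises a flag when a batch's MSE exceeds a hypothetical `tolerance` multiple of the training baseline. `DriftMonitor`, `k`, and `tolerance` are illustrative names and values.

```python
import numpy as np


def _sq_dists(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from every row of X to every centroid, shape (n, k)."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def fit_kmeans(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialize from k distinct training rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        # Move each centroid to the mean of its members; leave empty clusters in place.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def batch_mse(X: np.ndarray, centroids: np.ndarray) -> float:
    """Mean squared distance from each row of X to its nearest centroid."""
    return float(_sq_dists(X, centroids).min(axis=1).mean())


class DriftMonitor:
    """Early-warning proxy: compare each scored batch's MSE with the training baseline."""

    def __init__(self, X_train: np.ndarray, k: int = 3, tolerance: float = 1.5, seed: int = 0):
        self.centroids = fit_kmeans(X_train, k, seed=seed)
        self.baseline = batch_mse(X_train, self.centroids)
        self.tolerance = tolerance  # hypothetical alert multiple; tune per model

    def check(self, X_batch: np.ndarray):
        """Return (batch MSE, True if it exceeds tolerance * baseline)."""
        mse = batch_mse(X_batch, self.centroids)
        return mse, mse > self.tolerance * self.baseline
```

A batch drawn from the training distribution should score close to the baseline, while a batch whose inputs have shifted lands far from every centroid and trips the flag. The tolerance trades earlier warnings against false alarms and would likely be tuned against the delayed ground truth once it arrives.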

Once an ML solution is deployed and monitored, it can be evaluated for the continuous application of reinforcement-type algorithms, a process that has come to be called "artificial intelligence." A few examples of ML processes include loan applications being evaluated (accept/deny), a car dashboard camera that identifies obstacles, and computerized responses to customer complaints. Each of those solutions was probably trained on many historical examples. However, models built on existing patterns are bound by what is already known. Building on the insights of the original ML model, our AI Enablement offering applies algorithms designed to evaluate each decision, combined with expert coaching that continuously informs the ML solution so that it can identify and react to new patterns. The choice of algorithm depends heavily on the circumstances surrounding the decision being made (e.g., costs of errors, number of agents, nature of agent goals, availability of coaching, requirement of reason codes) but may include elements of reinforcement learning, genetic algorithms, game theory, or combinations of multiple methods. The most important ingredient of successful artificial intelligence is a platform agile enough to consume the model's outputs in production and feed them back to the ML development environment in a continuous loop. That capability is what makes this offering viable on MapR.
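The simplest form of reinforcement learning, a multi-armed bandit, illustrates the kind of feedback loop described above, where each production decision is evaluated and the outcome informs future decisions. The sketch assumes a hypothetical setup in which traffic is split between candidate decision policies (say, a current "champion" model and a "challenger") and each decision eventually yields a numeric reward, such as 1 for a loan that is repaid. It uses epsilon-greedy exploration; that choice, the class name, and the reward definition are illustrative assumptions, not the algorithms used in AI Enablement.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit over a fixed set of candidate decision policies ("arms")."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon                      # share of decisions spent exploring
        self.counts = {a: 0 for a in self.arms}     # decisions routed to each arm
        self.values = {a: 0.0 for a in self.arms}   # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        """Explore a random arm with probability epsilon, otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        """Fold an observed reward into that arm's running mean (incremental average)."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Because a small share of decisions keeps exploring, a challenger that starts performing better is discovered and gradually receives more traffic without a manual retraining cycle. Richer settings (per-decision context, delayed rewards, expert coaching as a reward signal) call for more elaborate methods, as the paragraph above notes.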

The ability to leverage AI does not emerge overnight, and AI may not be the best solution for every case. Getting there often requires a series of steps, each depending on the success of the previous one. But regardless of where you are now or how daunting the next step may appear, MapR has a data science offering to help you take it.

This blog post was published September 11, 2018.