Deep learning, machine learning, artificial intelligence: all are buzzwords, and all represent the future of analytics. In this post we will explain what machine learning and deep learning are at a high level, with some real-world examples. In future posts we will explore vertical use cases. The goal is not to turn you into a data scientist, but to give you a better understanding of what you can do with machine learning. Machine learning is becoming more accessible to developers, and data scientists work with domain experts, architects, developers, and data engineers, so it is important for everyone to have a better understanding of the possibilities. Every piece of information that your business generates has the potential to add value. This and future posts are meant to provoke a review of your own data to identify new opportunities.
The definition of AI has been continuously revised throughout its history. AI is an umbrella term for an idea that started in the 1950s; machine learning (ML) is a subset of AI, and deep learning is a subset of ML.
In 1985, when I was a student interning at the NSA, AI was also a very hot topic. At the NSA I even took an MIT video (VCR) class on AI, which was about expert systems. Expert systems capture an expert's knowledge in a rules engine. Rules engines have wide use in industries such as finance and healthcare, and more recently for event processing, but when data is changing, rules can become difficult to update and maintain. Machine learning has the advantage that it learns from the data, and it can give data driven probabilistic predictions.
image reference https://www.pcmag.com/encyclopedia/term/42865/expert-system
According to Ted Dunning, it is better to use precise terminology, such as machine learning or deep learning, instead of the word AI, because before we get something to work well we call it AI, and afterwards we always call it something else. AI is better used as a word for the next frontier.
According to Thomas Davenport in the HBR, analytical technology has changed dramatically over the last decade, with more powerful and less expensive distributed computing across commodity servers, streaming analytics, and improved machine learning technologies, enabling companies to store and analyze both far more data and many different types of it.
Traditionally data was stored on a RAID system, sent to a multi-core server for processing and sent back for storage, which caused a bottleneck on data transfer, and was expensive. With file and table storage like MapR XD and MapR-DB, data is distributed across a cluster, and Hadoop technologies like MapReduce, Pig, and Hive send the computing task to where the data resides.
MapR Event Streams, a new distributed messaging system for streaming event data at scale, combined with stream processing such as Apache Spark Streaming or Apache Flink, speeds up parallel processing of real-time events with machine learning models.
Graphics Processing Units (GPUs) have sped up multi-core servers for parallel processing. A GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously, whereas a CPU consists of a few cores optimized for sequential serial processing. In terms of potential performance, today’s clusters with many GPUs offer roughly a million times the performance of the Cray-1, once the fastest computer on the planet, at a tiny fraction of the cost.
image reference http://www.nvidia.com/object/what-is-gpu-computing.html
Machine learning uses algorithms to find patterns in data, and then uses a model that recognizes those patterns to make predictions on new data.
In general, machine learning may be broken down into two main types, supervised and unsupervised, plus approaches that fall in between those two. Supervised learning algorithms use labeled data; unsupervised learning algorithms find patterns in unlabeled data. Semi-supervised learning uses a mixture of labeled and unlabeled data. Reinforcement learning trains algorithms to maximize rewards based on feedback.
Supervised algorithms use labeled data in which both the input and target outcome, or label, are provided to the algorithm.
Supervised Learning is also called predictive modeling or predictive analytics, because you build a model that is capable of making predictions. Some examples of predictive modeling are classification and regression. Classification identifies which category an item belongs to (for example whether a transaction is fraud or not fraud), based on labeled examples of known items (for example transactions known to be fraud or not). Logistic regression predicts a probability, for example the probability of fraud. Linear regression predicts a numeric value, for example the amount of fraud.
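To make the difference between these three kinds of prediction concrete, here is a minimal sketch in plain Python. The data, the function names (`fit_linear`, `fit_logistic`), and the single "transaction amount" feature are all hypothetical and chosen only for illustration; a real project would normally use a library such as scikit-learn or Spark MLlib rather than hand-written training loops.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, lr=0.1, epochs=5000):
    """Logistic regression on one feature, trained with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, labels)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def fit_linear(xs, ys):
    """Ordinary least squares on one feature: y is approximated by slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical labeled data: transaction amount (in hundreds) and whether it was fraud.
amounts = [0.2, 0.5, 1.0, 1.5, 3.0, 4.0, 5.0, 6.0]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(amounts, is_fraud)
probability = sigmoid(w * 5.0 + b)                       # logistic regression: a probability
label = "fraud" if probability >= 0.5 else "not fraud"   # classification: a category

# Hypothetical numeric target: the amount lost for a given transaction size.
slope, intercept = fit_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predicted_loss = slope * 5.0 + intercept                 # linear regression: a number
```

In this sketch the same labeled examples drive all three outputs: the probability comes from the logistic model, the category comes from thresholding that probability at 0.5 (an arbitrary cutoff), and the numeric estimate comes from the fitted line.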
Some examples of Classification include:
Some examples of logistic regression (or other algorithms) include:
Some examples of linear regression include:
Other supervised and unsupervised learning algorithms are shown below. We won’t go over all of them, but we will look at one example of each type in more detail.
image reference : http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Classification takes a set of data with known labels and pre-determined features and learns how to label new records based on that information. Features are the “if questions” that you ask. The label is the answer to those questions.
Let’s go through an example of Debit Card Fraud:
To build a classifier model, you extract the features of interest that most contribute to the classification.
Decision trees create a model that predicts the class or label, based on several input features. Decision trees work by evaluating a question containing a feature at every node and selecting a branch to the next node, based on the answer. A possible decision tree for predicting debit card fraud is shown below. The feature questions are the nodes, and the answers “yes” or “no” are the branches in the tree to the child nodes. (Note that a real tree would have more nodes).
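The way a tree is evaluated can be sketched in a few lines of Python. The two feature questions used here (`amount_over_daily_average`, `merchant_seen_before`) are hypothetical stand-ins, not the features from the original figure, and the tree is written by hand rather than learned from data.

```python
# Each internal node asks a yes/no question about one feature.
# Each leaf holds the predicted label. Feature names are illustrative only.
TREE = {
    "question": "amount_over_daily_average",
    "yes": {
        "question": "merchant_seen_before",
        "yes": {"label": "not fraud"},
        "no": {"label": "fraud"},
    },
    "no": {"label": "not fraud"},
}

def classify(node, transaction):
    """Walk from the root to a leaf, following the branch that matches each answer."""
    while "label" not in node:
        answer = transaction[node["question"]]
        node = node["yes"] if answer else node["no"]
    return node["label"]

suspicious = {"amount_over_daily_average": True, "merchant_seen_before": False}
routine = {"amount_over_daily_average": False, "merchant_seen_before": True}

suspicious_label = classify(TREE, suspicious)
routine_label = classify(TREE, routine)
```

A training algorithm would choose the questions and their order automatically from labeled transactions; the traversal at prediction time stays the same.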
Decision trees are popular because they are easy to visualize and explain. The accuracy of models can be improved by combining algorithms with ensemble methods. An example of an ensemble is a random forest, which combines multiple decision trees, each trained on a random subset of the data and features.
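The ensemble idea can be illustrated with a deliberately simplified sketch: many one-question trees, each trained on a random resample of the rows and a randomly chosen feature, vote on the final label. This is a toy approximation of the random forest approach, not a full implementation; the feature names, the data, and the helper names (`train_stump`, `train_forest`, `predict`) are assumptions made for the example.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """One-question tree: for each answer to a boolean feature, predict the majority label."""
    votes = {True: Counter(), False: Counter()}
    for row, label in zip(rows, labels):
        votes[row[feature]][label] += 1
    fallback = Counter(labels).most_common(1)[0][0]  # used if an answer never appears
    leaves = {
        answer: (counts.most_common(1)[0][0] if counts else fallback)
        for answer, counts in votes.items()
    }
    return feature, leaves

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the rows and one random feature."""
    rng = random.Random(seed)
    features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.choice(features)
        forest.append(
            train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature)
        )
    return forest

def predict(forest, row):
    """Majority vote across all trees in the ensemble."""
    votes = Counter(leaves[row[feature]] for feature, leaves in forest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled transactions with two boolean features.
rows = [
    {"large_amount": True, "new_merchant": True},
    {"large_amount": True, "new_merchant": False},
    {"large_amount": False, "new_merchant": True},
    {"large_amount": False, "new_merchant": False},
    {"large_amount": True, "new_merchant": True},
    {"large_amount": False, "new_merchant": False},
]
labels = ["fraud", "fraud", "not fraud", "not fraud", "fraud", "not fraud"]

forest = train_forest(rows, labels)
prediction = predict(forest, {"large_amount": True, "new_merchant": True})
```

Because each tree sees a slightly different view of the data, individual mistakes tend to be outvoted, which is the intuition behind the accuracy gain from ensembles.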
Unsupervised learning, also sometimes called descriptive analytics, does not have labeled data provided in advance. These algorithms discover similarities, or regularities in the input data. An example of unsupervised learning is grouping similar customers, based on purchase data.
In clustering, an algorithm groups inputs into categories by analyzing similarities between input examples. Some clustering use cases include:
The K-means algorithm groups observations into K clusters, in which each observation belongs to the cluster whose center (the mean of its members) is nearest.
An example of clustering is a company that wants to segment its customers in order to better tailor products and offerings. Customers could be grouped on features such as demographics and purchase histories. Clustering with unsupervised learning is often combined with supervised learning in order to get more valuable results. For example, in this banking customer 360 use case, customers were first segmented based on answers to a survey. The customer groups were analyzed and labeled with customer personas. These labels were then linked by customer ID with features such as types of accounts and purchases. Finally, supervised machine learning was applied and tested with the labeled customers, making it possible to link the survey customer personas with their banking actions and provide insights.
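A minimal K-means sketch in plain Python shows the two alternating steps: assign each observation to its nearest center, then move each center to the mean of its members. The function name `kmeans`, the fixed iteration count, and the sample points are assumptions for illustration; production libraries add better initialization and convergence checks.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct observations
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the cluster's members (keep the old center if it is empty).
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customers described by two numeric features.
customers = [(1, 1), (1, 2), (2, 1), (9, 9), (10, 10), (9, 10)]
centers, clusters = kmeans(customers, k=2)
```

With these well-separated points, the low-value and high-value customers end up in separate clusters, although which cluster comes first depends on the random starting centers.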
Deep learning is the name for multilayered neural networks, which are networks composed of several “hidden layers” of nodes between the input and output. There are many variations of neural networks, which you can learn more about on this neural network cheat sheet. Improved algorithms, GPUs, and massively parallel processing (MPP) have given rise to networks with thousands of layers. Each node combines its input data with a set of weights and outputs a confidence score to the nodes in the next layer, until the output layer is reached, where the error of the score is calculated. With backpropagation, used as part of a process called gradient descent, the errors are sent back through the network and the weights are adjusted, improving the model. This process is repeated thousands of times, adjusting the model’s weights in response to the error it produces, until the error can’t be reduced any more.
During this process the layers learn the optimal features for the model, which has the advantage that features do not need to be predetermined. However, this has the disadvantage that the model’s decisions are not easily explainable. Because explaining the decisions can be important, researchers are developing new ways to understand the black box of deep learning.
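The training loop described above can be sketched with a tiny network that has one hidden layer. This is an illustrative toy, not how production deep learning frameworks are written: the network size, learning rate, epoch count, and the XOR training data are all assumptions picked to keep the example short.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Each node computes a weighted sum of its inputs plus a bias (the last weight)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1]) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=4, lr=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = [mean_squared_error(data, w_hidden, w_out)]
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Backpropagation: error signal at the output, then pushed back to hidden nodes.
            # (This is the gradient of half the squared error for this one example.)
            delta_out = (out - y) * out * (1 - out)
            delta_hidden = [
                delta_out * w_out[j] * hidden[j] * (1 - hidden[j]) for j in range(n_hidden)
            ]
            # Gradient descent: nudge every weight against its share of the error.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
        history.append(mean_squared_error(data, w_hidden, w_out))
    return w_hidden, w_out, history

# XOR: a pattern that a model without a hidden layer cannot represent.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_hidden, w_out, history = train(xor_data)
```

Comparing `history[0]` with `history[-1]` shows how far the error fell as the weights were adjusted. Note that nothing in the resulting weights explains why the network answers as it does, which is the explainability problem mentioned above.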
There are different variations of Deep Learning Algorithms, which can be used with the Distributed Deep Learning Quick Start Solution from MapR to build data-driven applications such as the following:
Deep Neural Networks for Improved Traditional Algorithms
Convolutional Neural Networks for images
Recurrent Neural Networks for sequenced data