Converging Your Data Landscape


Speakers:

John Myers

Managing Research Director Business Intelligence, Enterprise Management Associates (EMA)

Jack Norris

SVP, Data and Applications, MapR Technologies


How Data-Driven Approaches are Changing Your Data Management Strategies

Introducing data-driven strategies into your business model alters the way your organization manages and provides information to your customers, partners and employees. Gone are the days of “waterfall” implementation strategies from relational data to applications within a data center. Now, data-driven business models require agile implementation of applications based on information from all across an organization (on-premises, cloud, and mobile) as well as information from outside corporate walls: partners, third-party vendors, and customers. Data management strategies need to be ready to meet these challenges or your new and disruptive business models will fail at the most critical time: when your customers want to access them.

In this webinar, John L. Myers of Enterprise Management Associates (EMA) and Jack Norris of MapR will discuss how the new business advancements require data-rich applications that enable global, real-time data integration, microservices support, and in-place and continuous machine learning/AI and SQL capabilities.

Watch the on-demand webinar to learn:

  • Examples of disruptive business models
  • Drivers of changes to the management landscape
  • Best practices associated with meeting requirements for data-driven applications

Transcript

David: Our speakers today will be John Myers, managing research director at EMA and Jack Norris, senior vice president, data and applications, at MapR. Our presentation today will run approximately one hour with the last 15 minutes of the hour dedicated to addressing any questions. You can submit a question at any time throughout the presentation via the chat box in the lower left-hand corner of your browser.

David: I apologize for the late start. We had technical difficulties. With that, I would like to pass the ball over to John to get us started. John, it's all yours.

John Myers: Why thank you very much, David. I appreciate the introduction and we'll get going here. A little bit about the agenda of what we're going to do. We're going to do a conversational type of panel discussion between myself and Jack Norris and we're going to talk about the concepts of data-driven cultures that are disrupting the business landscape. What are some of the drivers from that change that are changing the way that we look at data management? Look at some of the best practices that go along with that and then, at the end, Jack, I believe you're going to share how one or more of the MapR customers are looking at the way that this all works together.

Jack Norris: Yes, yes. Looking forward to it, thanks.

John Myers: Alright. So, topic number one, data-driven cultures are disrupting the business landscape and when we look at the business of going through an organization, we have two camps, if you will. We have the old school guys that believe that data is an exhaust or a result set from their business. So, they look at their revenue transactions, they look at their inventories, they look at all of these different things, almost in a rear view mirror type of concept where they're looking backwards, they think of data as being more of the system of record, if you will, but they don't really use it as an input into their system.

John Myers: And then we have the data-driven cultures that are saying, "Hey, we have all of this information about our customers, about our inventories, about market trends and things of that nature," and they're taking that information and pushing it in to the front end of what they're doing. They are going to explore through that information, they are going to do analysis on that information, they're going to create ... Whether it be campaigns, whether they're going to make targeted matches associated with pricing, things of that nature, and really are getting to ... They're getting to a point they're able to see the changes in the transactions and they're really getting to a point where they're disrupting the business that's going on.

John Myers: So, at EMA, when we look at data-driven applications and some of the usage scenarios that they're looking at, we look at them across five major components. One is managing operations, how do we make our business better? How do we find economies of scale and efficiencies within our supply chain, our inventory, the way that we handle our own business? We also look at ways that people can look at customer experience. How can they make all of these new channels of communication that work for particular customers?

John Myers: They're also looking at areas around sales and marketing. How do we take this information about customer, how do we turn that into actual revenues? There's also customer and product intelligence. How do we create that bucket of one, if you will, and match up particular products with particular customers and make sure that they come together?

John Myers: And then we have the cost and risk analysis side of things, because as we have all these other pieces around it, we need to make sure that we're not exposing ourselves to risk if we're talking about transactions around credit score and things of that nature. We want to make sure that we're not being defrauded by some of our customers and the way that goes with that.

John Myers: But these are the ways that we see that and it's a mix of these workloads around exploratory, being able to stick your fingers in the data, doing data discovery, information exploration, operational platforms, where we're able to say, "Hey, how do we get a much faster way to create orders, fulfill orders, make sure our customer experience is good," and then down to the point for looking at analytics and then operational analytics for things like fraud, like risk management, things of that nature, but how we go do those types of things.

John Myers: In terms of the pace of industry disruption, I like to look at these three main industries and some of the disruptors in that. So, the first one I want to talk about is content consumption or what we might have called in the past "video distribution." We used to think of it as a broadcast, scheduled type of industry and when we had information, that we would definitely be in arrears. We would always say, "Hey, who watched last night?" And things of that nature.

John Myers: Well, the guys over at Netflix have taken the information that they've gotten from both their history of DVD distribution and streaming content distribution and they're able to make decisions about their content portfolio, what content is sticky, what is particularly good for people, what can they recommend the right particular pieces to their customer base? And it really flipped this over and said, "If people want to watch in these particular fashions, then let's enable them to do that. Not based on schedule, not based on a particular screen, but give them that wide choice." And because they've taken the data and put it into the beginning of the process for recommendations, for all these pieces, they're able to really transform the way that we view and do content.

John Myers: Another one is what I lovingly refer to as the personal transportation business. And if you'd asked me 5 or 10 years ago, I would've said there's no way you can flip over the taxi business and really disrupt it. There were too many barriers to getting in to it, but the folks at Lyft and Uber said, "You know what? We're going to take this data. GPS data from smartphones, payment information, things of that nature and we're going to flip this model over and give people a great experience."

John Myers: So, it's no longer, "Hey, the cab will be there in 30 to 45 minutes. If you're not there, it'll drive off." No longer are you arguing with the cab driver about payment and things of that nature and they've really enabled a new way of doing things for setting expectations and managing those things and they've really done a really good job of managing the way that this works.

John Myers: And I'd say the 800 pound gorilla in the retail space that's really disrupting that is the fine folks over at Amazon, who were definitely one of the initiators of being a data-driven organization. They pull information from their supply chain, from their click streams, from different partners and things of that nature and they've really changed the way that we shop and the way that we look at certain things. And it's interesting to see them now disrupting not just the online experience, but now they're starting to disrupt the on-premises retail experience.

John Myers: But in each of these areas, we're seeing this pace of industry disruption, and in Amazon's case, really changing dramatically the way that we do things, really pushing the envelope on the way that we tackle these types of concepts.

Jack Norris: So, I think that was an excellent overview, John. I'd like to share some of MapR's viewpoints and some of our customer experiences. You start off by talking about looking at data as an exhaust and now looking at it as a fuel and I think just analytics in general, instead of assuming that it's a batch process, it takes a while to get that data formulated, transformed and available for queries, we're seeing organizations really inject analytics into the business, into the operations.

Jack Norris: This example on American Express I think is a really interesting one. They've, for years, led the credit card industry in terms of understanding fraud and reporting it and being able to structure things. Well, the next generation now that they're leveraging with big data has a huge infrastructure behind it where they're making decisions at the point of sale. So, while the credit card's being swiped and within milliseconds, identifying is it fraudulent or not, trying to reduce the number of false positives. So, over a trillion dollars of the annual spend is protected with their back end infrastructure.

Jack Norris: You mentioned Amazon and the disruption that's happening there. We're working with some of the leading retailers as a way to effectively compete and I think the move that Amazon did with the Whole Foods acquisition could be viewed as a response to some of these moves that are going on in the industry. So, we're seeing leading companies like the largest retailer combine the online experience with their store network to provide some unique experiences.

Jack Norris: So, they have not just a single application but a multitude of applications that are customizing the web experience and then using the power of their store networks. So, you were browsing this bicycle. They'll follow and have re-targeted ads saying this bicycle is now available for 20% off and you can pick it up at this store location that's just a few miles from your home.

Jack Norris: Another example that's on screen here is a European retailer and they're linking multiple ends. So, that slide that you showed, John, in terms of the data-driven application usage, well, we're seeing the linkage of those. So, it's the back end managing the operations that's linked to the online experience. So, not only are they recommending products that you would be interested in, but they're recommending products that they have a lot of inventory at the nearby location. And being able to join those together drives down cost, so they reduce inventory gluts, they avoid store markdowns, they understand regional trends and route inventory more appropriately. So, it's just a way to really increase profitability by doing this digital transformation.

Jack Norris: And at the root of it, it's how do you take analytics and make them a real-time piece of the entire operation? And that has implications for how data is stored and managed as well as analyzed. I think that-

John Myers: Jack, I think you make some great points about digital transformation being able to be one of the keys to this and when organizations are focused on this type of stuff and your example with American Express and this one with the online shopping experience, they almost inherently have that digital transformation and have that information. We're seeing a lot of organizations that are asking those questions. "How can we get on par with the information that goes with this?" If they are a more on-premise type of thing.

John Myers: But I think your example about American Express is fantastic and the online shopping because as we have more of these types of online concepts, while we're after adding more customers and more transactions, the American Express use case is very much about, "How do we protect ourselves so that we're, in our zeal to get things up, running, out into people's hands, we're not giving away the store and we're not losing what we're trying to do in terms of payments and things of that nature?"

John Myers: So, I think those are two great examples of how data-driven can make this digital transformation.

Jack Norris: Awesome.

John Myers: The next topic that I think you and I wanted to talk about was what were some of the drivers in data management? One of the things that I've really found is that when people want to start getting into these concepts, they have all of these disjointed components. They might have event data that has landed in Hadoop or coming from a streaming platform. They may have customer information that is sitting in their enterprise data warehouse. They might be getting external data about demographics, customer rating, things of that nature and they have all these different pieces and they don't really know how to bring these components together, because making it all work together is one of the key things that you want to do if you want to become data driven and do all these different ideas.

John Myers: Part of EMA research has shown that when we ask end users who are doing this type of work, what are the drivers for implementation? Well, one is the requirement for faster analytical and transaction processing. So, it's not just that ability to have access to the data, but how to do it faster. How to bring in that real time streaming information into what they're trying to do. We're seeing this quite a bit, not just from ...

John Myers: A lot of people, when they think about streaming, they think about IoT devices and things of that nature, but mobile and online applications that are constantly giving transactional data or even pre-transaction information, a lot of people think of this as click-stream data. As we see that streaming information come in, we need to be able to integrate it in to what we're trying to do.

John Myers: And then the other piece is access to those internal and external data sets so that we can bring these things together. So, these are some of our top three core components that bring this all together and are the drivers that people are looking for implementation because they've got all these new pieces of information and now they need to integrate that together.

Jack Norris: Yeah, that's ... Both of those, I think, are adding context to what we've been talking about for quite a while now. This graphic on the left I like because I remember back in 2011, we were talking about, gee, the rate of data growth, the volume ... And now we're looking ahead and it's in the zettabytes. So, the volume continues to, I'll even say exceed our expectations and now with IoT and the number of devices and machine generated content, that curve is even steeper. And we've been talking about volume, we've been talking about the variety, the data diversity. But now we're starting to talk about ... Velocity's taking on a new phase. It's not just the change rate of the data, but it's as you pointed out in the earlier example, it's about the streaming data and the desire to act on that data as soon as possible and understand the context of that.

Jack Norris: And then in the era of cloud and IoT, we're also talking about the vicinity, the location of that data. Because we're recognizing with data gravity, our solution has to extend across locations and coordinate the execution across those locations and as we look at shared data, as we look at stateful applications, it's that data that becomes the obstacle to how we ... Not only do analytics, but do things like use of containers and micro-services.

Jack Norris: And the last point I'd like to make on this ... One of the drivers is in machine learning and AI and deep learning, we're talking much more about the algorithms, we're talking beyond SQL, we're talking about different tools, whether it's TensorFlow or Caffe or what have you, but the real driver of success is actually the data logistics and this is something that was cited in Ted Dunning and Ellen Friedman's latest book on Machine Learning Logistics. There'll be a call to action later if you're interested in the book, but the focus here is understanding the data logistics are really key. How you populate a model, how you train that model and how you make it very easy to swap out. It's not just driving efficiency, it's squeezing out the latency and it's allowing you to intelligently respond quickly to that streaming data, to act on data as soon as possible, to have a lot of agility within the organization. So, that's our perspective, John, in terms of what we see as drivers.

John Myers: Yeah, no, Jack, I agree with you. I think you raised an excellent point, that it's not just the diversity of those platforms but where are those platforms? Are we geographically dispersed? When we see a lot of mergers and acquisitions activities out there, they're not just, "Hey, we're in the same office park," and everything comes together. We've got offices across North America, across the world and then you've got both data center and cloud implementations that all come together, so you're not just talking about a couple of Hadoop clusters. You're talking about the possibility of many Hadoop clusters across multiple components and how do we bring that together? How do we bring in customer data from our Asia-Pacific offices with the customer data from our European offices? And sometimes, not to get too into one area, but with the growth of GDPR, do we even want to mix those two pieces of information? How do we manage that?

John Myers: And having talked with both Ted and Ellen about the book, the data logistics is a key thing, because if I create the world's greatest model for customer experience or for cross sell, upsell or matching and I don't know how to distribute it out, it's as if it never happened, kind of like if a tree falls in the forest and nobody hears it. But that data logistics concept gets to that speed of implementation, and how do we make this work better and faster because if it's just another piece of a science project that's locked in the back office, then I really can't utilize it if I can't scale it to get it across the customer base. I think those are great ways of looking at how some of these challenges to data management, and as you're pointing out, data logistics, some people might call it data ops, really causes some consternation in the way that people are looking at it.

John Myers: The next one we would like to take a look at is some of those best practices with data-driven applications. I think, and this is backed up by our research, organizations are most successful when they focus on the end goals, not, if you will, the implementation details. If we have a great vision of where we want to go with our application, let's focus on how do we enable that, not necessarily on the elegance of how we implemented it. I think that's where organizations really get some great value and say, "Let's not worry about the underlying data structure. Let's not worry about some of those components. How do we get this data-driven, data-enabled application? How do we get it out into the hands of whether it be our customers if we're talking about a mobile app, our partners if we're talking about some type of shared supply chain portal? How do we get it to the people that need it so we can make it work quickly?" I find that organizations that are focused on the tactical side sometimes get caught in the details. A lot of times IT says, "Well, no, we can't do that," or, "It'll take us 12 to 18 months." When business hears both of those answers, they almost hear, "I might as well not even do this." This leads to a change in the way they want to look at things.

Jack Norris: John, the other part about that, I think what's unspoken there is question your assumptions because there are a lot of assumptions that have been built up over time. Proven success in data warehouse has led to certain practices. Sometimes those practices are actually working against some of the best demonstrated practices in big data. For instance, the assumption that you've got a lot of time to land the data and then deal with the data after the fact, that assumption, if you bring the same assumption to big data, can actually get in the way of some really interesting applications.

John Myers: Yup, and I agree with you. I like to call that the difference between our best practices and our best patterns. A lot of times when we talk about a best practice or a common practice, we think about, "We've always done it this way," versus, "Why did we do it this way?" Your example about data warehousing and data and things of that nature, a lot of times we would put up gates in front of our data warehouses to say only the pure data gets into the system. Part of that was because they didn't have a lot of space, so they had to make some choices about what was going to go in there. They didn't have all the processing power in the world.

John Myers: I think that's what big data and our data fabrics really bring to the table is we no longer have to make those choices of who gets in and who gets out. We can have all of our data and then make an assessment after we do that. The best pattern of let's get the right data in the right location and doing that, but if we try to take that common practice of saying, "Oh, no, it has to be pure before we can land it. If we can't land it, then it goes someplace else." You're exactly right. In a streaming environment, we need to say, "Hey, how do we look at those particular types of things?"

John Myers: Leading into that, some of the things that we talk about and we've seen from our end user research is speed is key. Your point about the new streaming components, whether it be coming from IoT devices or from real time applications, we need to be able to match that speed, not just of the flow of the data but the speed of business. If I have a trend, I need to be able to catch onto that as opposed to saying, "Hey, well we can get this into our change control process in three to six months." A lot of times some of these trends, they come up and they disappear, and they're gone within six weeks, not three to six months. If organizations can make this a simple process to make it work, can we make it scalable? Can we make it configurable so that we're not going back and hand coding or custom developing each particular piece? If we can take an approach that says, and back to Ted and Ellen's talk about machine learning, can we take a model and then say, "Hey, we've adjusted it, and now we can make that minor change," as opposed to having to build it up from the base at that point every single time.

John Myers: Another component is how do we get this data into the hands of the right people? One of the hallmarks, in my opinion, about what a data-driven organization is all about is how dispersed is the information in the organization. Some of those older school versions, they would hold all that data and that decision making power in the executive suite or in the management team and send down edicts. Data-driven organizations are all about how do we get the right data into the hands of as low a level of an employee as is most effective, and how do we let them use that data to make sense.

John Myers: Some people think about this and they go, "Well, are we giving our corporate data to our warehouse team or to our customer care agents, things of that nature?" No, not really. What you're doing is you're putting the right data for the warehouse team or the fulfillment team. Maybe they want to see how they compared against what they did last year, what they're doing against a different shift, how they're doing in terms of their SLAs. When they have that data, they're going to do better as a group. Our customer care agents, getting back to that fraud or that risk management, give them that decision that is saying, "Hey, based on these three factors, you can make an offer of credit, or you can take a larger order than we were before," but not letting them, say, do it willy nilly and say, "Hey, let's give them a red, yellow, green," or some type of indicator that says, "Hey, you now have a greater sense of autonomy and impact into some of these issues." In my mind, speed, simplicity of implementation, and how do we get the data out there are the true drivers of these data-driven applications.

Jack Norris: Yeah. I would add two other points, too. One is that when we're talking about sharing the data, think different views of the same data, not duplicating the data because we're really trying to decrease the touch points and the duplication. Secondly, that data-driven applications, some of those consumers are the applications themselves. How do I get the data injected in the application? What you have is this series of automated processes. They're informed by data exploration by individuals and line of business, but you're operationalizing the data to have it make an impact.

John Myers: I agree with you. I think that when we look at a data-driven organization, and as I talked about, we've got our exec team, we've got management, we've got our frontline employees, we've got those partners, we don't want to make a replication of each of those data sets. For our junior analysts or our frontline employees, we may not want to share all the Social Security numbers or all of the personally identifiable information, but we want them to have enough so they can make an identification or they can make a decision. Our partners, we may not let them see any of our PCI or PII type of information. If we can make it so that each of those groups is operating off of the same data set, we're no longer arguing about, "Well, what replication do you have?" They're all hitting that same data, and they are all bringing the same information to the table.
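
A minimal sketch of the "different views of the same data" idea from the exchange above, assuming a single in-memory data set. The record fields, role names, and the `view_for` helper are all hypothetical illustrations; none of them come from the webinar or from any MapR API. The point is only that each audience reads a role-specific projection of one shared data set rather than its own replica.

```python
from typing import Dict, List

# One shared data set (hypothetical records, for illustration only).
CUSTOMERS: List[Dict[str, str]] = [
    {"id": "c1", "name": "Ada", "ssn": "000-00-0001", "region": "EMEA", "tier": "gold"},
    {"id": "c2", "name": "Bo", "ssn": "000-00-0002", "region": "APAC", "tier": "silver"},
]

# Fields each (hypothetical) role is allowed to see.
ROLE_FIELDS: Dict[str, List[str]] = {
    "executive": ["id", "name", "ssn", "region", "tier"],
    "frontline": ["id", "name", "region", "tier"],  # no Social Security number
    "partner": ["id", "region", "tier"],            # no PII at all
}


def view_for(role: str) -> List[Dict[str, str]]:
    """Project the shared records down to the fields the given role may see.

    The projection is computed on read from CUSTOMERS, so every role is
    always looking at the same underlying data set.
    """
    fields = ROLE_FIELDS[role]
    return [{field: row[field] for field in fields} for row in CUSTOMERS]
```

Calling `view_for("partner")` yields records with only `id`, `region`, and `tier`, while `view_for("executive")` exposes every field of the same rows.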

John Myers: In my mind, access to this data needs to be proactive and centralized. The graphic on the left-hand side is something I found is really great. It's the Sisyphean task where you keep rolling this rock up to the top of the hill, and when you get up top it rolls down the other side. That's to that point of we don't want to have to redo these environments every single time. We want to be able to make it so that when we do something, we can continue to get incremental value and not have to start at the bottom and continue down that process. The image on the right shows it's a wave of data. If we can ride that wave of information, we can bring in a much better set of data in terms of what we want to do. If we're faster with our access and we can flex or flow with the amount of information that comes through, it really allows us to take things to the next level.

John Myers: At EMA, we call this our hybrid data ecosystem, or our logical framework for next generation data. What we do as part of our framework is that we say, "Hey, rather than put technological constraints or some of those things we talked about, those barriers from the past at the center, let's put what the business wants to do at the center and allow our systems to be integrated, collected, and brought together to meet those challenges." We have all those disparate data points and data sources. How do we integrate those from an information management, security, and integration perspective so we can use them to the best effect? Then how do we have those applications that run around the outside of that? What we really take a look at is how can we share the metadata across all these different platforms. I don't mean just technical metadata about, for lack of a better term, describing a database, things of that nature.

John Myers: How do we look at the technical, which is our entities, our attributes, things of that nature? How do we merge that with our business metadata about how do these things work together to create the concept of customer, product, location, etc. Then how do we look at the processing of that information so that we can understand, "Hey, this system works well. This one doesn't for those particular queries." How do we bring that all together? When you do that, you're able to drive the adoption with this vast number of whether they be partners, internal technical resources, internal business resources, and really bring all that together to create one logical framework. We think if you take these concepts and build out something that supports these ideas, you'll have a great way of being able to manage this, particularly as we get into managing big data, managing streaming information, and continue to push down that path.

Jack Norris: That logical framework that you laid out, I think that's an excellent way to set up the description for what MapR has delivered. If you look at the MapR Converged Data Platform, it's a data fabric. It stretches across locations, and it converges a variety of data, a variety of processing, with a common security, a common management framework underneath. What that does is not only does it drive efficiency and not only does it drive productivity for administrators and developers, but more importantly, it squeezes out delays and latency. To transform that analytics from a historical reporting function on the data exhaust to something that's impacting the business as it happens, it requires this underlying architecture, this underlying data fabric, if you will.

Jack Norris: We set up the requirements a little bit. If I were to dive into it and say, "What is different about a data fabric? Why aren't you able to do the same thing with, say, existing solutions today such as a storage framework, or virtualization, or a NoSQL database?" it comes down to the original three V's that are extended to take into account what we've seen evolve recently. A lot of solutions hit the wall when it comes to volume in terms of the size of the data, the number of the files, the number of contents that you need to handle in that fabric. Or it's a variety where you've got different solutions, but they require their own separate infrastructure. Increasingly, we're seeing this information that's in motion that needs to be combined with the data at rest to drive applications. That's all part of the same fabric.

Jack Norris: While that fabric can be distributed all the way to the edge and include central on-premises data centers and cloud, you've got the visibility. You've got a global namespace that reaches across it. You can make decisions based on location to optimize for performance, or cost, or government regulation. Perhaps the most critical component is that this data fabric is enterprise grade. It can serve as the system of record. It has all the high end disaster recovery, and data protection, and security so that you can press this fabric for your operational data, your run the business fabric, if you will. It's hard to do that justice, John, in just a few minutes, but I wanted to lay the groundwork and then get into some examples to show how that works in action.

John Myers: No, and I agree with you. I think that as we talk about those initial foundational concepts of big data, the three V's, you add some great pieces. Vicinity, location awareness, whether it be where the data resides or where the data comes from, that brings you greater context and a greater level of complexity that if you did just a bare metal cluster, you weren't going to have that issue. That visibility, being able to see that data and being able to manage the visibility into that data, I believe, is going to become key in the way that we see things not just from a privacy and a security aspect but increasingly from a regulatory aspect of it. Then the consistency and the integrity. The better the quality of the data that you're pushing into these applications, the more trust you're going to have in the results, and the better the results are going to be.

Jack Norris: Yeah, yeah. Let's get into some examples. The first example I have is, I think, something that everyone can understand, and that is a connected car. We have several customers that are automobile manufacturers that are looking at not only better handling their vehicles today but the future of autonomous driving. Just to give you an indication of the data requirements there, the in-production test vehicles today are generating 50 terabytes per vehicle per day. When we talk about a data fabric, that first processing node is the trunk. That has to be coordinated across a huge volume of vehicles. We've got other examples where it's just the beta test cars, and the volume is thousands of vehicles. In production, we're talking about over 250 million vehicles that will be connected globally, according to research.

Jack Norris: The issue there is how do you collect all this data? How do you learn across all of these locations? You want to learn globally but drive that intelligence back out to the vehicle so that you can act in a very high speed, low latency manner. When we're talking about a fabric, it's not just the data flow but it's the processing as well. It's this whole connected fabric to drive that processing, drive these next generation applications, as well as new applications that will be at the edge itself, cars that can interact with each other, cars that can interact with smart cities. The obstacle is the data. Being able to handle these large sets of data, being able to understand just the deltas so you can compress and have that information flow is really the requirement of a data fabric.
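To make the point about sending "just the deltas" concrete, here is a minimal sketch in Python. It is not MapR code; the field names (`speed_kph`, `brake_temp_c`, `fuel_pct`) and the `tolerance` parameter are assumptions chosen for illustration. It shows one simple way an edge node could forward only the readings that changed since the last snapshot, and how the central side could rebuild the full state.

```python
from typing import Dict


def compute_delta(previous: Dict[str, float],
                  current: Dict[str, float],
                  tolerance: float = 0.0) -> Dict[str, float]:
    """Return only the fields that are new or changed by more than `tolerance`."""
    return {
        key: value
        for key, value in current.items()
        if key not in previous or abs(value - previous[key]) > tolerance
    }


def apply_delta(previous: Dict[str, float],
                delta: Dict[str, float]) -> Dict[str, float]:
    """Rebuild the full snapshot on the receiving side."""
    merged = dict(previous)
    merged.update(delta)
    return merged


# Hypothetical sensor snapshots from one vehicle, one tick apart.
previous = {"speed_kph": 88.0, "brake_temp_c": 140.0, "fuel_pct": 61.0}
current = {"speed_kph": 88.0, "brake_temp_c": 152.5, "fuel_pct": 61.0}

delta = compute_delta(previous, current)
print(delta)                         # only the brake temperature is sent
print(apply_delta(previous, delta))  # the receiver reconstructs `current`
```

Raising `tolerance` would suppress small fluctuations as well, trading fidelity for less traffic; which fields can tolerate that is a decision for the application.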

Jack Norris: If you look at how that works, we see that across connected car, oil and gas, medical equipment. There's a huge number of resources that have this edge processing as part of it. We haven't talked much about the underlying architecture, but the applications themselves are moving away from being defined by the processing. Instead of a database application or a file application, we're seeing a use case that's using a variety of methods to process the data and drive intelligence. When we talk about our fabric, we actually have a web-scale storage component that's part of it. We have a database component that's part of it. We have streaming that's integrated in. Then it supports a series of standard APIs so that open source components, legacy components, and existing applications can run directly on that fabric. This isn't a case of "here's a great architecture for all your net-new applications." This is a great platform to allow you to use your existing applications and drive down costs while you're also innovating on the fly.

John Myers: I think you make a great point, particularly in that connected car example, where the way I usually describe it is, I don't want my connected vehicle to figure out a pattern about brakes failing and send all that data back to a central location, which then makes a decision about, "Okay, hey, the brakes are failing, they should do something." You wanna have that processing ability to look for those patterns of engine fatigue, brakes failing-

Jack Norris: Yes.

John Myers: -things of that nature to happen there at the edge or as close to where a decision point can be made. Whether it's me driving the vehicle and the vehicle going, "Hey, the brakes are about to fail, we're going to assist you over to the side," or an autonomous component, being able to make those decisions at the edge is gonna be key. Like you talked about, when we look at streaming, we oftentimes can't wait for something to go all the way to the center, get landed, get analyzed, and come back. We need to be able to look at the stream from where the data is created all the way through as it's traversing to the central core. But as you also pointed out, not everybody needs to see all of those components. Product development probably needs to look at it down at the millisecond level, whereas business might go, "Okay, we just need to see the average fuel economy or the average usage of these types of things," as opposed to what the brakes are telling us every millionth or thousandth of a second.
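The split John describes, with fine-grained checks at the edge and only averages going to the business side, can be sketched roughly as follows. This is an illustrative Python example, not anything from the webinar: the threshold `BRAKE_TEMP_LIMIT_C`, the field names, and the window size are all hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical threshold; a real system would derive this from engineering data.
BRAKE_TEMP_LIMIT_C = 300.0


def edge_process(readings: Iterable[Dict[str, float]],
                 window_size: int) -> Tuple[List[Dict[str, float]], List[float]]:
    """Check every reading locally and emit one fuel-economy average per window.

    Returns (alerts, summaries): alerts are the raw readings that need an
    immediate local decision; summaries are the per-window averages that
    would be forwarded to the central core.
    """
    alerts: List[Dict[str, float]] = []
    summaries: List[float] = []
    window: List[float] = []
    for reading in readings:
        # Fine-grained check happens at the edge, on every reading.
        if reading["brake_temp_c"] > BRAKE_TEMP_LIMIT_C:
            alerts.append(reading)
        window.append(reading["fuel_economy"])
        if len(window) == window_size:
            summaries.append(mean(window))
            window = []
    if window:  # flush a final, partially filled window
        summaries.append(mean(window))
    return alerts, summaries


readings = [
    {"brake_temp_c": 120.0, "fuel_economy": 10.0},
    {"brake_temp_c": 310.0, "fuel_economy": 12.0},
    {"brake_temp_c": 130.0, "fuel_economy": 11.0},
    {"brake_temp_c": 125.0, "fuel_economy": 13.0},
]
alerts, summaries = edge_process(readings, window_size=2)
print(len(alerts), summaries)
```

Product development could still subscribe to the raw readings, while the business view only ever sees the `summaries` list.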

Jack Norris: Exactly. I think that's the trick: what's the noise and what's the signal within all of this information? Sometimes it takes a subsequent event to help inform what the signal was. The last example I wanna use ties in this whole idea of the combination of capabilities in the fabric and what that can mean to a business. In the healthcare space we have a customer that would take healthcare data and then process that data for their individual customers. Some of those were hospitals, some were clinics, some were physician groups, insurance companies, patient kiosks, et cetera. They would take the data and do different instantiated views. This was a batch process that would be updated and then put in the format that those individual customers could consume. What they've moved to is a process where the stream itself is the system of record. The stream is persisted for long periods of time. They can persist that for up to years; it doesn't have an arbitrary two-week time to live. They can look at any point in time and see what the status of that electronic medical record was, and say, "Well, what happened to it on August 1st, and how did that subsequently change?"

Jack Norris: Then all of the customers are consumers, so it's a publish and subscribe environment. Any update is immediately pushed to those consumers, and it's consumed in the format that makes sense. For the patient kiosks, it's a search index. It's a database table for the hospital application and the payer. So it's moved from a batch process to a real-time process where information is consumed by topic, and it's a digitally transformed process through this architecture. It drives efficiency, it drives greater visibility, and most importantly it drives better speed and reaction.
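The pattern Jack describes here (a long-lived stream as the system of record, point-in-time lookups, and consumers that each build their own view) can be sketched in a few lines of plain Python. This is an in-memory toy under stated assumptions, not the MapR streaming API: the `RecordStream` class, the `emr` topic name, the record fields, and the dates are all hypothetical, and events are assumed to be published in timestamp order.

```python
from collections import defaultdict
from datetime import datetime


class RecordStream:
    """Toy append-only log with publish/subscribe and point-in-time replay."""

    def __init__(self):
        self._log = []                        # (timestamp, topic, record_id, fields)
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, timestamp, topic, record_id, fields):
        # Events are never overwritten or expired, only appended.
        self._log.append((timestamp, topic, record_id, dict(fields)))
        for callback in self._subscribers[topic]:
            callback(record_id, fields)

    def state_as_of(self, record_id, as_of):
        """Replay the log to rebuild a record as it stood at `as_of`."""
        state = {}
        for timestamp, _topic, rid, fields in self._log:
            if rid == record_id and timestamp <= as_of:
                state.update(fields)
        return state


# Two consumers of the same topic, each keeping the view that suits it.
hospital_table = {}              # row per record, like a database table
kiosk_index = defaultdict(set)   # word -> record ids, like a search index


def update_table(record_id, fields):
    hospital_table.setdefault(record_id, {}).update(fields)


def update_index(record_id, fields):
    for value in fields.values():
        for word in str(value).lower().split():
            kiosk_index[word].add(record_id)


stream = RecordStream()
stream.subscribe("emr", update_table)
stream.subscribe("emr", update_index)
stream.publish(datetime(2017, 7, 15), "emr", "patient-1", {"status": "admitted"})
stream.publish(datetime(2017, 8, 10), "emr", "patient-1", {"status": "discharged"})

print(stream.state_as_of("patient-1", datetime(2017, 8, 1)))  # state on August 1st
print(hospital_table["patient-1"])                            # latest state
```

Because the log is kept rather than expired, the August 1st question is answered by replaying events up to that date, while each subscriber sees every update as soon as it is published.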

John Myers: Jack, one of the great things about the healthcare system is, one, it's an environment that's ripe for us to have a great impact on. They've got all of the different things that we've talked about so far. They've got those common practices that haven't really done what they needed to do, so they need to really upgrade a lot of their tech. They've got streaming information coming through, whether it be in-hospital monitoring or outpatient monitoring, which is becoming more popular because it allows them to lower costs and improve patient outcomes. You've got all of these different pieces, and a lot of the folks in this industry, whether they be doctors, clinicians, nurses, et cetera, are, for lack of a better word, trained scientists. If we can give them this data, they can come up with great new insights. Being able to say, "Hey, let's collect this data, let's make it really work," becomes a better way to look at things. You no longer get these situations where we're talking about issues around healthcare; we're talking about true innovation across the healthcare industry itself.

Jack Norris: Yeah, excellent.

John Myers: All right. Just kind of my little wrap-up piece about where EMA thinks that the industry is going. Keep up or get left behind. I think that the old-school way of doing business, if it's not on its way out the door, is definitely getting close to it, and the disruptors in the industry are really driving both change and the way that we look at things. If you go back to those three industries that I talked about, distributed content, distribution, we talked about the failings of some of our regular traditional models and how they're not keeping up, and how the wave of the future is being able to say any screen, anytime, matching people with what they like to do. That's really gonna be where it's gonna be at.

John Myers: In terms of the retail industry, again as you pointed out, as I talked about, if you're not keeping up in some of these areas, we may not see some of those brands that we have thought of for many, many years as being part of the landscape. They're gonna disappear because others are ready to move up and take their position in the environment. To that point about moving at the speed of business, I think that with smartphones, connected vehicles, et cetera, we are continuing to increase that pace of business. As we get farther down this path, it's gonna be about how well we can keep up with the pace and how well our data management environments can support this. I don't mean from just a onesie, twosie type of thing but from that ongoing data logistics perspective of, we got one project, and now we got the next one, and now we got the next one, and making sure that we can do that. In my opinion the best way to do that is through a coordinated environment such as the EMA hybrid data ecosystem or the data fabric, the converged platform that you've described from MapR. If you don't have that and you have to rebuild at each stage, you can't do the top two bullets. You're gonna be stuck constantly in a backlog and missing those great opportunities.

John Myers: I think that's where we're at, and at this point I think we have a little bit of Q&A that we're gonna do, 'cause I've seen the questions that are coming through on the participant feedback. So, David, what great wonderful questions have we gotten from the audience today?

David: Thanks, John. Yeah, let's get started. What ... One question that came in, what challenges are normally faced for such migration and best practices?

Jack Norris: Actually, it's two questions there. One was cases of migrating from a relational database to a JSON document data store, and then the second part was the challenges. It's interesting that the example I shared on the online retailer was a migration from a DB2 environment to JSON. There's a lot wrapped up in the question; I'll attempt to unravel it a little bit. One is, how do I lower costs? So some of it is, I've got a certain environment here, and I'm just trying to lower costs 'cause data volumes are growing and my footprint is growing. You can deploy MapR in those environments and take either the fast-growing data or some of the cold data, put it on the platform, and reduce just the cost of the data, without touching some of the downstream applications. So it's basically, I wanna put something in there, lower costs as quickly as possible, and avoid a lot of rip and replace that would be disruptive to the organization.

Jack Norris: A second driver is, I wanna drive innovation, particularly in environments where I see a high rate of change in the data. The advantage of a JSON document store is that the structure can change document by document. It's basically the de facto standard in web interchange formats, and in that online example, it's easy for them to do product recommendations, et cetera. You can have many different applications use that document format because we treat it natively, and you're not having to flatten it and lose fidelity, et cetera. There's a best demonstrated practice for treating the stream as JSON, using MapR-DB, which is a native JSON document database, so you get the ability to execute quickly on that and you haven't somehow disrupted additional use cases around it. Then we do see ... just a complete rip and replace of some of the relational environments, particularly those environments where they're trying to do operations and analytics and have a series of different tools that they're trying to knit together, and they're replacing that with a single instance. We've seen that in the security space, with the replacement of relational databases with MapR. We've seen it in the online environment. We've seen it in a lot of different environments.

Jack Norris: There are some differences in terms of how that's architected. When you get to best demonstrated practices, you have to look at what you're trying to do. The ... I guess in the spirit of time, we'll take ... We won't try to instruct everyone on the best demonstrated practices across everything. I think it suffices to say what we're looking for is, how do you drive agility? How do you really support multiple applications on that platform? I think that in general the road to digital transformation is not a huge, hairy, multi-year project; it's a series of short-term tactical steps which keep moving you in that direction. Some of those are, as I mentioned, offloads where you're preserving the existing application but you're doing it on a platform that provides scalability and much lower costs. Others are new applications that are taking advantage of different data stores, pulling those together, typically in a publish and subscribe environment, and then having an interface on top.

John Myers: Jack, just to follow up on that, you're right. It's about that agility, and what we're seeing is that from an RDBMS perspective some of these data models have become calcified or a little too rigid for the flexibility that we need to have. The JSON data format really gives that flexibility but also gives us enough format that it's not just a free-for-all. I think that's one of the things that JSON really brings: when you start storing in that document type of format, you can have your set components but then have the flexibility that change doesn't have to be this monolithic thing that moves through the process. We can add attributes or subtract them as we need and give flexibility to people who are developing these types of applications without having to say, "All right, let's funnel it all the way back through the DBA and have the schema populated in advance." We find that to be very good, and it goes to that point that you're raising: flexibility, being able to have that nimble approach to doing the things that people need to do.
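A small Python sketch may help show what "set components" plus per-document flexibility looks like in practice. It uses only the standard `json` module; the product catalog, the `REQUIRED_FIELDS` set, and the `reviews` attribute are hypothetical and are not drawn from the retailer example in the webinar.

```python
import json

# The "set components" every document must carry (hypothetical).
REQUIRED_FIELDS = {"sku", "name", "price"}


def validate(doc: dict) -> dict:
    """Enforce the fixed core; any extra attributes are allowed as-is."""
    missing = REQUIRED_FIELDS - set(doc)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return doc


catalog = [
    validate({"sku": "A1", "name": "Kettle", "price": 29.0}),
    # A newer document adds nested attributes with no schema migration.
    validate({"sku": "B2", "name": "Jacket", "price": 89.0,
              "sizes": ["S", "M"],
              "reviews": {"count": 12, "avg": 4.5}}),
]


def average_rating(doc: dict):
    """Readers tolerate documents that predate the `reviews` attribute."""
    return doc.get("reviews", {}).get("avg")


for doc in catalog:
    print(doc["sku"], average_rating(doc))

# The nested structure survives serialization without being flattened.
round_tripped = json.loads(json.dumps(catalog[1]))
print(round_tripped == catalog[1])
```

The design choice is that the core is validated on write while optional attributes are handled defensively on read, so adding an attribute is a change to one application rather than a change routed through a central schema.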

David: That's excellent. In the interest of time, let's address one more question and then we'll wrap it up. What other approaches are out there for data fabrics?

Jack Norris: There are different companies talking about data fabrics. Some of them are approaching it from the storage perspective and trying to scale up on a storage model. What they lack is the analytics, the processing integrated into the fabric, so you lose that very fast, intelligent reaction. Some come from the ETL perspective, and they're focused on, "We've got a lot of sources and a lot of destinations, and if you look at that together, that's a fabric," but they're missing how you handle the data as it's moving between them and how that functions as a whole. Others are approaching it from a virtualization standpoint and trying to have a fabric be more of a federated approach, and you miss out on the convergence. It really comes down to, how do you have scale and reliability and availability at the same time? The answer to providing that is really hidden in the underlying infrastructure of the data system underneath. That's where MapR has been focused from the beginning, and that's where all our IP is located.

John Myers: Jack, just to follow up on that. I've seen a bunch of different ways that people are doing these types of data fabrics, but some of the core components that the MapR team has addressed, that global namespace, being able to be distributed across multiple geographies, really are some of the core concepts that people need to appreciate. Sometimes when they think data fabric, they oftentimes think, "Okay, it'll work in my geographic location plus a couple of other areas," and when they start to think about a global reach, some of those things start to fall down.

John Myers: Another area, I think, is that when we talk about a data fabric, oftentimes there's a constraint placed on it that says you have to do data access or processing a particular way. Again, I think the MapR team has addressed this by giving multiple tools and not really putting on constraints, but giving that menu of options. So my traditional database people can have an option. My data application developers can have an option. My data scientists can have an option. People can all get at this without ... That gets to Jack's point about, I don't have to replicate, because everybody can bring their tool of choice to the table.

David: Great. Thank you, John. Thank you, Jack. That is all the time we have for today. If you have additional questions, you can engage with us via Facebook, LinkedIn, or Twitter, or you can contact us through the contact information on this slide. We also have a couple of additional resources: the Machine Learning Logistics book by Ted and Ellen that Jack referred to, as well as the HDA and data-driven applications that John was referring to. I will be sending out a link to the recording and the slides as well as these assets in a follow-up email shortly after this event. Thank you again and have a great rest of the day.