Is a Business-Oriented Data Strategy Necessary?


Scott King

Managing Director, Brilliant Data

Todd Freemon

Territory Director, MapR

Data has been called "the new oil," and we agree, though not for the same reasons as others who use the phrase. We see data as the lubricant of a well-tuned enterprise, easing the decision-making process at almost every level of the business. Placing timely, accurate data in the hands of experienced managers is like oiling the gears of a complex machine. In order to extract the maximum amount of value from data, a carefully thought out and easily understood data strategy should be created. Most importantly, it absolutely must align to the overall business strategy.

Please join Scott King, Managing Director of Brilliant Data, and Todd Freemon, Territory Director for MapR, as they discuss the steps to create such a data strategy.


Scott: Thanks to everybody for coming. We're going to cover a lot in this webinar and I really want to get to that, so we won't spend a lot of time on preliminaries like some other webinars you might have been on. We'll spend just a few minutes on why data strategy is necessary, but then after that we'll get right into the step by step on it. So, by the end of this presentation you should have a solid understanding of how to create a business-oriented data strategy. So, if that's what you want to build you're in the right place. As David said, my name's Scott. I'm the Managing Partner of Brilliant Data, with roughly 20 years of IT consulting experience. In addition to consulting I've also done training about data science and worked with Packt Publishing both as an author and a technical editor.

Scott: So, now let's just get into the question, why is a data strategy necessary? What value does that bring and what are the consequences if I don't have one? If you look back over the last decade or so, the growth of data has just been phenomenal. There was a study a couple of years ago that estimated the amount of data generated in 2015 was equal to all the data created before that going back to the beginning of recorded history, so that's about ten thousand years. So, we created more data than that in just one year. So, businesses are seeing this too. We're generating a lot more data, and part of the reason for that is it's not just people anymore, devices are also generating data. Computers, manufacturing equipment, even things like refrigerators and phones are creating data.

Scott: So, there's a lot more data these days and it's coming from a lot more places. And, companies like yours are realizing they can take all of this data and put it to use to help them run their business. It used to be if you needed data to help make a business decision, you were limited by the amount of data available. There just wasn't much of it and maybe the quality wasn't so good. And, that's not really the case anymore because now we do have so much data available. And, because there's such a glut of data in the world now, business leaders have come to expect that there's an answer to almost any question, because there's data somewhere in the company that answers it. And, if they're told those data aren't available, they don't understand that and they get frustrated. So, on a personal note here, I've been on both sides of that.

Scott: Several years ago I took a position as the Director of Business Development for a company that sold IT equipment. And, my team's responsibility was the second largest product line, the second biggest OEM, and that manufacturer made up again the second largest chunk of the company's revenue. The year I came on we were tasked with growing that manufacturer's revenue in the company double digits. So, we needed to go from 80 something million to over 100. Now, I'm a big believer that to get where you want to go you have to know where you are first. So, I started asking a lot of questions about the business, mostly to IT because they were in charge of the ERP system, right? And, I just, I wasn't getting any answers. And, it was frustrating for everyone because as much as I hated not getting the answers, IT hated not knowing the answers. They felt like they should have had those data. Well, it turns out they did. What we needed to know was in the ERP, but there wasn't an understanding of how to turn those data into answers. There wasn't a process and there weren't the right tools to get the answers from the data so it was just kind of sitting there.

Scott: So, we went down this path of organizing the data into a system dedicated to analytics and we brought in the ERP data and formatted it the way that we needed. And, we had a process for that and tools for analysis. So, we started looking at the data, we'd get year to date data every day and before long, I guess it was about two months, we knew everything that was going on in the company. We knew what got sold to who, at what amount, and when, and we got an understanding of what our customers were doing and what they needed. We blew through that 100 million mark. I mean not only did we exceed the 100 million, but we went from the number two OEM in the company to number one. We wouldn't have been able to do that if we hadn't started looking at the data and understood what it was telling us. And, one of the byproducts of that is afterwards when I'd go talk to the IT guys they wouldn't hide from me anymore. They wouldn't see me coming and hide under their desk. But, you know that's what business is like these days.

Scott: People think there should be an answer for pretty much any question. I blame smartphones, right? You're out having a drink together, if you don't know the answer to somebody's question you pull out the phone and you Google it, and you get your answer in that instant. So, to go back to the question from earlier, why do you need a data strategy? It's because everyone thinks the answer should be available and you've got so much data nowadays that in order to be able to answer those questions you have to know where to look. You have to know what data you have, where it is, how accurate it is, and if you don't have a strategy you're going to fall short on most if not all of those. You may not know what data you have. You may not know where it is or how good it is. And, you may not know how it can be put to use to support the business strategy, and that's just frustrating to everybody.

Scott: This slide shows what that looks like from a high level. The data strategy itself is, or should be, wide in scope and ideally it would cover the whole enterprise. We've had success working with individual units, but really, the ideal is it's [inaudible 00:06:02]. Below the strategy, you see the three main areas that are addressed: data governance, data architecture, and data analytics or data use. Data governance is one of those topics we could do a whole webinar on and we probably will sometime in the future, but it can get really in depth if you want to do it right. The biggest thing to remember about data gov is that like any other form of government it's an ongoing business process, it's not an IT project. You can't just buy a tool from a vendor and have IT implement that and you're covered, it doesn't work that way. Data gov creates standards for how data ought to be treated. So, everything from who enters what data using what standards, how often those data get audited or checked for accuracy, who's allowed to make changes to it and under what circumstances. I mean it's governance. Just like you would do with any other asset like cash or real estate. You don't have people doing whatever they want to with real assets, you govern them, and you check to make sure people are following those rules. It's the same thing here.

Scott: Now, data architecture is probably what you think. It's how the systems that store the data are set up and what the data looks like in those systems. And, something that is kind of new is that we have more and better tools now for analyzing unstructured data, which is stuff that you wouldn't normally put in a database. So, emails and other kinds of texts, video footage like security cameras or quality control pics from manufacturing, maybe even the audio from phone calls with customers. That's unstructured data and if you want to use that it's got to be covered in the data architecture. So, we're doing this call with MapR and part of the reason for that is their product is a great choice of a data platform. You can put structured or unstructured data on it, it can handle streaming data like it's nothing, and the tools for machine learning are built into it. So, they'll talk about that more in a minute. But, yeah, when you're working out your data architecture, their platform answers a lot of the questions that come up, a lot of things that would be major issues on other platforms.

Scott: Okay. Finally, you're not just storing data and organizing it, and governing it for nothing, right? There's something you want to do with it, probably a lot of somethings. And, that's where the analysis piece comes in. What kinds of insights do you want to get out of this data? That's what we're shooting for. So, why do we need the data strategy? Well, we've got a lot more data than we used to, it's coming from a lot more directions than it used to, and it's getting used a lot more than it used to. So, now that we've covered that let's get into actually how to create one.

Scott: Step one, you assemble the team. Saying it that way invites the question, why does it have to be a team, and can it be the team I already have, the people who report to me or the people that I work with every day? It should definitely be a team and that team should be as cross-functional as you can make it. Remember earlier when we said the data strategy should be broad in coverage? That's almost impossible when team members are from a single unit. If you don't bring in people from different functions two things commonly happen. A, you miss out on some valuable insight and that usually means you have to go back and redo something. And B, you'll hit more resistance when it comes time to roll this thing out. People are resistant to change anyway. People who feel that they weren't represented in some way are really resistant to change.

Scott: So, we need a cross-functional team, but who needs to be on it? When you're considering ideal team members there are two big questions that you have to ask. And, the first is who has the necessary insight for planning. We said in a previous slide we don't want to miss out on useful insight that will save time and trouble down the road. So, who has that insight? Right now that may not be obvious and names are probably not popping into your head. But, at the end of the webinar you'll have a much better idea of who they are because after we discuss the other steps you'll know what needs to be done. And second is who has the necessary influence for implementation? Casting a vision can be hard, right? Implementing that vision can be even harder. If you want this to succeed, you'll need help not only with formulating the strategy, but with rolling it out as well. So, it helps to have people on the team who are known influencers and can get cooperation from other people. And, these people may or may not be senior, they may just be well respected by their peers and can influence them.

Scott: And, there are three categories of people on this thing. First, there's the Executive Sponsor. This is a case where the more senior the better. Projects that don't have executive sponsorship fail disproportionately more than projects that do. Creating a data strategy is a big project and the benefits are huge, it deserves to have the clout of an Executive Sponsor behind it to give it feet. Second, you'll need people who manage data. So, IT is an obvious place to look for them, but frequently people outside IT, in the BUs, have those kinds of responsibilities as well, and it's worth your time to seek them out. Third, you have users. Again, if we're not using the data effectively then why are we doing all the rest? These are people with deep domain experience. They know what the data mean when they see them and they have a vision for what could be done with them. So, power users who are maybe a little frustrated with the current limitations are great here. They'll be quick to buy into the idea of getting organized and doing more and they'll usually have ideas. So, just to summarize on the team front, we need a cross-functional team of people with insight, ideas, and influence. We don't need the whole company but we need a good representation of the company in a small team.

Scott: There's an old story about three blind men who encounter an elephant for the first time, I'm sure you've heard this. Since all three of them are touching a different part of the elephant they form very different ideas about what it really is. When you think about the business strategy, your experiences, your role, and your expertise are going to form your understanding of it and your interpretation of it. Once you form the team and you get together to discuss the overall strategy for the business, you're going to get different perspectives on it. So, someone from sales is going to look at it differently from say somebody in HR, and that's good. But, the value of that cross-functional team you put together in the first step is that all of you come away from that discussion with a more holistic view of what it means. You'll also have the advantage of seeing where maybe this project from sales and this project from marketing overlap. You'll see where the duplication of efforts in relation to data is happening across the company, and that gives you the opportunity to save time and money.

Scott: So, when you talk about the business strategy you're going to talk about two broad categories. First, what it means for data in the long run, and that might take some imagination. So, a good way to think about that one is just to imagine "the day in the life" a few years down the road. If this vision, this strategy is successful, what does that look like? What'll need to be in place then that currently doesn't exist? But secondly, you also talk about what the BUs are planning to do to turn that strategy into reality. You talk about tactics, and you turn those tactics into use cases for data. So, if the business strategy says we need to launch a new product to get into an adjacent market, well that requires several BUs to have good data and to know how to interpret it.

Scott: So, product needs to know what the market wants. Sales needs to know which customers bought a complementary product within the last couple of years because those are likely your first targets. Marketing needs to know what customer segments are the most likely to respond to a promotion about the new product. Executive leadership needs a way to monitor how well those BUs are performing, but also how the market's responding to the new product. Are they buying it? Who's buying it? What are customers saying about it online? Are there any good ideas coming from social media and comments that customers are making about how to improve the product that would be easy to implement and we could do now? So, you see where I'm going with this? Part of the strategy was to grow by moving into an adjacent market, but each BU had a different tactic to make that a reality. When your team discusses those tactics and the use cases that come out of them you can come to understand the data needs, and those use cases will be crucial to your data strategy.

Scott: So, quick little story. My iPhone is a little annoying sometimes. I don't like turning on the Wi-Fi because it makes the battery run down faster, but I do like asking Siri for directions. The problem is that Wi-Fi is part of the phone knowing where it is and it's constantly reminding me that if I just turn on Wi-Fi it would have a much better idea of where I'm starting from. And, you see the problem right? Siri, bless her heart, wants to know where I am before giving me directions. The phone's smarter than me in that regard, because it knows if you're going from A to B it helps to know where A is first, and not just vaguely or generally, the more sure you can be the better. So, the moral here other than I should just get over the Wi-Fi thing, the moral is you've got to know where you're starting from. And, in today's environment of software as a service, cloud services, data vendors that you can license data from, odds are good no one person in the firm has complete knowledge of all the firm's data assets. In the past, that was a lot easier, but now not so much.

Scott: So, you get together and you brainstorm about the firm's assets, data assets. If you have a data warehouse, IT probably has a good handle on what's in it, but the problem is not everything's in it. Unstructured data? Yeah, probably not. Licensed data? Maybe. And, how many people outside of sales know what's really in the CRM? Especially if that CRM is in the cloud and paid for by sales. So, you have a situation where IT knows part of the picture, the BUs know part of the picture, but no one unit has all of that knowledge. So, we get the team in the room and we start brainstorming about what data assets the firm has, and you start listing out what you've got. And, sometimes, well actually a lot of the time listing one asset either reminds somebody of another one or it raises questions. Things like oh yeah we get data from S&P? Okay, what's in that? And, those kinds of questions can spark really good discussions that generate good ideas. Once you've got a good list of your assets ... And, by the way that list will undoubtedly change over time, it should. But then, you can categorize and classify the data, and begin to organize a catalog.

Scott: So, to categorize your data you ask questions like, are these data transactional? Are they structured or unstructured? What business unit is closest to these data and uses them the most? And, out of those questions you'll begin to see how they can be categorized in a way that makes the most sense. In addition to categorizing the data, you also need to classify them according to their sensitivity or their importance to the organization. So, your customer list is something you don't want people outside the company getting their hands on, but internally it's usually not a big deal who sees it. Certain financial data on the other hand, there are definitely limits on who should see that internally, especially right before an earnings call, if you're public. For that, you could have as few as three classifications, like non-sensitive, confidential, and critical. Or, you could expand that out to account for use scenarios and have more like a dozen. The number really isn't important. It's just important that you undertake classifying the data and taking action based on those classifications.

Scott: So, once we've identified data and then categorized and classified them, it makes sense to create a catalog of those assets and spread that knowledge through the organization. So, how you present that isn't terribly important. I mean you could just get away with a spreadsheet, but what works really well is a wiki. Wikis aren't hard to spin up, they're easily modifiable, you can control who's allowed to modify them, and they're accessible from a browser. But, whatever you use, the catalog should include the name of the asset like customer list, the category which could be master data, the classification which would probably be confidential, the format which in this case is tabular, and the location which would be the name of the database where you store that data and the name of the customer list table. And, you should assign a quality score. A quality score just reflects the team's measure of how reliable, or up to date, or accurate that data asset is. Once you've gone through this step, you know what assets you have, you know where A is on your trip from A to B, and your phone won't fuss at you anymore.
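As a rough illustration of the catalog entry described above, here is a minimal Python sketch. The class and field names, the 1-to-5 quality scale, and the location `crm_db.customer_list` are assumptions made for this example; the webinar names the fields a catalog should hold but does not prescribe a scale, a schema, or a tool.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # The three example sensitivity levels mentioned in the talk.
    NON_SENSITIVE = "non-sensitive"
    CONFIDENTIAL = "confidential"
    CRITICAL = "critical"


@dataclass
class CatalogEntry:
    name: str                        # e.g. "Customer list"
    category: str                    # e.g. "master data"
    classification: Classification
    data_format: str                 # e.g. "tabular"
    location: str                    # database and table (hypothetical names)
    quality_score: int               # assumed 1-5 scale; the talk names no scale

    def __post_init__(self) -> None:
        # Reject scores outside the assumed scale so bad entries fail early.
        if not 1 <= self.quality_score <= 5:
            raise ValueError("quality_score must be between 1 and 5")


def as_row(entry: CatalogEntry) -> str:
    """Render one entry as a row for a spreadsheet or wiki table."""
    return " | ".join([
        entry.name,
        entry.category,
        entry.classification.value,
        entry.data_format,
        entry.location,
        str(entry.quality_score),
    ])


customer_list = CatalogEntry(
    name="Customer list",
    category="master data",
    classification=Classification.CONFIDENTIAL,
    data_format="tabular",
    location="crm_db.customer_list",  # hypothetical location
    quality_score=4,                  # hypothetical score
)

print(as_row(customer_list))
```

Keeping each entry as a flat record like this means the same data can be pasted into a spreadsheet or a wiki table, which matches the point that the presentation format matters less than having the catalog at all.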

Scott: Okay. So, let's review where we are at this point. We've put together a small team from several different areas of expertise and that team has discussed the overall business strategy in terms of what the various units are going to do to turn that strategy into reality. From those tactics we've identified use cases that require data. We know what data assets we have on hand and we've even given them a quality score. So, the next step is to identify what the actual data needs of the use cases are, and from there we do a few things. We determine how many of those can be satisfied with the assets in the catalog and we determine what gaps exist. So, maybe supply chain has three use cases, and out of those we have data today to deliver on two of them. The data are available, they're in a reliable system, they've got a good quality score, so we're good to go on those two. But, that other use case, there's a problem. We don't have that data in our catalog so there's a gap in our ability here.
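The gap check described above can be sketched in a few lines of Python. Everything named here is hypothetical: the asset names, the three supply chain use cases, the 1-to-5 quality scale, and the threshold of 3 are illustrative assumptions, not figures from the webinar.

```python
# Hypothetical catalog: asset name -> quality score (assumed 1-5 scale).
catalog = {
    "shipment history": 4,
    "supplier master": 5,
    "inventory levels": 2,
}

# Hypothetical supply chain use cases and the data assets each one needs.
use_cases = {
    "late delivery report": ["shipment history", "supplier master"],
    "supplier scorecard": ["supplier master"],
    "carrier cost comparison": ["shipment history", "carrier rates"],
}

MIN_QUALITY = 3  # assumed threshold for "good enough to use"


def find_gaps(use_cases, catalog, min_quality=MIN_QUALITY):
    """Return {use case: [assets that are missing or below the quality bar]}."""
    gaps = {}
    for use_case, needed in use_cases.items():
        # An asset absent from the catalog scores 0, so it always counts as a gap.
        unmet = [asset for asset in needed if catalog.get(asset, 0) < min_quality]
        if unmet:
            gaps[use_case] = unmet
    return gaps


gaps = find_gaps(use_cases, catalog)
for use_case in use_cases:
    if use_case in gaps:
        print(f"{use_case}: gap ({', '.join(gaps[use_case])})")
    else:
        print(f"{use_case}: ready")
```

In this made-up example two use cases are covered by the catalog and the third is blocked by a single missing asset, which is the situation the supply chain illustration describes.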

Scott: So, you've really got three ways of getting data. You collect it internally or from customers, you figure it out from data that you do have, or you get it from a third party. So, for gaps we have to ask which one of these methods is going to fill in that gap. If you can put together an internal program to begin collecting the data that's great. If you can't collect it on your own then you might need to get it from a third party data vendor who licenses their data. But, the two issues with third party data are that there are costs associated with it and different vendors have different restrictions on what they'll allow you to do with the data. You have to do the cost benefit analysis to know whether it's worth the fee, sure. But, you also have to read the use restrictions and the license agreement to be sure that you can do what you need to for the use case.

Scott: So, let's say this third use case from supply chain requires you to combine data from FedEx and UPS, and use that combined table. Well, odds are good that's not going to happen because they probably prohibit combining their data in that way. I've seen that happen, I mean not with those two, not with FedEx and UPS but with others. They don't want you to combine their data with their competitors'. So, it's essential to read the licensing agreement and know what those restrictions are. And, the thing is not all third party data comes with a price tag. You can get some really great data from the government like weather, consumer spending, economic indicators. They've even got consumer complaint data. If you're just getting started with using outside data that's an excellent place to start and begin to get used to that. If you're B2C you probably already know about Experian's demographic data and they even go so far as to segment the people in their database. They can tell you detailed characteristics about the people in those segments. For B2B, you're probably more interested in Standard & Poor's data services. And, in case you're wondering, full disclosure, we don't get money from Experian or S&P, so no, we're not pushing their services, we've just worked with the data before and we know it's good.

Scott: Another option to fill in gaps is to figure out what you need to know from what you already know. So, let's use a B2B example here. You've been doing business with a customer company for a long time, you have at least 18 months of transaction data from them. So, from that data you can calculate all kinds of things, like the probability that they'll stop doing business with you, called customer churn. You can also calculate lifetime value, which is immensely valuable for making certain decisions about how your company will interact with that customer. You can calculate or figure out all kinds of things from your existing data. So, keep that in mind when you're looking at options for how to fill in a gap. So, when we source additional data, whichever one of those three methods we choose to do that, it should get added to the catalog. And, your catalog should continue to grow as you add more assets to the portfolio and get more data under management. Which brings us to the next step.
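To make "figure it out from what you already have" concrete, here is a small Python sketch that derives a churn-risk flag and a lifetime value estimate from one customer's transaction history. The data are invented, and both rules are assumptions chosen for illustration: the flag fires when the customer has been silent for more than twice their usual gap between orders, and lifetime value is a deliberately naive projection of annualized revenue over an assumed number of years. Real churn and lifetime value models are usually more sophisticated than this.

```python
from datetime import date

# Hypothetical transaction history for one customer: (order date, amount).
transactions = [
    (date(2023, 1, 15), 1200.0),
    (date(2023, 4, 10), 800.0),
    (date(2023, 7, 5), 1000.0),
    (date(2023, 10, 20), 1000.0),
]


def days_since_last_order(transactions, today):
    """Recency: how long the customer has been silent."""
    last = max(order_date for order_date, _ in transactions)
    return (today - last).days


def average_gap_days(transactions):
    """Average number of days between consecutive orders."""
    dates = sorted(order_date for order_date, _ in transactions)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return sum(gaps) / len(gaps)


def churn_risk(transactions, today, factor=2.0):
    """Assumed rule: at risk if silent for more than `factor` x the usual gap."""
    return days_since_last_order(transactions, today) > factor * average_gap_days(transactions)


def naive_lifetime_value(transactions, expected_years):
    """Naive estimate: annualized revenue so far x assumed years retained."""
    dates = [order_date for order_date, _ in transactions]
    span_days = (max(dates) - min(dates)).days
    total = sum(amount for _, amount in transactions)
    return total / span_days * 365 * expected_years


today = date(2024, 1, 20)
print("Days since last order:", days_since_last_order(transactions, today))
print("At risk of churn:", churn_risk(transactions, today))
print("Naive 3-year value:", round(naive_lifetime_value(transactions, 3), 2))
```

Once a derived figure like this exists, it is itself a data asset and would be added to the catalog like any other.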

Scott: Okay. The next step is to plan your management and governance. We've got an understanding of the data that we need, we know what data we have and we've made a catalog of them. And now, we've got a plan for dealing with any gaps that exist. So, now is the time to talk about how we're going to keep that situation manageable. And, to do that you need to determine what the life cycle of the data is. How do we get the data? If it's ours, how did it get input? Where is it stored? How long is it stored? How often is it updated and who's allowed to do that? Should it eventually be deleted and if so under what circumstances? So, to do that we go back to the catalog and figure out the life cycle for each named asset in that list. So, that's basic management, but there's a step beyond that if you're willing to take it and that's called data governance. And, this is another one of those topics you could just, you could create a whole webinar about and we'll do that in the future. But, there's a lot of misconceptions about data governance that need to be cleared up.
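The life cycle questions above (how the data arrive, where and how long they are stored, how often they are updated and by whom, and when they should be deleted) can be recorded per catalog asset. The following Python sketch shows one possible shape for such a record; the field names, the example values, and the 365-day retention period are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class LifecyclePolicy:
    asset: str                      # name of the asset in the catalog
    source: str                     # how we get the data
    stored_in: str                  # where it is stored
    update_frequency: str           # how often it is updated
    editors: List[str]              # who is allowed to change it
    retention_days: Optional[int]   # None means "keep indefinitely"

    def delete_after(self, created: date) -> Optional[date]:
        """Date on which a record created on `created` becomes due for deletion."""
        if self.retention_days is None:
            return None
        return created + timedelta(days=self.retention_days)


# Hypothetical policy for a hypothetical asset.
web_leads = LifecyclePolicy(
    asset="Web leads",
    source="entered by prospects on the contact form",
    stored_in="crm_db.web_leads",
    update_frequency="daily",
    editors=["sales operations"],
    retention_days=365,
)

print(web_leads.delete_after(date(2023, 1, 1)))
```

Writing the answers down in one place per asset makes it easier for whoever later audits the data to check the actual practice against the stated policy.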

Scott: So, let's start with the most important point about it being a business process and not an IT project. Data's an asset so you govern it like an asset. To do that right you need two categories of people: the data governance council, which is usually made up of senior managers, and stewards, who are people close to the data who understand it and how it's used on a day to day basis. Now, the council writes the data governance charter and sets the goals for the program. They decide how and when audits are done to measure progress. They take all of that and they hand it over to the stewards and it's the job of the stewards to turn those goals into concrete standards and policies. So, for all intents and purposes stewards are the owners of that data and are held accountable for things like availability and quality of the data, it's part of their job description or it needs to become part of their job description. Once the stewards have established the standards and the policies the council conducts periodic audits to measure their progress and to help remove any obstacles that come up. So, the end goal is that data governance just becomes part of the culture. And, when you get to that point, a lot of the worries that other businesses have about data, about availability and quality, those things aren't really issues for you then.

Scott: Okay, everybody knows you don't just go out and build a billion dollar office building on a whim, right? I mean I tried building a deck extension once without a plan and I ended up spending way more than I needed to because well it was just dumb. I should have taken measurements first, and drawn it out, and then tried to build it. But, to build anything meaningful, like a house or an office complex, you need an architect to plan it out and say, "Okay, this is what it should look like. This is how it should be built, with these materials, and so on and so forth." When the data architecture is being designed, certain questions need to be answered and here's a good list of those. If the architect is worth their salt they'll ask users about how the data will be used and they'll design accordingly. The real point of this slide is that these questions need to be asked and there need to be answers to them. So, if you can't get an answer about how data will be moved from an operational database to the data warehouse for example, that's a major problem, and it's a major architectural problem. Your team has to feel comfortable with the answers to these questions or you shouldn't move on.

Scott: Just like there were gaps we identified in the data needs there will probably be gaps in the architecture. Maybe you want to do a kind of analysis that you don't have the tools for right now. Or, maybe you want to add too much data to the data warehouse and it can't handle that much. We have to do the same kind of gap analysis here that we did for the data requirements, by looking at what's needed for the use cases and comparing that to what we currently have. If a use case calls for analyzing unstructured data and all your tools are for structured data that's a gap. So, is that something that can be fixed in the current budget cycle or do we have to push it out? Is the BU willing to foot the bill for that? Do they consider the use case worth the cost? When we can answer those questions then we can build a remediation plan, and you build that into the architecture document, the budget, and the data strategy. And, speaking of documentation ...

Scott: Okay, now we've done all the pre work. We've got use cases identified and prioritized. We know what assets we have and which ones we're going to acquire. We have a plan for managing all of that. Now, we need to document what we've discovered and what we've decided, and we need to communicate the strategy to the people who know or need to know. There will be a few documents actually that come out of this process, but let's start with the first one which is the data strategy itself. This should give the high level view of the need and purpose of the strategy, what the use cases are and their prioritization, how the strategy will be implemented, and a brief assessment of current capabilities. It should have all of that.

Scott: The two most important things to remember about the data strategy document are that it should be high level and it's a living document. Data strategies tend to have a short shelf life so they have to be flexible. I mean you'll make minor changes more frequently, but you need to revisit the whole thing about every six months or so to make sure it's still on track with the business strategy and the BU tactics. We also have to recognize that there will be different audiences who are going to be reading this and we have to account for that. And, we do that by putting summary information at the beginning, something that executives can read, get a good idea of what's going on, and be satisfied. As the document goes on, it can get more detailed, but really the deeply technical in the weeds stuff should be in separate documentation that's referenced in the strategy rather than contained in the strategy.

Scott: Now, some of the additional documentation that'll come out of this process could include things like a data governance charter or the policies and standards that data stewards create. But, there should definitely be a data architecture document and a data asset catalog. So, those should be considered the mandatory, absolute minimum. But, all of these things should be stand alone documents that get referenced by the data strategy but they're not included in it. And, the reason for that is, if you include all of this stuff in the data strategy you're going to end up with this huge detailed document that people are going to look at and go, "Nope".

Scott: Alright, that's enough from me. I'm going to turn this over to MapR because their product provides a solution for a lot of the issues that you're going to run into as you do this and it gives you a powerful data platform when it comes time to execute your strategy. And, rather than try to explain what I mean by that, I'll just let those guys talk about it because they know it best. So, thanks everyone.

Todd Freeman: Hi everybody. This is Todd Freeman from MapR. Thanks for sticking around, thanks for joining us today. I'm going to step through just a handful of slides and give you just a really quick overview of the MapR data platform, and a few examples if we have enough time of how some of our customers are using this. So, let's get into it. This is going to be a quick overview and we're obviously more than happy to have a detailed conversation with you should you be interested in that. So, just a quick introduction to MapR as a company. We're based in Silicon Valley and we provide a data platform which helps our customers to operationalize and monetize data assets. So, MapR customers use the platform to construct or build out a data fabric which can include any and all kinds of data as Scott mentioned earlier, structured, unstructured, semi-structured, etc. The fabric can extend from on-premises data centers to public or private clouds or any kind of hybrid combination of all three of those, and out to the edge for edge use cases around IoT, etc. So, MapR powers use of open source technologies as well as proprietary analytics, machine learning, artificial intelligence, cloud and container technologies on an enterprise grade, production grade, single code base as a platform.

Todd Freeman: So, that's an important differentiator from our perspective because the history of MapR and where we come from is that we consider ourselves to be the answer to the questions that were left unresolved by the original technology of Hadoop and the Hadoop Distributed File System. That old-school technology has been around for a number of years now and there are some limitations to it, like the notion that you have distributed storage and compute, but it's not fully read-write, right? It's limited in its current use cases around a container strategy, for example. It's limited by the file system in its ability to be cross-cloud or to enable data to reside in multiple locations etc.

Todd Freeman: So, MapR purpose-built this platform mainly around the file system, which is the foundational layer of what makes up MapR, but it's not the only layer. That platform that sits in the middle on this slide consists of a file system, NoSQL tables, NoSQL capabilities, and an event streaming engine, all on one code base. And, that entire platform is underpinned by the types of enterprise-level SLAs that you have required of your technology vendors for decades and which, from our perspective, you tend to give up when using only open source technologies. However, MapR is not so proprietary that you can't work with those open source technologies. In fact, the platform is purpose-built to enable you to use open source APIs and open source technologies in conjunction with this enterprise-grade platform, with a consistent security model underneath things like disaster recovery, high availability, consistent snapshots, the ability to be multi-cloud, etc. And, the ability to work with edge data for IoT use cases.

Todd Freeman: So, one of the ways that we see this platform moving into the future, with the more advanced customers that we have, is for artificial intelligence in addition to the traditional notion of big data analytics. So, the ability to have all of these different components on one platform and one code base enables you to work with all different manner of workloads in a way that responds and relates to the business. And, as Scott was mentioning, an important key component to this is doing exactly that: making a plan for how you're going to build out a strategy that includes real-world business use cases. And, if you're going to be impacting real-world business use cases, then you have to be mindful of the notion of being able to be in production. There's a [inaudible 00:38:47] study that talks about Hadoop data lakes, Hadoop HDFS-based data lakes, and only 17% of them being in production. That's a key, critical limitation in using strictly open source tools without having an enterprise-grade, production-grade platform to put them on top of.

Todd Freeman: MapR considers itself to be the strongest data platform foundation for production use cases. So, as I mentioned, the file system is a distributed, exabyte-scale file system; we're fully read-write and fully POSIX compliant. You can mount us as NFS. But, while that is all true, you can also use the HDFS open API and the JSON document database open API to interact with the components of our platform that work with those data types or data strategies. There's the notion of a global namespace across all locations and clouds, and out to the edge. The ability to tier data whether it's hot, warm, or cold. The notion of high scale and high reliability, disaster recovery, all of those enterprise SLAs that I mentioned a moment ago that in a number of ways you otherwise give up, and doing so in a more efficient manner in terms of how you would manage and run a cluster relative to the amount of hardware that you have to use to run your big data or your data strategy.

Todd Freeman: I mention it that way because I know that's a question that we'll get to in a few minutes when we do Q and A: "Hey, what's this notion of big data? Maybe we're not a huge company, or we haven't yet wrapped our hands around this notion of big data, but we have a data problem. We have the need to perform analytics and get to some more advanced use cases with how we work with data. But, we don't necessarily consider ourselves a 'big data' company just yet." And, I want to purposely mention that this is something that comes up all the time in my conversations in the field with business executives and technology executives: wondering whether or not they need to worry about the idea of big data. So, the idea behind a data strategy can apply to any and all companies, whether you're a large enterprise or a smaller one.

Todd Freeman: And also, you'd maybe be surprised to find out that when you attend a conference, you hear some people present, and you get this notion that some companies are further advanced, and you think to yourself, boy, we haven't even started. And, I can tell you from my experience in the field and my experience in conversations at large conferences that the vast majority of companies and enterprises are really just getting started. They've kicked the tires and sort of dipped their toe into the water with HDFS over the past couple of years, and now they're realizing that they need a more enterprise-grade platform that can handle some of the more advanced workloads.

Todd Freeman: So, what is that success rate and how do we impact that with regard to getting into production, right? So, this is the Gartner statistic that around 17% of typical big data projects get into production successfully. MapR customers experience an 85%-plus success rate of putting a project into production. And, it's for all the reasons here, which basically boil down to this notion of how you make it easy for your data ops team, the one that Scott suggests, and I would 100% back up, that you build. And, how do you interact with the business and bring them on board? Because in many cases, as everyone on this call knows, IT is servicing the business. So, whether it's that the business decision makers have control of the IT budget, or that IT has to justify projects to the business even though they control their own budget, there's a lot more pressure to make sure that you can get things into production, right?

Todd Freeman: So, this call is about developing this business strategy and I wanted to at least give some data around how MapR is delivering real business value to the leading companies that we work with. There's a continuum of ways that you can think about getting started with a data strategy, and those can range from cost avoidance strategies like a data warehouse offload, or a historian offload in manufacturing environments, to where you are optimizing a data warehouse or optimizing a historian, allowing it to perform its functions while reducing its cost footprint.

Todd Freeman: There are cases where you can replace some of those systems in some instances. And then, you can sort of scale yourself up to the more advanced use cases, and if you have an enterprise-grade, production-grade platform on which you can do that, it makes it easier to accomplish those goals because you don't have to spin up silos of clusters of different open source technologies and try to stitch them together. If you have a platform that has the underpinnings of all of those open source technologies, and itself has a foundation with enterprise-grade security and enterprise-grade SLAs, then you're able to run multiple use cases on a single cluster. If you have a platform that's easier to manage and maintain, you can run multiple use cases and you can run the cluster with fewer people, enabling more data ops professionals and other technology professionals within the enterprise to work on other high-value projects.

Todd Freeman: So, I'm going to wrap up really quickly here over the next three slides and I just wanted to give a picture of how MapR is addressing these use cases. Obviously AI and analytics is a big topic of conversation. IoT and edge analytics, not just in manufacturing: there's a healthcare manufacturer use case that we've been working on for the past year or so that's around scanning images and enhancing the ability of the technicians who are looking for a specific diagnosis to also be able to find additional things within a scan or an x-ray etc. And, obviously everyone either has already developed a cloud strategy, or is developing a cloud strategy, or is modifying a cloud strategy that they had already developed, and MapR can help with that with our cross-cloud functionality and our global namespace to help deliver lower total cost of ownership etc.

Todd Freeman: And then, containers is such a big topic in our conversations right now. And, if you can imagine a data fabric extending between on-premises, and cloud, and out to the edge, and then the notion of containerized applications that require stateful data to underpin them, you can just think of containers being plugged right into the MapR data fabric that you've built. And so, what are the use cases that are a part of this? Well, I won't go over all of these, I won't read the slide to you, but this is something that every single business should be focused on. I think one of Scott's slides mentioned this notion that businesses that don't plan for this over the next couple of years will be outpaced in terms of revenue generation, top-line revenue, and bottom-line cost savings by literally billions of dollars, it's estimated.

Todd Freeman: So, that's this approach of using a data platform to build a data fabric, right? I want to leave you with one last thing, and that is that this sort of old-school HDFS approach has severe limitations, because it's been around for a long time and there are new and better ways to do things. Everything from containerization strategies, to not being fully read-write, to having name nodes that sort of clutter up and expand the amount of hardware that's required to run a cluster. MapR has addressed those things and gives our customers the ability to work with and manage their data and have a future-proof platform that's ready for whatever the latest and greatest open source innovations are. And, one thing I'll leave you with is that MapR has multiple leadership members that are involved in the open source community. So, with that I think that I'll hit pause and allow us to have a 10-minute Q and A session, and we'd love to open it up for questions.

David: Great, thanks Todd. So, just a reminder to everyone on the call, to submit a question please insert it into the chat box in the lower left-hand corner of your browser. So, we've had a couple of questions that have come in, Scott and Todd. Let's start with the first one. "We're doing an MDM project currently. Should we have started with a data strategy or is it okay to keep going with this?"

Scott: So, yeah, I mean it's perfectly fine to do master data management by itself. And, you'll get the benefits of having done that, right? So, you'll notice your data quality goes up. You'll notice that you don't struggle with certain things the way that you did before, so you'll get the benefits of master data management, but in the end it's really one component of data governance. So, you won't get all of the benefits of data governance. And then, data governance itself is a subcomponent of a data strategy. So, you won't get all of the benefits of the data strategy. So, you'll get the benefits of master data management, but no, you don't have to go back and start over and begin at the top with data strategy, not necessarily. You can go back and do that, and you've just already got the MDM piece handled.

David: Great, thanks Scott. So, another question that came in is, "Is big data the only reason for a data strategy? What about enterprises with small data problems such as redundancy, data quality, lack of effective use of data?"

Scott: I would say companies that have a huge amount of data, they've got big data ... What do the kids say? First world problems. So, companies that have big data problems, yeah, they will benefit more, I think. But, no, you don't have to have big data in order to need a data strategy. And, maybe the only reason the framework looks that way is because it's designed to scale. So, you can take this framework for a data strategy and apply it to almost any amount of data, and if you don't have a huge amount, that's okay. The same things apply. You still need to manage it. You still need to figure out how you're going to use it. You still need to catalog what you have. So, yeah, you can totally apply this to smaller data uses.

Todd Freeman: Yeah, and this is Todd. I would add that your ability to ... The point of the question is a very valid one, right? The notion that you may not already have big amounts of data within your environment. And, one way to think about it as you strategize about data and make a decision around a platform is exactly to this point. How do you run analytics and work with the data that you already have, that's existing data inside of databases, data warehouses, behind your firewall, proprietary data? And, how do you combine that in the same moment for analytics purposes with real-time data, right? And, once you start the strategic process of understanding that that's where you want to get to, you know that the process of gathering the real-time data will eventually lead you to having big data.

Todd Freeman: And so, you can future-proof from both perspectives. You can have operational data that you have sitting within your own environment and you can have real-time data, and have a single platform on which to work with both of those types in conjunction with each other to really have powerful analytics, and get to real-time machine learning and that sort of thing. And, you future-proof knowing that you don't have to worry about a name node, and having one or two nodes within a five-node cluster dedicated to telling the other three nodes where the metadata sits. At MapR we distribute metadata across all the nodes, and you therefore end up using less hardware. And so, as you grow and expand and move closer and closer towards big data, you've future-proofed yourself on total cost of ownership from a hardware perspective.
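
Todd's hardware point can be put into rough numbers. The sketch below is only a back-of-the-envelope illustration, not MapR or Hadoop code: the function names are hypothetical, and it assumes, as a simplification, that a node dedicated to metadata contributes nothing to data storage or processing, while a cluster that spreads metadata across all nodes keeps every node available for data.

```python
def usable_data_nodes(total_nodes: int, dedicated_metadata_nodes: int) -> int:
    """Nodes left for data once metadata-only nodes are set aside (simplified model)."""
    if not 0 <= dedicated_metadata_nodes <= total_nodes:
        raise ValueError("dedicated_metadata_nodes must be between 0 and total_nodes")
    return total_nodes - dedicated_metadata_nodes


def usable_fraction(total_nodes: int, dedicated_metadata_nodes: int) -> float:
    """Share of the cluster's nodes that is available for data."""
    return usable_data_nodes(total_nodes, dedicated_metadata_nodes) / total_nodes


if __name__ == "__main__":
    # The five-node example from the talk: two nodes dedicated to metadata
    # versus metadata distributed across all nodes (zero dedicated nodes).
    for dedicated in (2, 1, 0):
        nodes = usable_data_nodes(5, dedicated)
        share = usable_fraction(5, dedicated)
        print(f"{dedicated} dedicated metadata node(s): {nodes} data nodes ({share:.0%})")
```

In a real cluster, distributed metadata still consumes some capacity on each node, so the gain is smaller than this all-or-nothing model suggests.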

David: Great. So, I've got two more here and then maybe, Todd, you could take a couple of minutes to run through the last couple of slides that you've got. So, "How do you convince the business teams to own the data strategy, as it is considered an IT project?"

Scott: Well, I guess you'd have to define what you mean by own. Really any department can own a data strategy as long as everybody agrees it's theirs. But, to get the BUs to contribute to a data strategy, you have to show what's in it for them. And, I mean, that may sound a little mercenary, but that's usually the best way to get anybody to buy into something. So, you go to the BUs and you say, "Okay guys, we want to help you execute on the business strategy. So, tell us what your tactics are going to be, and we will look at what we've got in terms of data and see where we can help you with that, and we'll ... The benefit to you is you get to execute on these use cases and we're going to help you with that."

David: Okay, thanks Scott.

Todd Freeman: Yeah Scott, I couldn't agree more. Let me jump in really quickly and just say that I'm currently working with a large financial services institution around wealth management strategies and the various business groups that work with different components of a financial services company, right? And, how they go about their interactions with the institutions whose money they might manage, and with the individual investors for whom their advisors work to help manage their investments. And, it's as simple as this: you can apply the same concepts as in retail, right? So, this notion of sort of chain store sales growth. You think of that within the retail industry, analyzing both static data and real-time data to come up with strategies that help to improve chain store sales growth. You can apply that to financial advisors and institutions who are attempting to grow their customer base and grow their revenue. You could look at senior care facilities, for example, and the amount of data that they have access to, and how you can strategize with the business and get the business to start thinking about the data that they have access to, and how they can monetize that data and drive new revenue streams for the company with that. And, as soon as you start to talk numbers with the business, their ears are going to naturally perk up.

David: Thanks, Todd, for jumping in. So, one last question. "What's the typical cost and timeframe to execute on a data strategy?"

Scott: Oh man, I hate giving this answer because it seems like everybody gives this answer in tech, but it depends. So, if you decide to go this route yourself and do it internally, it's going to cost one thing. If you have Brilliant Data or somebody else come in and either coach you through it or do it for you, it'll cost a different thing. If you have a small amount of data it won't take as long or cost as much as if you have a huge amount of data. If you can fit all of your data, or all of the data that you want to use, into a single data warehouse, well, that's different than if you go with a data platform like MapR. There's a difference in cost and a difference in time involved there. It also depends on how much of this you already have. So, maybe you don't need to add to the data assets you've got. Maybe you've got enough that you can do the use cases [inaudible 00:57:26] have come up with. So, yeah, I hate to use the "it depends" answer, but it really does apply here. There are too many variables, really, to say.

Todd Freeman: That's the honest answer, and my two cents would be, just very quickly, that you could think of a strategy where what you want to do is get started. You could think of a strategy of figuring out a high-value use case in conjunction with your business partners. For example, MapR has quick start solutions, and Brilliant Data can work with you on single use case solutions that will immediately sort of bring up a machine learning use case. For example, a recommendation engine, right? MapR worked with American Express on a recommendation engine for their online offers. And that enabled American Express to realize an uplift of 100 million dollars of new revenue just through additional suggested offers, by using existing customer data to make better suggested offers to their loyalty program, to their sort of points program participants. So, you can pick off a use case and go for that, show success, and then broaden it out to the overall, larger data strategy and platform conversation etc.

David: Okay, thanks Todd. Was there anything else you wanted to walk through, Todd? Because we're at the top of the hour.

Todd Freeman: Yeah, I want to be respectful of people's time, but if you're willing to spend an extra couple of minutes with us, I can give some of these examples and we'd love to have you join us for that. So, David if you want me to just sort of run through a handful of these maybe?

David: Yeah, go ahead.

Todd Freeman: Okay. So, I just mentioned this one, which is American Express. This is an important one; the uplift in the Amex offers program was the original use case. But then, the use case expanded, and on the same cluster American Express is running fraud detection and prevention, they're doing new customer acquisition, and then obviously the recommendations for better experiences for those loyalty program members. UnitedHealth Group is a big customer for MapR, and talking about a strategy and involving the business, the first bullet point here is the big data platform as a service for 50-plus business units, right? I mean this notion of creating what everyone used to refer to as a data lake, right? To be able to bring in all types of data, and not just about health records, right? But, about claims, and the original use case was preventing claims fraud. And, when you're the largest at what it is that you do, if you can impact a given part of your business by a few percentage points then that's a massive amount of either uplift or cost avoidance, fraud detection, etc., right?

Todd Freeman: So, one additional component of IT that should be important to every business unit is security, right? And, specifically speaking, in many cases there are purpose-built security systems; a data platform like MapR is not going to replace those, it's going to augment those. Because in many cases those systems might have some limitation on the amount of data that can be stored there. And, one thing that machine learning and AI experts will tell you, and I work with them so I've learned this over the past few years of my time at MapR, is that the more data that you can throw at models and algorithms to train them, the better. That's the best way to go about it, right?

Todd Freeman: So, the more data that you can use to train models and algorithms the better, and you can apply that in the security sense, right? You can take the data from the point systems that manage security intrusions etc. And, you can funnel that data out of those systems and into a security data fabric, if you will. And, you can run machine learning models and algorithms against that to do better detection. And, in particular, with MapR's ability to tier data hot, warm, or cold, you can utilize that as a part of this kind of strategy, where maybe you need really fast real-time analytics on some of that data. But you also need to run models and algorithms on the entire set of data, and that doesn't have to be real time, so you can do some of this on solid state drives for the fast analytics and some on spinning disks for the stuff that doesn't need to be real time.
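
The hot, warm, or cold idea can be sketched as a simple age-based rule. This is a minimal illustration under stated assumptions, not how MapR implements tiering: the function name `choose_tier` and the one-day and thirty-day thresholds are hypothetical, chosen only to show recent data landing in a fast tier (for example, solid state drives) and older data in slower tiers (for example, spinning disks).

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; a real policy would be set per workload.
HOT_WINDOW = timedelta(days=1)
WARM_WINDOW = timedelta(days=30)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Classify data as 'hot', 'warm', or 'cold' by time since last access."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"    # candidates for fast, real-time analytics
    if age <= WARM_WINDOW:
        return "warm"   # still queried, but not latency-sensitive
    return "cold"       # kept for full-history model training


if __name__ == "__main__":
    now = datetime(2020, 1, 31)
    for stamp in (datetime(2020, 1, 30, 12), datetime(2020, 1, 10), datetime(2019, 1, 1)):
        print(stamp.isoformat(), "->", choose_tier(stamp, now))
```

Age since last access is only one possible signal; a security team might instead tier by event severity or by which detection models need the data.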

Todd Freeman: I've talked about financial services. I've talked a little bit about healthcare, some security, manufacturing, oil and gas. MapR has a really, really solid position within this sector of the economy and we've got a couple of really good use cases on this slide. So, this notion of the ability to extend the fabric out to the edge, right? You think about the oil and gas industry: some of the data that's going to be important to them is going to be coming from locations that are hard to reach, right? So, you can have a small edge cluster of MapR and have some of the analytics actually be performed at the edge. And then, when internet connectivity is available, because it's not often available at these locations, those analytics and the data itself can be sent back to a centralized cluster, right? And, in the case of Anadarko, they're already into GPU-accelerated analytics on a massive data lake.
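
The edge pattern described here, analyzing locally and sending results back when a connection is available, is often called store-and-forward. The sketch below shows the idea under stated assumptions; it is not MapR code. `EdgeBuffer`, its methods, and the `send` callback are hypothetical names, and the connectivity check is passed in as a plain boolean for simplicity.

```python
from collections import deque
from typing import Callable, Deque, Dict


class EdgeBuffer:
    """Queues locally computed results until they can be sent to a central cluster."""

    def __init__(self, send: Callable[[Dict], None]) -> None:
        self._pending: Deque[Dict] = deque()
        self._send = send  # hypothetical uplink function supplied by the caller

    def record(self, summary: Dict) -> None:
        """Store a result produced by analytics running at the edge."""
        self._pending.append(summary)

    def flush(self, connected: bool) -> int:
        """Send queued results in order if connected; return how many were sent."""
        if not connected:
            return 0
        sent = 0
        while self._pending:
            # Send first, remove second, so a failed send leaves the item queued.
            self._send(self._pending[0])
            self._pending.popleft()
            sent += 1
        return sent

    def pending_count(self) -> int:
        return len(self._pending)


if __name__ == "__main__":
    central = []  # stands in for the centralized cluster
    buffer = EdgeBuffer(central.append)
    buffer.record({"site": "well-A", "avg_pressure": 101.5})
    buffer.record({"site": "well-A", "avg_pressure": 99.8})
    print(buffer.flush(connected=False), buffer.pending_count())
    print(buffer.flush(connected=True), buffer.pending_count())
```

A production version would also persist the queue to disk so that results survive a restart at the remote site.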

Todd Freeman: Tupras is a very interesting customer for MapR in terms of the number of different use cases that they've brought to the MapR platform, and they won some awards for their IT team. This is a refinery, the largest refinery in Turkey. And so, everything from corrosion detection to alerts and alarms monitoring. The ability to monitor data and collect data every second instead of every thirty seconds. I mean, when you think about it, if you've got some sort of issue within the processing in a plant or refinery like this, you want to know about it in as close to real time as possible so that you can address it and therefore lower downtime, right? And, obviously the benefits of these are pretty self-evident, right? So, I see a number of people still sticking around, so I'll cruise through a couple more of these and then we can wrap it up.

Todd Freeman: This one's really interesting to me. There's a major healthcare manufacturer that we work with where this also involves edge data, because it's about getting data from CAT scan and MRI machines, right? Typically you're going to have a technician who's looking at the films for a specific diagnosis that the doctor has asked about, right? Well, the technician is only looking for that specific thing, and they're so busy and there are so many things for them to do in the overall process of their job that they don't spend hours looking at a particular piece of film for other things that might be on that film by happenstance. But, if you can do image recognition and train machine learning models and algorithms to do this at scale and at speed, then you can have an additional component and a value-add to what the systems were originally intended to do, right?

Todd Freeman: So, you can get new services and you can identify other diagnoses that you weren't specifically looking for etc. right? And, if you're able to as the manufacturer of that equipment include that as a value add service obviously you're impacting lives but you're also impacting the ability for healthcare providers to provide excellent care to their patients and drive revenue to the healthcare provider.

Todd Freeman: And, one last one. This is related to a conversation that many of us at MapR are having out in the field, and that is this notion of utilizing the MapR data fabric to provide stateful data for containerized applications, right? So, both of these are examples where MapR's platform creates a data fabric, containers run against the data fabric, and that allows for really cool use cases around models and algorithms. If you think about it, you could spin up a container and have it have not just R, for example, but a specific version. And, data scientists like to train their models and algorithms against the largest data set, yes, but they also like to use different versions and then compare results, right? And so, a containerized strategy on top of a fabric allows you to do that. And, the guys from Brilliant Data, utilizing the MapR platform, can help to build out this kind of strategy.

Todd Freeman: And, one last one I'll just throw up there. So, to the point of the question earlier, you can see in this there are some really, really large companies like Samsung and Sony, and some global enterprises, right? And then, there are also smaller companies, some of which utilize a massive amount of data, like Rubicon, for example, and some of which are not necessarily using those quantities of data yet, but they are using real-time and static data on the same platform, and MapR helps to power that. So, hopefully stepping through some of these has been helpful.

David: That was great, thanks Todd. Thank you Scott and Todd, and thank you everyone for joining us and staying on a little longer. That is all the time we have for today. For more information on this topic and others, please visit Thank you again and have a great rest of your day.