Show 1: Maximizing Business Value with Data Science and Data Operations

Podcast transcript:

Jim Scott:

Hi, my name is Jim Scott, I'm the host of Data Talks. Thanks a lot for joining us today. Today's guest is Kirk Borne. Kirk is with Booz Allen, and we are going to be discussing a very fun and exciting topic, and that topic is data ops, or data operations. So Kirk, could you go ahead and, before we get rolling, maybe just give the listeners a little bit of background about yourself?

Dr. Kirk Borne:

Sure, thank you, Jim. I'm with Booz Allen Hamilton, as Jim mentioned. I'm currently the principal data scientist for this management consulting and technology firm, which is over 22,000 people. What I like about my role is that at Booz Allen we have nearly 1,000 data scientists actively involved in data analytics, data science, machine intelligence, quantum machine learning, all kinds of cool things. My role as the principal data scientist in this wonderful firm is to be sort of the horizontal matrix guy who gets to talk about what everyone is doing, what anyone is doing, what the clients need in different client accounts. I bring years of experience in astronomy, and in working with large data systems at NASA, and with that the knowledge of what customers and clients need to do their jobs. That's what I enjoy doing: showing people the value of data, how to extract that value, and how to build systems that do that.

That's what I'm doing now. Briefly, prior to that, it was 12 years at George Mason University. I was a professor of Astrophysics, but I never actually taught Astrophysics; I actually taught data science courses. We did that for 12 years, many years prior to the current boom in data science. Prior to that, I spent 18 years within the NASA environment working on large data systems for astronomy. My background is astronomy, but I live in the data world now.

Jim Scott:

Fantastic. You said data scientist. Now, if we put aside the fact that you're just a super smart dude, clearly you are a constant learner; you really like to keep up to speed with what's going on in the industry. Can you explain, let's just say for the general population of people, what exactly do data science and being a data scientist really entail?

Dr. Kirk Borne:

Well, for me, I tell people that the two most important things in data science are the data and the science. Okay, it sounds like a joke, but it's not really a joke. First of all, the data: the evidence that we collect from whatever work we're doing. Nowadays everything has some kind of digital signal, some kind of data associated with it, no matter what you're doing, whether it's in health care, or your TV viewing, your entertainment, your purchase histories, your travel. Whatever you're doing, there's data that follows you, that tracks you, that monitors what you're buying and doing. So that data gives information, and from information we try to extract knowledge. The scientist's role is to do discovery of patterns and trends and interesting behaviors in those data, and to build models that can do prediction from those data. And even do something which we call "prescriptive modeling," which is understanding what we can do to improve the outcome. As opposed to just predicting what will happen, can we make a better outcome than the one we predict?

So there's not necessarily a right answer in that, so the scientist's role in this is to do that experimentation. Test different models, test different outcomes, test different combinations of data to see which thing works, which one is most accurate, which one is most representative of the real-world behavior of this person or this process or this product or this thing that we're studying. It's that combination of the iterative, experimental mindset of the scientist applied to these massive stores of data, where we're trying to learn what is the right pattern or feature that we can extract from this data to make a better decision, to do a better discovery, and to achieve valuable outcomes.

Jim Scott:

Fantastic. Yeah, I mean myself, I hear people ask that question a lot: "What exactly does that mean?" And when I personally try to explain it to them, I try to keep it as simple as possible and then let them know the data scientist role has been saddled with a really overly generic name, right? Because data can imply any topic in any domain, because it's an output from something, and science is "a study of." So wow, we've literally just said we're studying the output of stuff, and it's kind of a shame, in my mind, that someone wasn't able to come up with a better, more descriptive name, something that could prompt an "Oh wow! That's that role, that's amazing!" Right?

Dr. Kirk Borne:

Yeah. I call it rocket science for data, I don't know, unless that gets a "wow, that's amazing" response.

Jim Scott:

Well, rocket science for data, you've got a website for that don't you? Isn't that your blog?

Dr. Kirk Borne:

Yep, rocketdatascience.org.

Jim Scott:

There you go. Yeah, you've had some pretty good posts on there. I've gone and referenced some of your reading lists in the past.

Dr. Kirk Borne:

Well, one of the things I like to collect there are interesting collections of reading materials and other things like that. So it's not so much just pontificating about my opinions about things, but giving good resources to people who are trying to learn things or trying to use these data tools, and where they can find information about it.

Jim Scott:

Great. At the top I mentioned that we're going to be talking data ops, or data operations. Now, there's been a pretty substantial buzz building over the last, say, year or so, helping push this concept and this notion of data ops forward, and I find it to be a pretty important topic. Do you think you could elaborate for the listeners and help explain, what is your interpretation of data ops? What is it or what isn't it, and what does it mean to a business?

Dr. Kirk Borne:

Well, that's certainly a good question right now because, much like data science, which has a variety of different interpretations for people, data ops is somewhat similar. Maybe not quite as diverse in terms of the definitions, but there's still more than one definition out there that people use. One of them, which I don't use quite as much, is really the one that you were just alluding to, which is data and operations, or operations around data. That's true. But for me, data ops is something different, to me personally, which is sort of the concept of dev ops as applied to data projects.

So dev ops is this process where you have a tight coupling between a development team, the people who build the thing, and the operations team, that is, the people who use the thing. And so in the world of system engineering, which I lived and breathed at NASA for 18 years, this tight coupling of the system engineering, if you will, was an exceedingly important step in the building and operation of a spacecraft, or any kind of system that would be launched into space, primarily because once it's launched, you can't fix it, because it is up there already. And so you need to understand what it is the users really need, how the system should really be designed, and get quick feedback from the end users. Does the system meet your specifications? Is this really the thing that you're going to use? Does this meet the requirements that you set for the system?

So that's the dev ops concept: rapid cycles and rapid iteration between test, deploy, and feedback from the users. If you think about dev ops being this continuous merger, if you will, of development and operations teams, apply that now to data analytics projects. That's what I think about when I think of data ops. Now of course, part of that is the data operations phase of the work that you do, so data ops certainly makes sense for people who think of it more on the operations side. Then what are the data operations that we are running, which for example might be a cloud server, it might be an analytics platform, it might be a marketplace of machine learning APIs. So the operational thing you deliver might look different for different people, and that would be their data operations.

But for me, data ops extends back into the design and development phase as well, which is: why did we build this in the first place? What are the needs and requirements, and are we meeting those requirements so that this becomes usable? It's almost like the famous quote from the "Field of Dreams" movie where they said, "If you build it, they will come." Well, that was true in that movie because it was made clear in that movie, if you're familiar with it, that building that field was something deep inside what people wanted. So if you can know what that is, that thing that people really want and need, and you can tap into that, then yes, if you build it they will come. So data ops is, I want to say, the field of dreams for data analytics.

Jim Scott:

Well yeah, and so if I were to kind of stack on top of it a little bit in my perspective, one of the things that I think has been missing for a long time is just ... let's look at something as simple as SQL. Okay, it's been around for a long time, people use it for all kinds of different purposes. At the end of the day, you could consider it to be one of the most rudimentary forms of performing analytics. Regardless of the intent of the outcome, it is used for that purpose.

Now, since the data that the SQL query queries can change over time, the data models that it is querying can change over time. The fundamental problem that I have seen is people don't typically version their SQL queries along with their data models, or with their data. I think that's been a massive gap, and I think until data grew to this phase where we now call it "big data," people hadn't really started considering data ops as an important piece, yet it's been important since day one; people just didn't practice it.

And so applying these good concepts that came out of dev ops, we're saying, "Look, it's not just about the software that's being written by the software engineers, right? It's the data models coming out from all parts of the business. It is the queries that are being run." Today you're querying these data sets on these data models. Tomorrow you are querying additional data sets with new data models. How do I make sure that over time I can produce and reproduce what I was able to do in the past, without having to worry about this whole "Ooh, I threw something over the wall. I wonder who I messed up tomorrow."

Dr. Kirk Borne:

Yes. Good point there. I think you've hit on a really important concept within data ops, which is reusability, or [inaudible 00:10:47] is another way you might want to think about it. That is, how can you reuse something if you do not know the state of the input, the process that works on the input, and the data model that it is all structured on? And so you're versioning the SQL code, you're versioning the data model, and you're even versioning the data content.

For example, if I do a query [inaudible 00:11:09], show me the list of users who have a particular property. Okay, so let's say you find me the list of all my bank customers who are, for example, defaulting on their loans. Well, that list will change over time, so you need to know the list if you are going to take some action based on that list. You need to make sure you have the updated list, so that's understanding what the data timestamp was when you did the request. Also, the data model might have changed, some feature in the model, so that what you think you are querying is not really what you think. You might think you're querying all possible users, but it may change to be just the users within a particular branch of the bank and you missed that fact, so you missed all the other possible users because the data model has changed. And also-

Jim Scott:

Exactly.

Dr. Kirk Borne:

In SQL code, the language doesn't evolve very rapidly, but it does evolve, and just knowing that is important. To think more generally about any code that you write, certainly in a development environment, the code changes hourly, daily, or faster, so you need to know the version of the code you are using. Certainly that is part of the value that dev ops brings to software development projects: knowing the timestamps on all those things. And data ops should definitely include all of those same sorts of provenance, so you can be sure you can reproduce the answer to a question if someone says, "Well, how did you get this answer to this particular question?"

And that happens a lot of the time in the data analytics world. If you're generating a list of, for example, risky behaviors in your organization and you do some kind of query based upon some data set, you might identify the wrong set of people or the wrong set of things that you're going after. And so you need to have all those aspects in your data system to make it useful, in the same way as any software system: to know what state the code was in, what version the code was, and when you did that calculation.

Jim Scott:

Exactly. What seems to me just critically important, and I've seen a number of our customers do this, is they use a feature like snapshots so they can keep a consistent view of everything all at the same time, and then they just make sure they store their models and their queries in the appropriate location so they can version them along with the data at the same time.
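
For a concrete picture of what versioning the query, the data snapshot, and the model together can look like, here is a minimal sketch in Python. The manifest layout, snapshot identifier, and file names are hypothetical illustrations rather than any specific platform's feature.

```python
# Minimal sketch: record enough state to reproduce an analytics result later.
# The snapshot id, paths, and manifest layout are illustrative assumptions,
# not a specific product's API.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_run(manifest_dir: str, sql_text: str, data_snapshot_id: str,
               model_version: str) -> Path:
    """Write a small manifest that ties a query to the data snapshot and
    data-model version it ran against, so the result can be reproduced."""
    manifest = {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "sql_text": sql_text,
        "sql_sha256": hashlib.sha256(sql_text.encode("utf-8")).hexdigest(),
        "data_snapshot_id": data_snapshot_id,   # e.g. a snapshot name or commit id
        "model_version": model_version,         # e.g. a git tag for the schema/DDL
    }
    out = Path(manifest_dir) / f"run_{manifest['sql_sha256'][:12]}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(manifest, indent=2))
    return out

if __name__ == "__main__":
    path = record_run(
        manifest_dir="run_manifests",
        sql_text="SELECT customer_id FROM loans WHERE status = 'default'",
        data_snapshot_id="loans_snapshot_2018-01-31",
        model_version="schema-v42",
    )
    print(f"Recorded run manifest at {path}")
```

The point is simply that if the query text, the snapshot, and the data-model version are captured together, the earlier answer can be re-derived later.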

Dr. Kirk Borne:

Absolutely. I think that mention of data ops, and this is focusing in on it if I may, puts more focus on the data engineering side of the data processes in an organization. But I usually live more on the data science side, which is the model building, the data collection that you need to build a model, to answer business questions. And so the versioning is more about what version of the model, what version of the data set we were using. And of course it requires version control, obviously, across the whole data infrastructure that you're describing. So all those things work together, which is why we call it a continuous cycle. Dev ops is a cycle. Some people say it is not a thing you do, it's a way of doing things.

Jim Scott:

Exactly. So I was recently talking with Jay Zaidi, we were talking about his book that he wrote, "Data Driven Leaders Always Win".

Dr. Kirk Borne:

Yes.

Jim Scott:

I know you're familiar with this because you wrote the foreword for the book.

Dr. Kirk Borne:

Yes, exactly. Thank you for the shout out.

Jim Scott:

And when I was talking to him, he says to me, "Jim, you don't have to read my whole book, but at least read the foreword by Kirk."

Dr. Kirk Borne:

Well, that's a good plug.

Jim Scott:

Yeah, it was pretty good, so it made me laugh, and I did read more than just the foreword. But it was really interesting, because one of the things you mention in your foreword is customer 360. Do you think customers should focus on implementing a customer 360? Or do you think maybe they should focus on something more like, I don't know, maybe they should go build a customer 288? What do you think?

Dr. Kirk Borne:

Well, yeah. Maybe not everyone knows why this is funny, but I did write an article for the manpower website called "The 288 View of the Customer." And someone may say, what does that mean? Basically it's applying the 80/20 rule. That is, oftentimes we discover that these numbers aren't exact, but often it's like 80% of your return on investment comes from the first 20% of the work, and getting the last 20% takes 80% of the effort. So you literally have to spend five times more effort just to get a 20% improvement on the final product. And instead of necessarily going after the full 360 view of a customer or a process or anything for that matter, where you collect all possible data, maybe you can get most of the value that you need by using only 80%. Not 80% of the effort, but 80% of the data, which could come with 20% of the effort, because it's so much harder to get that last 20%.

And so I think the 288 view ... Oh, 288 is 80% times 360 by the way. The 288 view to me is probably an example of data ops in operation, actually, and that is you understand that you can deliver almost all of what the end-user needs without spending an enormous fortune. As I said, the last 20% of value comes from the last 80% of investment. Well, that's five times more investment just to get that little bit extra. Maybe the customer is satisfied where you are, maybe the end-user, the stakeholder, the business process manager is satisfied with where you are and maybe doesn't want to spend five times more money for the last little bit. So I'm a big fan of the 288 view of the customer.

Jim Scott:

And so am I. Honestly, when Jay had told me this, I remembered that you had written that. And the thing I find amusing about it is it's another great analogy, one hopefully people can latch onto. I feel like we should be smacking people in the face and saying, "Hey, 288 customer view. You should check it out. Check it out." To me it comes down to too many people staying focused on what the original goal was, instead of stopping, pausing for just a moment, and thinking about what they're actually trying to accomplish. Because goals change, and most of the time people are unwilling to reevaluate goals before a goal is complete, because they need to have closure, perhaps, around a topic. So to me this just seemed like the perfect way to bring this topic back up, because hey, focus on what you're actually trying to solve and then reflect.

Dr. Kirk Borne:

Exactly.

Jim Scott:

And keep going if you have to keep going.

Dr. Kirk Borne:

I mean, to me, that's almost the perfect definition of data ops. When you say the requirements will change, it's not so much even that they change as that they migrate. And so for me, again going back to the concept of dev ops applied to data, which I'm calling data ops, you do that minimum viable product, which they call the MVP. You build the first viable thing that the end-user can use and see how satisfied they are. Are you moving in the right direction? Is it giving them what they need? Or are you off-base? A lot of the time, when the end user sees the product or sees that particular solution, they say, "Oh, what I really meant was this." So it's not so much that they're changing the requirement; they're making it clear, because maybe they didn't make it clear in their original explanation, but maybe it wasn't even clear in their own mind. Sort of like that old adage, "I'll know it when I see it," right? So I can describe what I want for you, but I really won't know that's what I meant until I see it.

I taught a graduate course as part of the data science courses I taught at the university, and that graduate course was called Scientific Databases. It had a heavy focus on the science because I was in the science department, but it was really about data science with applications in many different domains. But I included in that class a section on system engineering and this whole concept of requirements creep, or scope creep, as they call it. The requirements gradually shift and move. So I would open the lecture by telling the students that requirements creep is not referring to a person, usually.

Sometimes it is. But the requirements creep comes on both sides. And not only comes from the end-user, but also from the development team. I've seen this because I was managing a development team at NASA for our NASA client, and often times my developers said, "Oh wow, wouldn't it be cool if we added this new feature? Wouldn't it be neat if we added this to the interface? Wouldn't it be neat if we collected this other data and added it to this particular product?" And it's like no, wait a minute, the client is not asking us to do all of those extra things.

So the wanna-haves and gotta-haves, what you need and what you want, are not always the same thing. That's something that we have to teach our children and we have to teach ourselves, too: what you want is not necessarily what you need. And so requirements creep, as it's called, or scope creep, as the requirements move and shift and keep adding to the development and design cycle, adds more money and more time. And the question is, can you intercept that before it gets out of control? That's what the dev ops or the data ops cycle is supposed to do. Deliver the minimal viable thing, that is, the thing that works minimally and is safely on the right track, see how close we are, and if the 80/20 rule kicks in, like that 288 view, ask: am I satisfied now? Then let's move on to something else, and so be it.

Jim Scott:

So, do you have any speaking engagements coming up soon?

Dr. Kirk Borne:

I do have speaking engagements all the time coming up soon. It seems to be the world I live in. What do you want to know?

Jim Scott:

Let's say within the realm of February and March timeframe. You got anything fun you're going to be doing?

Dr. Kirk Borne:

In early February, I'm speaking at a business analytics summit for the University of Texas at Arlington, so right outside Dallas. It'll actually be at the Texas Rangers' stadium in Arlington, so that'll be fun. The Texas Rangers' chief of analytics is speaking as well, and I'm a big fan of sports analytics, so that'll be great. When March comes around, I'm going to be in London for the Big Data World expo at the ExCeL Centre in London. If you've never been to the ExCeL Centre, it's like multiple football stadiums all under one roof. That's basically what you're seeing there.

Jim Scott:

Yeah, the important takeaway from your note on that location is don't get dropped off at the wrong end of the ExCeL in London.

Dr. Kirk Borne:

Yes, bring your walking shoes.

Jim Scott:

Yeah, because it's got to be close. I haven't put it on the map to figure out exactly, but it's got to be close to a mile-long building.

Dr. Kirk Borne:

Yeah, I think so. The thing that amazes me is that there were more like 10 concurrent expositions when I was there last year, at the same time, under the same roof. And if you're in any one of those expos, you feel like you're in the biggest convention center you've ever been in, and that's just one of the 10 in the building. It's just unbelievable, so I'm doing that in March. I'm also going to Dubai in the United Arab Emirates in March for the Global Smart Energy Summit, so I'm going to be a keynote speaker both at the London convention and also the Dubai convention. So those are the ones coming up on my calendar.

Jim Scott:

Fantastic. So if we pivot just a little bit for a moment, I'd like to know, when we're talking about social media, who is your favorite person to follow on Twitter, or whatever your favorite media outlet is, to get fun, interesting, insightful information when it comes to things like dev ops, data ops, etc.?

Dr. Kirk Borne:

Well, focused specifically on data ops, there's the group called DataKitchen. So DataKitchen on Twitter, their handle is datakitchen_io. Their website is datakitchen.io. They are doing a lot of cool work in data ops; I've spoken with them at conferences and we're all sort of of the same mindset here, so I really love seeing what they're doing and hearing from them. But generally on data science, I mean, there are just so many interesting and active people on Twitter. I wish I could confine myself to a top 10 list, but I think I'll try not to go there because there are just so many wonderful people active in that space.

I will just give a little shout out to my own organization, Booz Allen. The Twitter handle is boozallen, and we also have a Twitter handle called boozdatascience. That's Booz without an e; it's not booze as in drinking booze. It's named after a gentleman named Booz from [inaudible 00:24:17]. As I said, we have nearly 1,000 data scientists in our firm and a lot of activity. We're doing our own machine learning, and AI, and data science and analytics, predictive modeling, so I actually go and read my own company's Twitter feed frequently during the course of the day to see what new things are happening that maybe, in a 22,000-person company, I'm not otherwise aware of, because a lot can happen in a big company.

Jim Scott:

Well that's actually a good indication that it's not just a bunch of fluff, so it's good to know.

Dr. Kirk Borne:

Oh, it's amazing some of the things I see. I mean literally, I just recently saw the predictive analytics handbook for national defense. I said, "Whoa! I didn't even know we had this." And it turns out one of the big attractions for me to leave the university, almost three years ago now... my wife thought it was quite a drastic change for me to leave a tenured full professor position at a university to go work for a private company.

But one of the things that Booz Allen created was this thing called "The Field Guide to Data Science," and if you haven't seen it, you are really missing something. I mean, it's nearly 100 pages of very detailed discussion of different algorithms, different techniques, different methods, and it's not fluff at all. Believe me, there are lots and lots of case studies in there, and they're all relatively short, one or two pages, so it doesn't require a long sitting to get to the interesting content. It is available as a free PDF download, but if you get the hard copy, it has a multi-page foldout of an algorithm map showing which algorithm to choose for different types of problems you are trying to solve. It is pretty amazing. When I saw that a few years ago, I said, "Man, I'd love to work for this company." And you know, my dream came true.

Jim Scott:

Well, before I go on to the next question that I've got for you, I want to throw a shout out to one of the people whose tweets I enjoy on Twitter, @lisachwinter. So Lisa Winter, she puts out a lot of good stuff, predominantly data science related but not always. Just fantastic content. It's tough to narrow it down in certain domains, so I'm just trying to use this as a chance to share some of the people that myself and our guests find really interesting, because there are so many out there that sifting through all of them is very difficult. So hopefully our listeners will be able to benefit from this and find some new people to follow.

Dr. Kirk Borne:

Yeah, well, I know Lisa and her Twitter feed well, so thanks for mentioning that. If we're going to name specific names and not just organizations, I will break my little promise not to name a name, because there are so many that I do not want to do an injustice to anyone; there are so many wonderful people. But Bob Hayes, I'll throw out Bob Hayes. Part of his organization is called Business Over Broadway, so BOB. His name is Bob, b-o-b, and Business Over Broadway is b-o-b.

I love it because they do a lot of surveys and studies of the field; they do a lot of surveys of data scientists and data science practice, and produce a lot of really useful market-based information about the state of the field, the state of data science and data scientists, what they're doing and how they're doing it. A lot of really great content there, and he works with a company that does a lot of work in the digital marketing, customer analytics area, which I find totally fascinating. Some people might wonder why an astrophysicist cares so much about digital marketing and customer analytics. Well, for me, the whole nature of the world is right there: it is behavioral, it is the behavior of people. Whether you're looking at healthcare, or defense, or marketing, or anything, it's about people. And analytics is about following the data trails to see how people move and change and what their motives are, so I love to go to digital marketing conferences, because I usually kick off my talks with a story about killer asteroids. And people say, "Oh my gosh, what is that? What has that got to do with this? What's this astronomer doing speaking to us at this conference?"

And so I talk about killer asteroids. I say, "Well, we can measure lots of asteroids in the sky; think about these asteroids, maybe, as your customers. You get lots of data points from these asteroids and you can see where they're going, and build predictive models to see where they're going to go. One day we might discover an asteroid whose predicted orbit will intersect Earth and wipe out humanity. Oops. That's not a good thing. So what can you do to change that outcome? So think about your customers in the same way. Follow the data, and you can see where your customers like to purchase, what they like to do, how they're doing it, what their purchase patterns have been in the past, and predict future customer behaviors. And maybe one day you'll predict that one of your customers is going to take their business elsewhere."

And I've actually talked with companies that build these customer retention models, and of course they're trying to find what they can do to keep that customer from taking that action. So it's the same kind of prescriptive analytics, I mean metaphorically the same, as the killer asteroid. That is, what can we do? What kind of forces and conditions can we set, knowing how something responds to those forces and conditions and inputs and treatments? How can we move it to a better outcome? And so this sort of prescriptive modeling applies to human behavior or any kind of behavior, whether it's asteroid behavior, or customer behavior, or employee behavior, or patient behavior, or even a process behavior in a manufacturing plant. All these things are metaphorically the same, in my mind, and so I really enjoy discussing those things and finding analogies in outer space to things right here on Earth.

Jim Scott:

Yeah, that's a great example, and I tell people all the time myself you don't realize how similar your business is to every other business out there.

Dr. Kirk Borne:

Exactly.

Jim Scott:

You start tearing away the layers and you're like "Oh, yeah I guess it does look a lot like that, doesn't it?"

Dr. Kirk Borne:

I tell you, it's really amazing how true that is. It's not so much the case nowadays that people are unaware of this, but not too many years ago people were not aware that there was so much similarity. I've done a lot of consulting; prior to coming to Booz Allen, when I was at the university, I did a lot of independent consulting, and oftentimes people would say, "Oh, you're just an astrophysicist. What do you know about this?" The worst statement I ever heard is, "Oh, you're just an educator, just a teacher." That's like the biggest insult to a profession that I have ever heard, so don't ever say that to a teacher. But the other amusing one is, "Oh, astrophysics. What do you know about data?" Anyway, the point is we are all using data, we are all learning how to use it better, and we can learn from one another.

Jim Scott:

So there was a post you tweeted recently, and it's been in the news for quite a while, but the post specifically was about AlphaGo and what it requires to generate its neural net and to function. When we put the podcast up, I'm going to put the link to the article in the post. What is your takeaway from this article for how AlphaGo was built?

Dr. Kirk Borne:

I think this whole concept of reinforcement learning is fundamental to that AlphaGo process. Reinforcement learning is basically a version of machine learning where you don't necessarily have a training set, but you have a goal. You establish a goal, and of course in a game, the goal is to win. And so as you play a game, well, AlphaGo can play millions of games. It's a computer, unlike me; I have to get back to my day job. I can play one game with you, but I don't think that will work. But it can play millions of games and find out which moves, which actions, lead to better outcomes toward achieving the goal, which is winning the game. That process of reinforcement learning, trying to encode it in an algorithm, I guess, is what AlphaGo has achieved. So not only did AlphaGo beat the world's Go champion, but it actually learned how to beat itself, so to speak. That is, it learned so much about playing games against itself that it actually became sort of the super champion, and it was beating itself. And it reminded me a lot of this movie called "War Games" from the 1980s with Matthew Broderick.
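
For readers who want to see the bare bones of that idea, here is a minimal sketch of tabular Q-learning in Python on a made-up five-cell corridor: there is no training set, only a reward for reaching the goal, and repeated episodes teach the agent which actions move it toward that goal. This only illustrates the general reinforcement-learning loop; AlphaGo itself combines deep neural networks with tree search and self-play.

```python
# Toy reinforcement-learning sketch: tabular Q-learning on a 5-cell corridor.
# The agent only gets a reward for reaching the goal cell; by playing many
# episodes it learns which actions lead toward that goal.
import random

N_STATES = 5          # cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(state):
    """Pick the best-known action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = (nxt == N_STATES - 1)
    return nxt, (1.0 if done else 0.0), done

for episode in range(500):                       # "play many games"
    state = 0
    for _ in range(200):                         # cap the episode length
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt
        if done:
            break

# After many episodes, the learned policy prefers stepping right in every cell.
print({s: greedy(s) for s in range(N_STATES - 1)})
```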

Jim Scott:

Oh, Matthew Broderick.

Dr. Kirk Borne:

Matthew Broderick. Oh man, I mean, it's kind of corny, but it's pretty amazing what that movie showed in terms of a lot of things. In terms of AI, machine learning, games teaching themselves, but also about data security and hacking, oh my gosh, that movie was way ahead of its time. It might be a little bit campy for some people, but I think it is worth a look if people have never seen "War Games." But the idea is that the machine plays itself in order to learn how to best win the game, and there is a very interesting conclusion at the end of the movie, because the war game in the computer is about playing different scenarios for how to win a nuclear war, and the conclusion of the movie was that the computer decided the only winning strategy is not to play the game.

Jim Scott:

Yep

Dr. Kirk Borne:

And that was pretty insightful.

Jim Scott:

It is a classic movie. "Would you like to play a game?"

Dr. Kirk Borne:

Well, of course, for the whole movie you think the computer is actually about to launch an all-out nuclear attack on the Soviet Union, at the time. Then you realize it was just playing the game to try to figure out what the best strategy is, and, as I said, it learns that the best strategy in a nuclear war game is to not start the game in the first place.

Anyway, that's reinforcement learning. So hopefully AlphaGo is learning somewhat more beneficial outcomes than that.

Jim Scott:

So before I go on to the next question that I've got, this is where I like to drop in a tip of the day. So for this tip of the day, there's a really funny quote. In a previous podcast I mentioned someone that I follow, John Arundell; @bitfield is his Twitter handle. This is absolutely hilarious because I feel like I've lived this. There's a little comic with it and it says, "I have a very particular set of skills. If you disable pasting into password fields, I will look for you, I will find you, and I will kill you." Now this is clearly a riff on the Liam Neeson quote from "Taken," but I couldn't agree more, because I hate it when I go to sites that force you to type the password. You can't copy and paste it, which means you're never going to use a strong password, which kind of implies to me that they want weak passwords used on their site.

So the tip of the week is, what happens in environments where people have shared accounts, shared passwords? They have some common user, they launch and install software where they all use the same password to do it, instead of setting up the environment properly for secure operations, so you know who did what and when they did it. Instead, 10, 50, 100 people are all using the same user ID and password. It's a horribly bad practice; you should not do it. Do not prevent people from being secure. I even saw recently that John McAfee was blaming Twitter's poor security policies for his account getting hacked, basically saying, "Look, I'm a security guy, but I don't run Twitter security, so people target me. They get my account. I can't do anything about it." Do you have any thoughts on this topic in general, Kirk?

Dr. Kirk Borne:

Well, it's very interesting to hear the stories. I'm certainly a big fan of being able to cut and paste into the password field, because I do try to use very secure passwords, and sometimes there are some long combinations of numbers and characters, special characters, uppercase, lowercase; it's kind of tedious to type it all in, or remember it for that matter. But certainly when it turns to sharing passwords, I guess I have a different experience because I worked for years in an organization... Certainly when I was in the NASA world, and now that I am in the contracting world with the federal government, we have very strict policies about sharing passwords with anybody, including do not share your password even with the system administrator who might ask you for your password. So I've always been extremely careful about sanitary password practices, and as a manager of a contract for many years at NASA, I gave those training sessions to my employees every single year. So I live and breathe by that aspect of data security, which is keep your passwords to yourself and never share. I think people should always be very careful to use secure passwords and not to share them with anybody.

Jim Scott:

Yep.

Dr. Kirk Borne:

Just humorously, if you saw my Twitter feed at all this past week, I tweeted an article which was the top 20 worst passwords of 2017. And what's amusing is, I looked at the top 10 or so and they were basically the same ones that are on the top 10 worst passwords used every year, not just 2017. So the first one on the list is 123456, and the second one on the list is the word "password" itself.

Jim Scott:

Well Kirk, have you thought for a second that maybe these people using these bad passwords are just rotating to that list and they actually do change them?

Dr. Kirk Borne:

Well, you never know what people are thinking. Yeah, I guess you're right, I can't claim I know what people are thinking so I guess I should be careful.

Jim Scott:

Alright, so on that humorous note, I'm going to read to you one of your tweets.

Dr. Kirk Borne:

Uh-oh.

Jim Scott:

Now I'm going to admit before I read it, I'm taking this out of context, purely for the sake of humor.

Dr. Kirk Borne:

Okay.

Jim Scott:

But I'm hoping you can elaborate on this for me. "#blockchain is a cure for baldness." So my question is, Kirk, what exactly can blockchain do for us?

Dr. Kirk Borne:

You mean besides curing baldness?

Jim Scott:

Yeah.

Dr. Kirk Borne:

Well, of course, the rest of that tweet was all about how we overhype so many of our technologies, how we make ridiculous promises. Big data was at that phase a few years ago. I think we've sort of slowed down the hype on big data, but you know, it was going to cure all the world's problems, it was going to solve poverty, and hunger, and you name it. Cure all the world's diseases. So instead of saying big data will do all that, we're now saying AI and machine learning will do it. So we've just shifted the hype from big data to AI, but the hype still exists. And if we're really going to sell our products to people, or sell our ideas to people, we have to try to shy away from ideas that are represented with such ridiculous statements. Now of course, blockchain curing baldness, we can all laugh at that one, but there are other claims that are a little less clear, like when people say blockchain is going to solve all of our data security problems. We were just talking about protecting data with secure passwords, and people will say, "Well, blockchain will solve that problem."

Well, in and of itself, blockchain doesn't do that, because it's just a distributed, decentralized record of transactions, whether it's financial transactions or whatever. In the world of data analytics and data ops, which we're talking about today, blockchain can be used to record who has touched the data, used the data, modified the data. So in a sense it's sort of a provenance tracker of how data has been used and changed over time. But there's nothing built into the blockchain decentralization paradigm that says anything about security or encryption. That's really what Bitcoin adds, and Bitcoin is one example of a blockchain. And you have other kinds of blockchains, clearly. For example, your health record can be a blockchain, so you can transfer your health record from one doctor to the next easily nowadays. If you have this electronic medical record in the form of a blockchain, you can see who has done what to you and what medications and treatments you have had, and so on.

So blockchain is really good at sharing, again, sort of this knowledge of how something has been changed or used by different users, and it is decentralized so that it does not need to be stored on one device, but across multiple devices. So it's like a peer-to-peer ledger; that's really its value. Then you need to add the layers of encryption, like Bitcoin does, in order to get the data security piece of that technology.

Jim Scott:

Yeah, so if I were to throw in my two cents on top of this, it's that I don't think blockchain alone is a complete solution, and I think a lot of people don't understand that. Fundamentally, as a concept, the two really unique elements that have come out of the cryptocurrencies and all of the hype around blockchain are, first, the concept of a shared distributed ledger, which technically speaking is not a new concept, although the technologies available to implement it are significantly farther along than they would've been 10 years ago. And then the other is hashing the events in that ledger so that you don't have to worry about, let's just say, a specific oligarchy rewriting history. It's not possible to do that, and you get long-term persistence out of it and the ability to prove history wasn't rewritten.

Dr. Kirk Borne:

Exactly. Yeah, I'm glad you brought that up because that's really one of the key things. Someone saw one of my tweets, I remember, a few months ago, where I just grabbed some graphic of a blockchain to include in my tweet, because I like to include visual elements in my tweets, if you haven't noticed. And basically the visual of the blockchain looked just like a linked data model. That is, you basically think of a data model where you have primary keys and you connect them, or you think about linked data in a data lake where you have common keys that link different data sets.

So they said "Well what's the special about linked data?" And of course what's really special is that the key, the link if you will, is the hash of the previous entry in the block chain. And so that is to say, you wouldn't have to change every single element of the black chain in order to change that hash to change the previous one and the previous one and the previous one from the previous one and the previous one all the way back to the beginning. There's lots of concepts that prevent that from happening, one of which is proof of work. That is, if you make the block chain, you can basically put a hash on there which is not easily reproducible by anybody before the next time that block is added to. So by the time someone adds to it is Artie too late to try to change anything earlier in the chain. So that level it of encryption, which is what mining bitcoin is about, which is basically guarantees, unless you have an infinite computer and an infinite number of nodes, you can never do it.

It's like 10 to the 176 years or something before you can crack it, so you might get lucky. It's like playing the lottery, you might get lucky and win the lottery, but more often than not you're going to play the lottery millions of times before you get it.
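
To make that "each link is the hash of the previous entry" idea concrete, here is a minimal hash-chain sketch in Python. It deliberately leaves out proof of work, consensus, and networking; it only shows why rewriting an early record breaks every later link. The record contents and field names are made up for illustration.

```python
# Minimal hash-chain sketch: each record stores the hash of the previous one,
# so tampering with an early record invalidates everything after it.
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash the block's contents deterministically."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain: list, data: str) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64   # genesis block has no parent
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})

def verify(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

ledger = []
for event in ["alice queried the loans table",
              "bob updated the data model",
              "carol trained model v3"]:
    add_block(ledger, event)

print(verify(ledger))          # True: the chain is intact
ledger[0]["data"] = "alice deleted the loans table"   # try to rewrite history
print(verify(ledger))          # False: every later link no longer matches
```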

Jim Scott:

Exactly. So Kirk, we're coming close to the end of our time. Before we wrap up, one of the things I like to leave the listeners with is actionable advice. So are there one, two, three different actionable items that you can think of to help our audience deliver success with data science and data ops within their organization? Things that they can do, say, within the next 3 to 6 months.

Dr. Kirk Borne:

Well, first of all, if you don't already have the development team and the operations side, meaning the analytics users, integrated in some capacity, get them talking to one another; if not under the same organizational umbrella, at least talking to one another so you have that tight coupling. In doing so, think about the minimal viable product. That is, what are the small, incremental builds we can do to prove the value of what we are building here? Sometimes people say, and I say this myself, that in data science the analytics are the products, the outcomes, the deliverables from the data team. So the data product might be an API, it might be a model, it might be a recommendation to a customer, it might be a recommendation to a client, it might be a Jupyter notebook if you're sharing models with people. So think about the minimal product that you can build and deliver from the development team to the operational users, and iterate on it. Is it what I want? How can you modify it? Does it need modification? Is it satisfactory already? And start building that iterative, agile mindset.

And the other thing that's part of all this is the concept of an experimental culture, a culture of experimentation. So that's the scientist in me coming out, but I think a lot of organizations, certainly what people call the algorithmic organization, or what they even now call the mathematical corporation, are doing this process of experimentation rapidly. So think about something like eBay or Google. I heard stories from eBay years ago that they basically ran millions, literally millions, of A/B tests every single day to see what change of font, change of color, change of location on the screen, change of whatever, leads to better consumer engagement. So do all these little tests, and that's part of data ops. You do lots of little tests, see what generates positive or negative results, and build and test, and build and test, and build and test. So it's that experimental mindset. It's scientific in the sense that you hypothesize that this thing will work, you build a model, you run the experiment, you measure the result, and you test whether or not it did improve or deliver the result. Then you refine the hypothesis and try again.
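
As a small illustration of that hypothesize-test-measure loop, here is a sketch of comparing two page variants with a two-proportion z-test in Python. The conversion counts are invented, and a real experimentation platform would also handle sample sizing, sequential peeking, and multiple comparisons.

```python
# Toy A/B test: compare the conversion rates of two page variants with a
# two-proportion z-test. The counts below are invented for illustration.
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for the difference in rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p_value

# Variant A: the current page; Variant B: the proposed font/color change.
z, p = two_proportion_z(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("The measured lift looks real; ship variant B and hypothesize the next change.")
else:
    print("No clear winner; refine the hypothesis and run the next test.")
```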

So people can do this with what they already have in place; they don't need to go out and buy something new, they just need to get people talking together more frequently. And one of the things I discovered from my NASA days is the value in this, which we never quite realized in my early days at NASA, but I think we caught on to it later on. You have a test team, a team that actually tests the system, the testers, and frequently the testers were just sitting around. I mean, that's an unfair statement, but they were basically waiting for weeks if not months for something to test, because the builds and deliverables were on monthly timescales. Whereas with dev ops and data ops you can do, literally, unit tests and incremental tests daily, or even hourly. So people will be more active, more involved, more engaged, and delivering more value daily and regularly than on these long cycles where you sort of come back at the end. It's almost like the old joke: you design a battleship, build it, and launch it, and 20 years later find out it does not float. Well, wouldn't you rather find that out a lot sooner?

Jim Scott:

Oh, fantastic. Well Kirk, thanks a lot. We're basically at the end of our time for today. For all of us here at Data Talks, I'd really like to thank all of you for listening. I'd like to give you a huge thanks Kirk for taking the time to be my guest today on Data Talks. This was quite possibly the best podcast I have ever recorded and you certainly have provided me with a plethora of insights.

Dr. Kirk Borne:

Well thank you Jim, I really love doing it. I appreciate all your insights and questions because that feedback to me broadens my mind, so I really enjoyed this opportunity to learn together.

Jim Scott:

Great. And there are a couple of additional resources that I'll mention really quick, and we'll have these on the page with the podcast. Booz Allen has a book called "The Mathematical Corporation"; I'll have the link up to that. I'd also suggest, if you're not already following Kirk, give him a follow, because when he posts his updates to his rocketdatascience.org site, he always tweets them. The last plug that I will put in here is a book that I wrote called "The Practical Guide to Microservices and Containers," which you can find on mapr.com/ebooks. For all of us here at Data Talks, thank you very much for listening. I'm Jim Scott, I'm on Twitter @kingmesal. Be sure to tell your aunts and uncles and maybe even some of your friends about Data Talks. Before we sign off, I'd like to leave you with one final thought. [singing Star Wars theme 00:48:09]
