Manage Your Sensitive Data from End to End: Data Discovery to Data Protection


Ted Dunning PhD.

CTO, MapR Technologies

Alex Gorelik

Founder and CTO, Waterline Data

Effectively identifying and managing sensitive data is foundational to successful data projects today and in the future. Given the increasing pressure to address compliance and regulatory requirements on their data, customers need a better way to effectively identify and manage ALL of their data and specifically sensitive data. Discovering sensitive data may sound easy, but it's often not. Data is always changing or being duplicated and requires constant tracking.

Moreover, with the amount of data that enterprises are dealing with today, it is nearly impossible to manually create an up-to-date catalog for auditing purposes. The future is sure to bring more compliance and privacy regulations, similar to what we've seen with GDPR and CCPA. MapR has partnered with Waterline Data to help enterprises intelligently tag sensitive data - e.g. Personally Identifiable Information (PII) - at scale through automation using Waterline's AI-driven data catalog platform.

Join Ted and Alex to learn:

  • How built-in security and governance capabilities from MapR complement Waterline Data's enterprise data catalog
  • How to gain a significant advantage with the ability to automatically discover, track, manage, and protect all enterprise data
  • How to intelligently tag and manage sensitive data that may be subject to regulatory and compliance oversight


Ted: 00:08 Let's get started. The whole point is that sensitive data management is a critical step in the overall governance problem. Simply put, governance is about governing your data and about the forensics aspect of finding that data. Today we're going to be talking about the partnership between MapR and Waterline that gives you that. One key point, of course, is that in addition to governing and saving it, I'm sorry, and detecting it, you have to be able to prove that you did this. You can't just pretend that you're doing governance.

Ted: 00:48 There's an awful lot of details about this. There's an enormous number of regulations it may affect. I think Alex is going to be talking about that a bit more. But, it isn't a simple thing. It isn't an optional thing. It's something that you really, really have to address.

Ted: 01:08 Let's talk a little bit about the governance and security side first. Simply put, managing this sensitive data consists of a few simple things. We have to find the data, we have to control and protect it, we have to respond to issues as they arise, and we have to document. Now, this is not optional. The sizes are big, they are growing, they're growing well beyond what you can do in a manual system and the risks are very substantial. We're seeing billion dollar fines due to not managing this well. Doing it poorly is going to be much, much more expensive than getting it right.

Ted: 01:50 We can go into the details; I'm going to go into just a few today about how we do this. There are lots and lots of aspects to security and governance. One of the key things is that it has to be kept very, very simple. A MapR installation runs secure by default. This really simplifies your life. There's a straightforward recipe for security. That is, be able to express the restrictions that you want to impose. You need expressivity, to be able to say what you mean, and you need to have the mechanisms available to enforce that, to make that really stick.

Ted: 02:34 With MapR being secure by default, you have a big step there as the first part of your attempt. You have to be able to tell who somebody is, you have to be able to express what they're able to do, you have to be able to audit if that's what they've done, and then you have to protect the data against adverse circumstances, like crashes or a crazy man with an ax who steals disks and so on.

Ted: 03:00 Let's take just a very, very quick few examples. One particular capability that MapR has for controlling the sensitive data that you find is that we can control access at the level of something called a volume. A volume is a unit of management in MapR. To users, it works just like a directory, but to administrators, there are special properties. One of them is that you can set an entire access control expression. That means we can say, in this example, that we will allow reads according to the financial PII policy. Now, say Jane owns some of the files and there happen to be multiple Bobs in the company. Every company seems to specialize in some repeated name. At one of my companies it was Michael, at another one it was Bob. So, we're using Bob in this example. So, she says user Bob gets permission, but of course it's the DevOps guys who like first-name user names. So, that was Bob from DevOps, not Bob from Finance. Bob from Finance can read the data that's necessary because of the volume ACE, and Bob from DevOps cannot.
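
The layering Ted describes, where a file-level grant is filtered by a volume-level access control expression, can be sketched in a few lines of Python. This is only an illustration of the idea, not MapR's ACE syntax or API: the `User` class, the user names, the group and role names, and the rule functions are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    groups: frozenset = frozenset()
    roles: frozenset = frozenset()


# Hypothetical users: two different people named Bob.
bob_devops = User("bob", groups=frozenset({"devops"}))
bob_finance = User("bobf", groups=frozenset({"finance"}),
                   roles=frozenset({"financial_pii"}))


def file_rule(user: User) -> bool:
    """Jane's file-level grant: user 'bob' or anyone in the finance group."""
    return user.username == "bob" or "finance" in user.groups


def volume_rule(user: User) -> bool:
    """Administrator's volume-level policy: financial-PII-qualified roles only."""
    return "financial_pii" in user.roles


def can_read(user: User) -> bool:
    # A read needs BOTH the file-level grant and the volume-level policy.
    return file_rule(user) and volume_rule(user)


print(can_read(bob_finance))  # True
print(can_read(bob_devops))   # False
```

The point of the sketch is that Jane's mistaken grant to the wrong Bob does no harm, because the volume-level rule is evaluated independently of whatever the file owner wrote.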

Ted: 04:29 The point here is that we have administrative controls that are highly expressive, and they can say only the financial PII qualified people can do that. That role can change over time; I'm sorry, that's a policy. We can also set roles on people, and the membership of a particular role can change over time, without having to go back in and change every user or having to go back in and change every bit of content. That's just an example of how you can say what you mean on that part.

Ted: 05:04 We also have things like comprehensive encryption on the wire and at rest, and layered encryption at the device level itself. This means that outside of the security setup established by MapR security systems, you can't access data. We also provide comprehensive audits so that you can publish file access, file update, releasing changes and things like that to the audit stream and then analyze that using a wide variety of systems that can access that audit stream via open-source APIs.
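
As a rough illustration of what analyzing an audit stream can look like, the Python sketch below counts denied reads per user from JSON audit records. The record layout here (the `user`, `operation`, `path`, and `status` fields) is an assumption for the example, not the actual MapR audit schema, and a real consumer would read from the stream rather than from a list.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line.
# Field names are assumptions made for illustration only.
audit_lines = [
    '{"user": "bob", "operation": "READ", "path": "/finance/pii/a.csv", "status": "DENIED"}',
    '{"user": "bobf", "operation": "READ", "path": "/finance/pii/a.csv", "status": "OK"}',
    '{"user": "bob", "operation": "READ", "path": "/finance/pii/b.csv", "status": "DENIED"}',
    '{"user": "jane", "operation": "WRITE", "path": "/finance/pii/b.csv", "status": "OK"}',
]


def denied_reads_by_user(lines):
    """Count denied read attempts per user from JSON audit lines."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        if record["operation"] == "READ" and record["status"] == "DENIED":
            counts[record["user"]] += 1
    return counts


print(denied_reads_by_user(audit_lines))  # Counter({'bob': 2})
```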

Ted: 05:47 MapR makes it possible to govern your data. But, of course, if you want to govern, you have to know what data you have and where it is. How are you going to find that out? How are you going to figure out which policies are appropriate? That's where the partnership with Waterline comes in. We've integrated our two systems together in an offering directly targeted at sensitive data management. MapR is providing the platform, the security, the compliance-ready lineage and things like that. Waterline, on the other hand, is providing particularly the discovery and classification of that sensitive data.

Ted: 06:34 So, let's have a talk about how that goes.

Alex: 06:37 Thank you, Ted. This is a great overview. Let's continue and talk about why you need to protect your data. Ted talked about the fines and GDPR and CCPA, and they are certainly significant, but there is more than that. There's also customer sentiment and brand damage. The Facebook and Cambridge Analytica scandal caused a lot of people to consider whether they want to be on social media in general and Facebook in particular.

Alex: 07:06 There's also bad business practices. Inadvertently leaking salary information or pricing can cause all kinds of problems inside the company. So, overall, businesses really need to prepare for a highly regulated world and really protect sensitive data. But what data is regulated?

Alex: 07:32 If you look at this diagram, most people will say, "Yeah, credit cards should be, salaries certainly should be protected, intellectual property, but buying preferences, t-shirt sizes? I'm not sure." However, if you look at the regulations, both GDPR and CCPA talk about all the data you know about the person. GDPR Article 24 specifically talks about behavior and personal preference and attitude. CCPA goes into a litany of examples, like how fast somebody drives, what their sleeping habits are. Those are just a few categories. They're just examples.

Alex: 08:12 So, in short, you really need to protect everything you know about your customer. It's no longer looking for a needle in a haystack, you know, a social security number hidden inside there somewhere. It's really more about gathering all your hay into well-organized stacks so you can say, "This is a stack of everything I know about my customer."

Alex: 08:35 How do you do that? How do you identify sensitive data? It's a big challenge. The reason it's so difficult is that the regulations and the policies and the rules are all written in business terms. At the bottom, your data is described using technical names. Fields could be named anything, they could be mislabeled, and they could be unnamed altogether, as with CSV files with no headers. How do you connect the two and bridge that operational gap? Doing this manually is very, very difficult, and most companies have a lot of data, but they focus on a few critical data elements, and they struggle even with those. But with the new regulations, which talk about all the data you know about your customer, critical data elements are just not enough.

Alex: 09:29 What makes this especially difficult is that there is so much data. Some of our customers, like Fannie Mae, ingest ten million new files per day. Other customers, like Kaiser, have four billion fields. That's just a lot of data. The problem, as I mentioned, is that it's not well labeled, right? Some tables, like an Oracle table, might have nice names because they were created as part of a long data architecture process for a data warehouse, but a file might have cryptic names, or another file might have no names at all; it's a headerless CSV. How would you know that the first field is a first name? You actually have to look at the data, and trying to do this manually, looking at the data, just doesn't work for a petabyte enterprise.

Alex: 10:21 So, to help with that, we built Aristotle, which is our AI-driven discovery and tagging engine. It crawls through all the data sets, processes them, profiles them, and creates a fingerprint, which is a collection of features about each field. Then it works with those fingerprints to classify the fields and assign tags, which are business terms, to each field. Then analysts can curate them and say, "Yeah, you got this right, you got this wrong." This is very similar to how a new analyst would learn, right? If you hire an analyst, they won't know what your account number looks like in your bank. So you would point them to a field and say, "That's what the account number looks like." Then they might see another field.

Alex: 11:06 Now, they assess, "Oh, yeah, that looks like an account number. Is that right?" And you will say, "Yeah, you're right." They might find another data set and say, "That's also an account number." Somebody might say, "No, you know what, that's a department ID." What's the difference? Well, here's the difference. Aristotle works the same way. If you tell it, "This one that you tagged as account number is a department ID," it now knows, "Okay, here's a fingerprint for account number and here's a fingerprint for department ID." So, when it looks at another field, it will say, "Okay, is it more like this or more like that?" Maybe it's ambiguous enough that it might say it's both of those but one has a higher confidence level, or it might be able to tell, no, for sure this is an account number, not a department ID.
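
The curate-then-classify loop Alex describes can be sketched in Python as follows. This is a toy stand-in and not Waterline's patented fingerprinting: the four features, the similarity formula, and the tag names are all assumptions chosen to keep the example small. A field's values are reduced to a small feature vector, curated examples are stored per tag, and a new field is scored against every stored fingerprint.

```python
import math
from statistics import mean


def fingerprint(values):
    """Reduce a field's values to a small feature vector (toy features)."""
    strs = [str(v) for v in values]
    total_chars = sum(len(s) for s in strs) or 1
    return (
        mean(len(s) for s in strs) / 20.0,                        # scaled average length
        sum(c.isdigit() for s in strs for c in s) / total_chars,  # digit fraction
        sum(c.isalpha() for s in strs for c in s) / total_chars,  # letter fraction
        len(set(strs)) / len(strs),                               # uniqueness
    )


def similarity(a, b):
    """Map Euclidean distance into (0, 1]; 1.0 means identical fingerprints."""
    return 1.0 / (1.0 + math.dist(a, b))


def suggest_tags(fp, library):
    """Return (tag, confidence) pairs, best match first."""
    scores = {
        tag: max(similarity(fp, ref) for ref in refs)
        for tag, refs in library.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Curation: an analyst confirms what two fields are.
library = {}
library.setdefault("account_number", []).append(
    fingerprint(["1002003001", "1002003002", "1002003003"]))
library.setdefault("department_id", []).append(
    fingerprint(["D12", "D07", "D31", "D12"]))

# A new, unlabeled field is compared against the curated fingerprints.
suggestions = suggest_tags(fingerprint(["1009007001", "1009007002"]), library)
print(suggestions[0][0])  # account_number
```

Each further curation adds another reference fingerprint under a tag, which is one simple way to read "the more curation it gets, the better it becomes."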

Alex: 12:02 So, the more curation it gets, the better it becomes, and usually the curation is done as part of normal processing. If somebody works with the data set, they will look at the tags and say, "Yeah, that looks right." Or they'll ask somebody to take a look and help them understand whether it's right or not.

Alex: 12:25 If you want to comply with regulations, first you need a taxonomy. You need to organize the tags in your glossary into a taxonomy. Here you can see some examples of it. It goes, of course, beyond just identifiers and gets into things like social media data, status, you know, gender information and so on.

Alex: 12:49 Then what Aristotle does is it tags fields. What the screenshot is showing is the fields in a data set called Employee Skills, and you can see on the left that each row is a field in that data set, so employee ID for the skills is a field, name, first is a field, and so on. For each of the fields, Waterline suggests different tags with different confidence levels. Some have been curated, like first name has been curated by the user, so it doesn't have a confidence level; it's a solid dash. Some of them are suggested, so they have a dashed line on the confidence level. But based on those, somebody can now write a rule, and this rule can contain both data and metadata, and it's tag driven. So, instead of the typical approach to things like data quality or discovery, where you write the rule and then you have to bind it to each data set and say, "Oh, for this data set, first name is in this field, last name is in this field," now you can write the rule once. This rule applies automatically to any data set regardless of where it's stored, whether it's an Oracle table or a MapR data lake; it applies to any data format, whether it's CSV or JSON or Parquet; and it applies to any field regardless of what the field is named.

Alex: 14:19 So, in this case the rule says, "If a data set contains fields tagged as last name and first name and a field tagged country, and the field tagged country contains any of the EU countries, Germany, France, and so on, then regardless of what the fields are called, tag this data set as GDPR detected." So, this way, when a data set comes into the system, or when you bring a new data source into compliance, Waterline can automatically tag the fields and then use those tags to apply more complex rules that combine data and metadata checks to infer whether GDPR is applicable or CCPA is applicable and so on.
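
The rule Alex reads out can be sketched as a tag-driven check in Python. The dictionary layout for a data set, the tag names, and the abbreviated country list are assumptions made for illustration, not Waterline's rule language. What matters is that the rule refers only to tags and values, never to field names.

```python
# Abbreviated, illustrative list; a real rule would cover all EU member states.
EU_COUNTRIES = {"Germany", "France", "Italy", "Spain", "Netherlands"}


def field_with_tag(dataset, tag):
    """Return the first field carrying the given tag, or None."""
    return next((f for f in dataset["fields"] if tag in f["tags"]), None)


def gdpr_rule(dataset):
    """True if first name, last name and country are present and any country is in the EU."""
    required = {t: field_with_tag(dataset, t)
                for t in ("first_name", "last_name", "country")}
    if any(f is None for f in required.values()):
        return False
    return any(v in EU_COUNTRIES for v in required["country"]["values"])


def apply_rule(dataset):
    if gdpr_rule(dataset):
        dataset["tags"].add("GDPR detected")
    return dataset


# A headerless CSV: field names are meaningless, only the tags matter.
headerless_csv = {
    "tags": set(),
    "fields": [
        {"name": "col_0", "tags": {"first_name"}, "values": ["Anna", "Luc"]},
        {"name": "col_1", "tags": {"last_name"}, "values": ["Weber", "Martin"]},
        {"name": "col_2", "tags": {"country"}, "values": ["Brazil", "France"]},
    ],
}

print(apply_rule(headerless_csv)["tags"])  # {'GDPR detected'}
```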

Alex: 15:06 It's a very, very powerful way of managing your data estate because it's all automated. The tagging is automated, the rules based on the tagging are automated, and you don't have to have data engineers spend countless hours taking a rule and binding it to each field, or data stewards looking at each data set and trying to figure out what each field might mean and tagging it, naming it.

Alex: 15:34 The other problem you sometimes have is that fields by themselves might not be sensitive, but when you join them with other data sets, now you create regulatory risk. So, to help with that, Waterline helps you build what we call data objects. Using fingerprints, it can tell you which systems, I'm sorry, which data sets could be joined, and based on that you can create, for example, a data object that is all the data you know about your customer. Chances are, there are pieces of it within different systems. A data object can help you join data across those systems and give you this one kind of blast view that you can work against. This is just as useful, of course, for people doing analytics, because if they want to build some predictive model on customer return, they want to know everything we know about the customer. This will tell them, here's where you look, here's all the information we have.
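
One simple way to picture how joinable fields might be found across systems is to score value overlap between fields of different data sets. The Python sketch below does this with a containment measure. It is a simplification chosen for illustration, not how Waterline's fingerprints work, and the field names, sample values, and threshold are assumptions.

```python
from itertools import combinations


def containment(a, b):
    """Share of the smaller value set that also appears in the other set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def join_candidates(fields, threshold=0.8):
    """fields maps 'dataset.field' to sample values; returns likely join pairs."""
    pairs = []
    for (name_a, vals_a), (name_b, vals_b) in combinations(fields.items(), 2):
        if name_a.split(".")[0] == name_b.split(".")[0]:
            continue  # only look for joins across different data sets
        score = containment(vals_a, vals_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical samples from three systems.
samples = {
    "crm.cust_id": [101, 102, 103, 104],
    "orders.c": [102, 103, 103, 104],
    "orders.amount": [20, 35, 35, 50],
    "web.visitor": ["a", "b"],
}

print(join_candidates(samples))  # [('crm.cust_id', 'orders.c', 1.0)]
```

Chaining such pairs across systems is one way to assemble "everything we know about the customer" into a single data object.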

Alex: 16:40 Of course, even if you get to compliance and go through all your data, compliance is not a one-time event, right? For those of you who remember Blockbuster, my account number at Blockbuster was my social security number. Nobody thought twice about it because, well, it's an obscure number of digits. Now, I wouldn't think of making it so obvious. More and more data becomes regulated, and more and more regulations get passed. At this point there are already 50 countries that have passed data privacy regulations. We have CCPA, which probably means we're going to have 49 other CCPAs passed soon. I mean, more and more regulation is there, so you need more and more compliance rules, and more and more data elements become regulated.

Alex: 17:31 So, this is an example that shows you the advantages of our patented fingerprinting approach versus traditional approaches of regular expressions or reference tables. So, imagine you came up with a tax ID regular expression. You would have to rescan every data set. Even though you already scanned it for other regular expressions, for this brand new one you just wrote, you're going to have to go and rescan it. It's going to take a lot of time, and because there's so much regulation, so many new elements become sensitive, every time you have a new regular expression or reference table, you're going to have to go through all your data.

Alex: 18:09 With Waterline, because we've collected fingerprints for every field, all you have to do is tag one of the fields in Waterline as tax ID, and we just have to compare the fingerprint of that field with all the fingerprints in our fingerprint library. You actually don't go out and touch the data anymore. It's a very, very efficient way of handling new regulations and new rules and new tags.
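
The efficiency argument can be made concrete with a short Python sketch: when a new tag such as tax ID appears, one curated field is compared against fingerprints that were already collected, and the underlying data is never read again. The stored vectors, field identifiers, similarity formula, and threshold below are invented for illustration and do not reflect Waterline's actual fingerprint format.

```python
import math

# Hypothetical fingerprints collected during earlier crawls.
# Only these small vectors are consulted; the data itself is not rescanned.
fingerprint_library = {
    "hr.employees.col_3": (0.45, 1.0, 0.0, 1.0),
    "payroll.w2.tin": (0.45, 1.0, 0.0, 0.98),
    "crm.contacts.city": (0.40, 0.0, 1.0, 0.30),
}


def propagate_tag(seed_field, library, min_similarity=0.9):
    """Find fields whose stored fingerprint resembles the newly tagged field."""
    seed = library[seed_field]
    matches = {}
    for field_id, fp in library.items():
        if field_id == seed_field:
            continue
        sim = 1.0 / (1.0 + math.dist(seed, fp))
        if sim >= min_similarity:
            matches[field_id] = round(sim, 3)
    return matches


# An analyst tags one field as tax ID; similar fields are suggested from the library.
print(propagate_tag("hr.employees.col_3", fingerprint_library))
```

By contrast, a new regular expression has nothing precomputed to compare against, so every data set has to be read again.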

Alex: 18:37 The other challenge you have is that both GDPR and CCPA and most of the other data privacy rules talk about how you use the data, right? You just can't use personal data for purposes other than those for which it was provided. CCPA requires implicit consent. So you tell the user how you're going to use it, and they have to explicitly opt out. GDPR has explicit consent, which means you have to actually explicitly consent to it being used that way. But in both cases, for each data set you actually have to keep track of why you collected the data. If Pizza Hut has your address to deliver pizza to you, they can't use it for marketing unless you explicitly consent to it.

Alex: 19:29 So catalogs become the place where you can track all the compliance metadata. There's really no other place in the enterprise today to keep it. Because catalogs know about all your data assets, this is a perfect place to keep your compliance metadata. So, this shows you an example of how in Waterline you can create custom properties. So, you can create business purpose as a custom property and keep track of why this data was collected in the first place.

Alex: 19:59 In fact, you can go and write a rule that says, "Tag all data sets that are GDPR regulated but don't have business purpose filled in," and have somebody go and remediate and fill in the business purpose before this data can be allowed to be used, for example. So, you can build self-managing, self-healing, self-complying systems that don't have to be so manually managed. These purposes can also be used as tags when you search for data. So, if I'm an analyst trying to solve some marketing problems, I might say, "Only show me data sets that were collected for marketing purposes." Because then I know I don't have to whitelist them against consent management anymore; I can just use them. So, it's a very convenient way to surface compliance metadata and make it usable right away in the everyday work of the analysts.
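
Both uses of the business purpose property, flagging regulated data sets that lack it and searching by it, can be sketched in Python as follows. The catalog entries, the `business_purpose` property name, and the helper functions are assumptions for illustration rather than Waterline's data model.

```python
# Hypothetical catalog entries with tags and custom properties.
catalog = [
    {"name": "eu_orders", "tags": {"GDPR detected"},
     "properties": {"business_purpose": "order fulfillment"}},
    {"name": "eu_leads", "tags": {"GDPR detected"}, "properties": {}},
    {"name": "campaign_clicks", "tags": {"GDPR detected"},
     "properties": {"business_purpose": "marketing"}},
    {"name": "machine_logs", "tags": set(), "properties": {}},
]


def needs_remediation(dataset):
    """GDPR-regulated data sets must state why the data was collected."""
    return ("GDPR detected" in dataset["tags"]
            and not dataset["properties"].get("business_purpose"))


def search_by_purpose(entries, purpose):
    """Return names of data sets collected for the given purpose."""
    return [d["name"] for d in entries
            if d["properties"].get("business_purpose") == purpose]


blocked = [d["name"] for d in catalog if needs_remediation(d)]
print(blocked)                                  # ['eu_leads']
print(search_by_purpose(catalog, "marketing"))  # ['campaign_clicks']
```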

Alex: 21:01 There are also data mobility laws. So, for example, GDPR doesn't allow you to move EU data, data about EU customers, out of the EU without explicit approvals. Germany has even stricter laws. China has similar laws, and other countries are getting into this as well, restricting mobility of their citizens' data because they feel like once it's gone out of their border, they can't control it as much.

Alex: 21:31 So, one of the things Waterline provides to help is lineage. With lineage we do two things. One is we can import it. So, if you have an ETL tool or other ways of capturing lineage, you can import it into Waterline through REST APIs. But a lot of times people don't have lineage. Some people use Perl scripts or PL/SQL or some other way of doing things that has no inherent lineage built in. Then we can infer lineage. Again, using fingerprinting and content, we try to guess where data came from, and then the data stewards or data engineers can curate it, approve it, or improve it. But this really helps you if you find a data set with GDPR-detected data that's not supposed to be there. You can track where it came from and try to understand how it got there so you can fix it, because just removing that data set might not fix anything, since the next day it will get loaded again. So, you can find the scripts or the tools that are doing it and change them to maybe filter EU data, or figure out how to comply some other way.
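
Tracking an offending data set back to the scripts that load it amounts to walking a lineage graph upstream. The Python sketch below shows that walk under assumed data: the data set paths, job names, and the dictionary layout of the lineage graph are all hypothetical, whether the edges were imported or inferred.

```python
from collections import deque

# Hypothetical lineage: target data set -> list of (source data set, job).
lineage = {
    "us_lake/customers": [("staging/customers_all", "load_customers.pl")],
    "staging/customers_all": [
        ("eu_crm/customers", "nightly_export.sql"),
        ("us_crm/customers", "nightly_export.sql"),
    ],
}


def trace_upstream(dataset, graph):
    """Breadth-first walk returning every (source, job) feeding the data set."""
    seen, order = set(), []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for source, job in graph.get(current, []):
            if (source, job) not in seen:
                seen.add((source, job))
                order.append((source, job))
                queue.append(source)
    return order


# Where did the EU data in the US lake come from, and which jobs moved it?
for source, job in trace_upstream("us_lake/customers", lineage):
    print(f"{source} via {job}")
```

The jobs that appear in the trace are the ones to change, for example by adding a filter on EU records, so the data does not simply reappear after the next load.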

Alex: 22:46 So, in summary, there are six design principles for doing compliance. The first one, the most important one, is automation. Without automation, in any sizable enterprise, even a small enterprise, you just won't be able to find all the data that you have that's regulated. I mean, take a small example: even if you have just 100 tables, it's a small database, and each table has 100 columns, that's 10,000 things somebody has to go through. If you use attestation, having people fill out forms to say what they have that's sensitive, you're basically relying on them knowing these 10,000 things in their head and being able to accurately describe them. It's really not practical without automation, and Waterline has a very unique patented fingerprinting and AI technology that it brings to bear to automatically discover and tag the data.

Alex: 23:46 You also need consistency. Regulations apply to all your data regardless of data source, format, and field names. With our rule engine, the tags, the automated tags, create this consistent layer, and the rule engine can apply those rules consistently, again, across the whole data estate regardless of where the data is.

Alex: 24:11 We've talked about the fact that regulations keep changing. So, agility is very, very important. You have to be able to introduce new rules or discover and catalog new data elements without having to rescan your whole enterprise, because that's just a non-starter. So, again, with fingerprinting you're able to do very efficient incremental processing. And just to add, Waterline in general crawls through all your data and does incremental processing. It only looks for new data sets and new partitions in existing data sets and processes those automatically. So, everything is always up to date and you don't do any unnecessary reprocessing of any old data.
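
Incremental crawling of this kind is commonly implemented by keeping a manifest of what has already been processed and comparing it with what is currently on disk. The Python sketch below shows that pattern. It is an assumption about the general technique, not a description of Waterline's crawler, and the paths and the (size, modification time) signature are made up.

```python
def files_to_process(current, manifest):
    """Return paths that are new or whose signature changed since the last crawl."""
    return sorted(path for path, sig in current.items()
                  if manifest.get(path) != sig)


def crawl(current, manifest, process):
    """Process only new or changed files, then record them in the manifest."""
    todo = files_to_process(current, manifest)
    for path in todo:
        process(path)
        manifest[path] = current[path]
    return todo


# Hypothetical state: path -> (size in bytes, modification time).
manifest = {"/data/sales/part=1/a.csv": (100, 1)}
current = {
    "/data/sales/part=1/a.csv": (100, 1),  # already fingerprinted
    "/data/sales/part=2/b.csv": (50, 2),   # new partition in an existing data set
    "/data/hr/people.json": (10, 3),       # new data set
}

processed = []
print(crawl(current, manifest, processed.append))
# ['/data/hr/people.json', '/data/sales/part=2/b.csv']
print(crawl(current, manifest, processed.append))
# []
```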

Alex: 24:54 The fourth one is persistence. You need a place to keep track of all this, and this place has to be somewhere that everybody can access. The Waterline catalog gives you a perfect place to keep all your compliance metadata: the state of it, which regulations are applicable, what the business purpose is, and so forth.

Alex: 25:16 Expressiveness is very important. Your rules need to combine data and metadata. They need to be able to express things like: if this field contains this type of data, and there are these types of data elements in the data set, and maybe properties are set a certain way, then there are different actions you might need to take. You might notify different people, and so forth.

Alex: 25:42 And finally, you need openness. Compliance really requires a lot of tools. Data masking tools, data access tools, there are all kinds of different tools that might have to be compliant, including tools that people use to find and use data for analytics and other purposes.

Alex: 26:04 So, in Waterline we've wrapped everything we do in very robust REST APIs. In fact, our own catalog uses those REST APIs for the metadata discovery platform, and some of our customers develop their own catalog applications for their own purposes that are very specialized, but they use the same APIs that our catalog uses.

Alex: 26:26 So, with that I'd like to turn it over to Ted. Thank you.

Ted: 26:38 I'm back, and I think this is a really exciting opportunity. There are some big risks here, but there are some big opportunities. It's very exciting what Waterline can do with the data that we store on our MapR platform. I think we've actually got some really good questions that have been coming in over the chat. We should be moving to that. Everybody should keep in mind that they can still be asking those questions. David, would you like to jump in and moderate?

Q & A

David: 27:13 Yeah, so just a reminder, you can submit a question at any time in the chat box in the lower left hand corner. So, yeah, let's get going here. So, the first one I see, Ted and Alex, is how are the fingerprint features created?

How are the fingerprint features created?

Alex: 27:32 Let me take this one. Fingerprints are created by basically reading each data set, recognizing its format, profiling it, and then creating a fingerprint out of a combination of what we call content, which is based on profiling the data itself, a lot of features of the data and so forth, and context, which is, you know, what other fields are there? Is this really customer based or product based? For example, a CVV code is a three-digit number, right? By itself it could be anything, but if you see it with a credit card, it's very likely to be a CVV, a security code. So, there are a lot of features, hundreds of them, that we bring to bear to classify accurately.
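
The CVV example shows content and context working together, which can be sketched in Python like this. The content check is a plain three-digit test, the context check looks at the tags on sibling fields, and the confidence numbers are arbitrary values picked for illustration. None of this reflects Waterline's actual features or scoring.

```python
import re


def looks_like_three_digits(values):
    """Content feature: every value is exactly three digits."""
    return bool(values) and all(re.fullmatch(r"\d{3}", str(v)) for v in values)


def suggest_cvv(field_values, sibling_tags):
    """Suggest a 'cvv' tag, with confidence raised by context (a credit card sibling)."""
    if not looks_like_three_digits(field_values):
        return None
    # Illustrative confidences: content alone is weak evidence, context makes it strong.
    confidence = 0.9 if "credit_card" in sibling_tags else 0.2
    return ("cvv", confidence)


print(suggest_cvv(["123", "907"], {"credit_card", "last_name"}))  # ('cvv', 0.9)
print(suggest_cvv(["123", "907"], {"product_name"}))              # ('cvv', 0.2)
print(suggest_cvv(["12a", "907"], {"credit_card"}))               # None
```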

Ted: 28:28 Alex, I guess I'm supposed to be answering, but I actually have a question for you. It really sounds like these fingerprints then are not a final interpretation of the data, but they have somehow retained the information necessary so that when situations change, you can build new interpretations based on the raw fingerprint without going back through all of the data. Is that correct?

Alex: 28:57 That's absolutely correct. That's the power of this agile approach: fingerprints are collected as we process the data, and we don't have to go back to the data to classify it. So, when there is a new classification, for example, a company can tag a new field and say, "Yeah, I know what this is, this is a department code." We don't have to go back to the data; we actually look at the fingerprint library and do all the prediction.

Ted: 29:36 That's really cool.

Can fingerprint features be created with text data or any data not in tabular CSV form?

David: 29:40 I think this is a follow up to that question, it's from the same person. Can fingerprint features be created with text data or any data not in tabular CSV form?

Alex: 29:55 Text, we don't do it for text data. We actually partner with specific prediction engines but, there could be a [inaudible 00:30:05] because [inaudible 00:30:07] APIs, you can actually use your NLP suites to do things like entity extraction and turn that into tags and tag the data sets back with who this is about, locations, times and so forth. So that when people search they can find unstructured data with structured. But, we don't provide this out of the box and then second question was CSV. So, the CM is such a format [inaudible 00:30:38] and JSON, so it's only if there's some structure then we can create a fingerprint.

Ted: 30:52 And MapR is happy to help with the integration efforts, putting together named entity detectors along with Waterline. That sort of thing is inherently going to be fairly customized.

David: 31:11 Just reading through this one. So, here's another: can you speak to your unstructured classification capabilities, as well as how well your solution scales based on quantity of unstructured data? Terabyte, petabyte, et cetera.

Ted: 31:34 Why don't I start that and then hand it to Alex. So, unstructured data classification is something that a lot of people do on the MapR platform. The quality really hinges on the particular application. We have a lot of people doing it on text, some on images, some on transaction streams, which ultimately sometimes need unstructured techniques. The scaling of it is good, because the amount of work for extraction is proportional to the amount of data you have, and the amount of processing that you have in the MapR system is also proportional to the amount of data you have, so the total time typically does not scale up as you increase data size. Waterline, and it's almost time to hand it to Alex on this, is particularly effective at maintaining the good scaling because of the nature of fingerprints. Once data has been processed once, the fingerprints largely obviate the need to process it again. You only process new data.

Alex: 32:54 Just as the [crosstalk] so, Waterline was designed to be scalable. It's designed on a big data platform, it runs on Spark, it uses Solr as the search engine. So, all the components are scalable, can scale out and give you the [inaudible 00:33:15] that you need. Then because of [inaudible 00:33:19] usually the classes data that needs processing. So, it's the very nature of [inaudible 00:33:32].

Do you plan to do GDPR tagging automatically using deep learning?

David: 33:36 So, Alex here's another one for you. Do you plan to do GDPR tagging automatically using deep learning?

Alex: 33:49 So, we're not looking at deep learning at this point. So, as I mentioned, the challenge with GDPR is that it covers everything you know about your customers, which is very different for different companies. Some companies would have different things from the payment companies, which have different things from manufacturing companies and so on. Consumer, retail companies, it really, really depends on what your company is doing, and most companies have a fairly good idea across the company, what they [inaudible 00:34:23] one person who can do it. So, by [inaudible 00:34:26] tag based from the same tag creates this, we're basically helping companies very quickly and effectively create a [inaudible 00:34:39] govern them.

Ted: 34:47 I'd like to add to that a little bit about deep learning. With any machine learning technique, you need to balance the difficulties that you have in feature extraction, how much domain knowledge is critical here, and then how powerful the actual learning step is, whether it needs to compensate for weakness in the domain knowledge or weakness in features. For this kind of tagging, we have very, very strong features. Waterline has very, very strong domain knowledge, and so the need for deep learning in itself is much decreased relative to something where somebody's trying to find hate speech in a video. There the features are weak, the domain knowledge is difficult because it's shifting, and so you need very, very powerful machine learning techniques. There is nothing magical about deep learning. It's merely one of many methods we have available for attacking learning problems. For any given problem it may be appropriate or not, and for the most part in this sort of domain it's not particularly necessary.

Alex: 36:10 You know, here you don't have a lot of training volume either, because each data element might occur a hundred times, a few hundred times, thousands of times, but you don't know what it is until it's been classified. So, training becomes challenging.

What algorithms are used for creating feature-based fingerprints to identify PII elements?

David: 36:46 So, yeah, I have another question here and I think it's for you, Alex. What algorithms are used for creating feature-based fingerprints to identify PII elements?

Alex: 37:04 So, [inaudible 00:37:10] feature of [inaudible 00:37:14] metadata, context based on what kind of data it is and so forth. We do have some out-of-the-box pretrained things like names and IP addresses and emails that are intuitive but, like I said, a lot of the data has to be specific to your company. Your account number will be different, the information that you know about your customer will be very different from even your competitors.

Ted: 37:58 And in general, in any mature, serious product that uses machine learning, you aren't going to be able to say, these are the three algorithms that we use. There are going to be dozens of algorithms used at different points in the product, and so it doesn't actually, well, doesn't make sense in a short webinar to try to enumerate all the different ways advanced algorithms are used. It's a very, very difficult question to answer concisely.

Can you please provide an example of how a regulation change implemented with fingerprints eliminates a rescan of the data?

David: 38:31 Thanks, guys. So, I think you answered this a little bit earlier, so this might be expanding a little bit on a previous question. Can you please provide an example of how a regulation change implemented with fingerprints eliminates a rescan of the data?

Alex: 38:56 So, I mentioned that, well, before CCPA a driver's license was not necessarily regulated by [inaudible 00:39:11] just to protect them, but there was maybe no specific regulation. But once it becomes regulated, you need to tag one field [inaudible 00:39:22] what it looks. Then it just goes to all the fingerprints [inaudible 00:39:31] well, with that field if driver [inaudible 00:39:34] all the fields are very likely with different levels of customer drivers licenses. And you don't have to rescan the data because of the fingerprint and it knows about each [inaudible 00:39:45] not going to be able to do that.
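The general idea of propagating a new tag through stored fingerprints, rather than rescanning the data, can be sketched as follows. This is a toy model under stated assumptions and not Waterline's algorithm: the fingerprint here is simply the set of character-class "shapes" seen in a field during a one-time scan, similarity is Jaccard overlap, and the field names, sample values, and 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass, field


def shape(value: str) -> str:
    """Collapse a value to its character-class shape, e.g. 'D1234567' -> 'A9999999'."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


@dataclass
class FieldFingerprint:
    name: str
    shapes: frozenset                         # computed once, at scan time
    tags: dict = field(default_factory=dict)  # tag -> confidence


def fingerprint(name, sample_values):
    """Build a fingerprint from raw values. This is the only step that reads data."""
    return FieldFingerprint(name, frozenset(shape(v) for v in sample_values))


def similarity(a, b):
    """Jaccard overlap of the two fields' shape sets."""
    union = a.shapes | b.shapes
    return len(a.shapes & b.shapes) / len(union) if union else 0.0


def propagate_tag(tag, seed, catalog, threshold=0.5):
    """Tag the seed field, then tag similar fields using stored fingerprints only."""
    seed.tags[tag] = 1.0
    matches = {}
    for fp in catalog:
        if fp is seed:
            continue
        score = similarity(seed, fp)
        if score >= threshold:
            fp.tags[tag] = score  # similarity doubles as a rough confidence level
            matches[fp.name] = score
    return matches


# One-time scan (hypothetical fields and values).
catalog = [
    fingerprint("customers.dl_number", ["D1234567", "K7654321"]),
    fingerprint("claims.license", ["B2345678", "Z1112223", "N/A"]),
    fingerprint("orders.total", ["19.99", "5.00"]),
]

# Later, a regulation starts covering driver's licenses: tag one field by hand
# and let the stored fingerprints find the rest. No raw data is read here.
matches = propagate_tag("driver_license", catalog[0], catalog)
```

In this toy run, `claims.license` is tagged with a confidence of 0.5 and `orders.total` is left alone. The point of the design is that `propagate_tag` touches only the catalog of fingerprints, so a regulation change costs one manual tag plus a comparison pass over metadata.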

Can your solution be integrated with preexisting features stored?

David: 39:56 Okay. Thanks Alex. I think this one's for you too Alex. Can your solution be integrated with preexisting features stored?

Alex: 40:09 I'm going to need probably more information on that.

David: 40:16 Okay. So, if you could, go ahead, Alex.

Alex: 40:22 My email's on the screen so, shoot me an email, let me know what preexisting features and what kind of features. [inaudible 00:40:37]

My company is global and has already addressed GDPR compliance. Does that mean we are done, or are there other things we should be thinking about with compliance and new requirements like CCPA?

David: 40:37 Okay. Alright, so: my company is global and has already addressed GDPR compliance. Does that mean we are done, or are there other things we should be thinking about with compliance and new requirements like CCPA? I think that's for you, Alex.

Alex: 41:13 In general, in our experience working with our customers, even the ones who feel they are GDPR compliant: just recently we were with a customer and we ran Waterline and it found [inaudible] 5% more sensitive data than they expected, beyond what they had found manually. The risk is the same. This data will be used and will be leaked. If your data set gets hacked, you actually have to report it, and then you'll be in big trouble because you don't got access. So I think even for companies that went through a manual or semi-manual process of compliance, I would highly recommend you use something like Waterline to double-check and make sure that you don't miss anything, and to make sure that any [inaudible 00:42:08] have to be go through this process to get [inaudible 00:42:12], because most companies use either attestation or regular expressions or some other custom things, which are very, very expensive. And attestations are quite error prone because there's a lot of data. As I mentioned, no matter what you have, there is no way you're going to be able to accurately describe each one of them, even if you have a small database with 100 tables and each table has 100 fields.

How does this solution compare to open source solutions that might be available?

David: 42:53 Sounds like I've got a question for you. I apologize, everybody, I think we're having a little bit of an issue with Alex's phone. Ted, let's jump over to you. How does this solution compare to open source solutions that might be available?

Ted: 43:10 I guess I'm the open source guy. Just by the nature of how open source is developed, it typically is developed to scratch a particular itch at a particular time. As such, in open source you rarely have the luxury of taking a comprehensive view of a problem, and you wind up with a little bit of a Swiss cheese sort of approach. That can be very, very good if the sort of problem is one that succumbs to a few particular attacks. Storage, data communication, running an entire class of programs, these are the sorts of things that work well. But when it comes down to a really elaborated, production-quality classification of thousands of kinds of data, it isn't nearly as viable a developed asset. And so there are no solutions that I know of for finding sensitive data that are anything like comprehensive. Typically what you'll have is libraries that help you find patterns, but there will be no patterns provided, just a general pattern search capability. So that's a very far cry from actually going the last mile and doing the hard work of finding these actual patterns and describing them and coming up with what's needed.

David: 44:46 Okay. Thanks, Ted. Ted, which one would you like to go to next?

Ted: 45:01 Sorry, I muted again. I think we're actually pretty much done. The only additional ones that I see are a little bit redundant to the questions we've already answered. So I think, unless somebody comes up with a new question right now, we ought to declare it done.