Things Every Big Data Executive Should Not Know

Sometimes, what you don't know is a valuable thing.

I recently presented a talk in the executive briefing track at the Strata Data conference in London that dealt with this idea. The talk was playfully titled "5 Things Every Executive Should Not Know", and it explored from a new perspective some of the key issues that executives who work with big data must face if their organizations are to be successful. In this post, I'd like to expand on three of those five things that, for best results, you should not know.

Let's start with a truly fundamental idea about something that it is important not to know:

You should NOT know exactly how your team(s) will implement each major goal.

It's useful to set and communicate goals and to provide guidelines about deadlines and the resources available. But if you also are dictating exactly how those goals will be met, or even if you are just involved in all the minute details, you may be in danger of micromanaging. What's wrong with that?

  • For one thing, it means you may be distracting yourself from other things that are very much your responsibility and are important for you to be doing.
  • Micromanaging also can be demoralizing to your team if they feel that they do not "own" their tasks or that their experience and ideas are not valued.
  • If you spell everything out yourself, you may lose the chance to take advantage of the specialized or up-to-the-moment expertise of your team members.

This last point may be particularly relevant if you work in a field that is rapidly changing or work with emerging technologies and approaches -- such as big data tools, streaming and machine learning. Many executives have years of valuable experience to draw on, but they may be less familiar with very new technologies than some of their younger workers are. Executives who are good at their role know how to manage well in traditional settings, but the world is changing quickly. How should executives manage in the world of big data?

Big data technologies, IoT sensor data and new AI and machine learning methods are changing the way business is done across a wide range of industries including manufacturing, telecommunications, retail, financial industries, medicine, heavy industries, transportation and agriculture.

It makes sense to harvest the collective knowledge of your team, and one way to do that is to set the big goals and then, to some extent, get out of their way.

That doesn't mean taking an entirely hands-off approach to management. In a recent conversation, Adrian Cockcroft, currently VP of Cloud Architecture Strategy at Amazon Web Services and previously a cloud architect at Netflix, explained his approach. He feels it is helpful to have 1:1 meetings with senior directors and team leads. Instead of using these meetings just as progress reports, he especially wants to talk about what isn't working. That makes it easier for them to be frank about problems and to use the interaction to collaborate on solutions.

In addition, according to Adrian, there's valuable input beyond goal setting that you can offer to your team members. Even though it's a problem to tell people exactly what to do as they figure out how they will implement a project, it is a good idea to tell them some things not to do. That way, they benefit from your experience and judgment and do not have to rediscover mistakes for themselves. At the same time, they keep the freedom to think for themselves, to inject new ideas and new expertise and to "buy in" to project goals because they have had a hand in designing the implementation. For example, you might leave a design plan open for developers but request that they avoid chatty protocols, meaning designs that need many small network exchanges to do what one larger request could do.
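To make the "avoid chatty protocols" guideline concrete, here is a minimal sketch in Python. The FakeServer class and its get and get_many methods are hypothetical names invented for this illustration, not part of any real library; the class simply counts how many round trips a client makes. The point is that both styles return the same data, but one of them crosses the network once per item.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for a remote service; it counts round trips."""

    def __init__(self, records):
        self.records = dict(records)
        self.round_trips = 0

    def get(self, key):
        # One request per key: every call costs a full round trip.
        self.round_trips += 1
        return self.records[key]

    def get_many(self, keys):
        # One request for the whole batch: a single round trip.
        self.round_trips += 1
        return [self.records[k] for k in keys]


def fetch_chatty(server, keys):
    # Chatty style: ask for each item separately.
    return [server.get(k) for k in keys]


def fetch_batched(server, keys):
    # Batched style: ask for everything at once.
    return server.get_many(keys)


keys = [f"user-{i}" for i in range(100)]
data = {k: i for i, k in enumerate(keys)}

chatty = FakeServer(data)
batched = FakeServer(data)

# Both approaches return identical results...
assert fetch_chatty(chatty, keys) == fetch_batched(batched, keys)

# ...but the chatty client made 100 round trips and the batched client made 1.
print(chatty.round_trips, batched.round_trips)
```

A "don't" like this leaves the team free to choose the protocol, the serialization format and the batching strategy, while still steering them away from a known performance pitfall.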

Ted Dunning, CTO at MapR, agrees with Adrian, saying "In many ways, the ‘don'ts' outlast the ‘do's' in being useful to guiding successful implementations. You can give people a wide berth for innovation while guiding them away from proven pitfalls."

Similarly, Wayne Cappas, VP Professional Services at MapR, thinks it's an important aspect of a leadership role to weigh in when different factions in a team are in disagreement. He finds it effective to have members of his PS team make their own choices about how to design and implement the projects they build for customers, but it's not a good idea to let a difference of opinion go on too long. In those situations, he finds it best to hear the different points of view and then take responsibility to make an executive decision about the way forward. As soon as he does that, individuals put their energy into making things work instead of arguing about why one approach or one tool might not work. It's another example of striking a balance between micromanagement and giving teams the advantage of a leader's experience.

You should NOT know which machine your data or a particular service is on.

An old-fashioned way to handle data and services is to assign them to particular servers on a rack. That style, illustrated in this figure, has limitations based on non-optimized resources and vulnerability to single-machine failure.

Notice the problem: In a traditional system such as the one pictured here, specific data and services are assigned to specific machines, as the labels show. Systems such as this do not have the advantages that a modern distributed system (dataware) can provide.

Modern dataware that handles storage and management for a distributed, large-scale system can help you avoid the problems that arise when data or computation is tied to one machine. It's helpful to use a data platform that offers these capabilities both on-premises and in cloud deployments:

  • Gives you the reliability of a distributed system for your data at scale
  • Orchestrates data and handles issues at the platform level automatically, providing convenience and efficiency
  • Works with containerization frameworks that orchestrate computation across a distributed system
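The idea of not knowing which machine holds your data can be sketched in a few lines of Python. This is a toy model built on stated assumptions, not a description of how the MapR Data Platform or any other product works: the Cluster class, its method names and the placement rule are all hypothetical. Callers read and write by logical name only; the cluster decides where the replicas live and quietly routes around a failed machine.

```python
class Cluster:
    """Hypothetical toy cluster: replicates each value and hides placement from callers."""

    def __init__(self, nodes, replicas=3):
        self.nodes = {name: {} for name in nodes}  # per-node key/value storage
        self.alive = set(nodes)
        self.replicas = replicas

    def _placement(self, key):
        # Deterministic choice of replica nodes. Callers never need to see this.
        names = sorted(self.nodes)
        start = sum(key.encode()) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, value):
        # Simplification: writes go to every replica and ignore node health.
        for name in self._placement(key):
            self.nodes[name][key] = value

    def get(self, key):
        # Read from the first replica that is still alive.
        for name in self._placement(key):
            if name in self.alive:
                return self.nodes[name][key]
        raise RuntimeError("all replicas unavailable")

    def fail(self, name):
        # Simulate losing a machine.
        self.alive.discard(name)


cluster = Cluster(["node-a", "node-b", "node-c", "node-d"])
cluster.put("orders/2019-05", "order data")

# Knock out the machine that would normally serve this key first.
first_replica = cluster._placement("orders/2019-05")[0]
cluster.fail(first_replica)

# The caller still reads by logical name and never learns which machine answered.
print(cluster.get("orders/2019-05"))
```

Contrast this with the labeled-server approach: if application code had hard-coded the name of the failed machine, the read would have failed with it.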

You should NOT know all of the machine learning and AI projects your teams will do in the next 18 months.

The point of this advice is that machine learning and AI projects have enormous potential value, but they are generally speculative, so they also carry some risk. This does not mean that you should proceed without planning. It is not only useful but essential to plan carefully for your current machine learning project in terms of realistic deadlines, access to data and resources, and a clear course of action that will tie machine-learned insights to specific business goals. We've previously discussed how best to do this in two blog posts, "AI All Over the Place: Where Does AI Pay Off?" and "With Machine Learning and AI, the Win Isn't Always Where You Think". But the idea here is, what about the next project?

If you have dataware that handles much of the burdensome task of machine learning logistics - the data and model management that is necessary for these projects to work - at the data platform level, you have an enormous advantage. Not only will development and deployment of your known project work more smoothly, but you're ahead of the game for the next project. You can take advantage of sunk costs in terms of design and infrastructure, which, in turn, lowers the entry cost of additional machine learning projects. With lower entry and overall cost, the chances go up that these projects will pay off. Looking at it another way, with proper design and infrastructure, you can afford to try new things, and that makes you susceptible to success with one or more of them.
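The effect of sunk infrastructure costs on later projects is simple arithmetic, sketched here in Python. The numbers are purely hypothetical, chosen for illustration and not drawn from any real deployment, and the function name is invented for this example. The sketch compares total spend when the data and model management layer is built once and shared against total spend when every project rebuilds its own.

```python
def cumulative_cost(num_projects, platform_cost, per_project_cost, shared):
    """Total spend after num_projects projects.

    shared=True:  the platform is paid for once and reused by every project.
    shared=False: every project builds its own infrastructure from scratch.
    """
    if shared:
        return platform_cost + num_projects * per_project_cost
    return num_projects * (platform_cost + per_project_cost)


# Hypothetical cost units, for illustration only.
PLATFORM = 100
PER_PROJECT = 20

for n in (1, 2, 5):
    with_sharing = cumulative_cost(n, PLATFORM, PER_PROJECT, shared=True)
    without_sharing = cumulative_cost(n, PLATFORM, PER_PROJECT, shared=False)
    print(n, with_sharing, without_sharing)
```

Under these assumed numbers the first project costs the same either way, but each additional project costs 20 units on the shared platform instead of 120, which is what makes it affordable to try a speculative idea.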

This ready-for-the-next-thing approach is particularly useful for machine learning and AI, where the experience of a first project encourages people to recognize great new opportunities when they show up. You're ready to move in a timely manner, to take advantage of an opportunity, without having to build a whole new system for each new endeavor. Not all data platforms have the capabilities to serve as an efficient dataware layer for machine learning systems, so your choice of platform does matter.

Remaining able to act when an opportunity arises means that you won't know all of the machine learning or AI projects your teams will carry out over the next year or two. Knowing exactly what will happen would mean that you had given away the flexibility and efficiency you need, hence this third idea of what you should not know.

As you plan for large scale systems that handle storage and management of data and computation efficiently at the dataware level, please be aware that the MapR Data Platform is designed and engineered to do just that.

Try It for Free

Give it a try, for free: You can try the MapR Data Platform via web or VM sandbox here.

This blog post was published May 29, 2019.
