Contributor: Ellen Friedman

MapR Converge Blog author, Ellen Friedman

Ellen Friedman is Principal Technologist at MapR. She is a scientist with a PhD in biochemistry from Rice University, a committer for the Apache Drill and Apache Mahout open source projects, and a speaker/author on a variety of big data and other technical topics. She is co-author of books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, Streaming Architecture, Introduction to Apache Flink and the Practical Machine Learning series. Ellen has been an invited speaker at Big Data London, Strata Data San Jose and London, Nike Tech Talks, Berlin Buzzwords, the University of Sheffield Methods Institute (UK) and NoSQL Matters in Barcelona.

Blog Posts by Ellen Friedman

June 11, 2019 | By Ellen Friedman

Who Are You? Cloud Advantages Vary Depending on Your Needs

Editor's Note: This is the second blog post in a series on Cloud, Kubernetes, and Dataware. The first post in the series is "Rent or Buy? Should You Go to Cloud (or Not)?" How useful is the cloud? The balance between advantages or disadvantages...

June 06, 2019 | By Ellen Friedman

Data Version Control for AI and Machine Learning

Editor's note: This is the fifth in a series of blog posts on how to build effective AI and machine learning systems. The previous blog post is titled "Data Science vs Computer Science." AI and machine learning have huge potential value...

June 04, 2019 | By Ellen Friedman

Rent or Buy? Should You Go to Cloud (or Not)?

Editor's note: This article is the first in a series on Cloud, Kubernetes and Dataware. What does "cloud" actually mean? Not only is cloud a huge buzzword, it's also a very real trend that is driving adoption. What is cloud, really...

May 29, 2019 | By Ellen Friedman

Things Every Big Data Executive Should Not Know

Sometimes, what you don't know is a valuable thing. I presented a talk recently in the executive briefing track at the Strata Data conference in London that dealt with this idea. The talk was playfully titled "5 Things Every Executive Should...

April 30, 2019 | By Ellen Friedman

With Machine Learning and AI, The Win Isn't Always Where You Think

Editor's note: This is the third in a series of blog posts by this author on how to build effective AI and machine learning systems. The previous blog post is titled "Practical Tips for Data Access and Machine Learning Tools." Even if you...

April 09, 2019 | By Ellen Friedman

Practical Tips for Data Access and Machine Learning Tools

Editor's note: This is the second in a series of blogs by this author on how to build effective AI and machine learning systems. The previous blog is titled "AI, All Over the Place: Where Does Artificial Intelligence Pay Off?" How do you...

February 06, 2019 | By Ellen Friedman

CSI, Kubernetes, and Dataware: Data Storage for Containerized Applications Just Got Easier

Put your applications in a container, and life gets better. That's a widespread idea that has some serious grounding in reality. Running applications in containers is now commonplace, driven by the benefits of portability (to any cloud or on-premises...

October 23, 2018 | By Ellen Friedman

AI, All Over the Place: Where Does Artificial Intelligence Pay Off?

Don't be fooled by the hype over AI. Just because there is a lot of hype doesn't mean there isn't huge potential value in AI. It would be a shame to miss out on the very real benefits of using artificial intelligence in your business, simply...

September 18, 2018 | By Ellen Friedman

Convergence Seminars - Stories from the Wild on Multi-Cloud, Kubernetes, Edge Computing, AI, and Machine Learning

So much is written about big data, it's hard to keep up. What if, instead of reading about it, you take a day to talk directly with executives, practitioners, technical experts, and thought leaders, who are leveraging data at scale across a wide range...

August 02, 2018 | By Ellen Friedman

Apache Software Foundation – 10 Questions with Board Member Ted Dunning

Editor’s Note: Ted Dunning, Chief Application Architect at MapR Technologies, sat down with Ellen Friedman to discuss his recent re-election to the board of the Apache Software Foundation. In this video, he discusses Apache's charitable core as well...

August 08, 2017 | By Ellen Friedman

TensorFlow, MXNet, Caffe, H2O - Which Machine Learning Tool Is Best?

If you ask ten data scientists, “Which machine learning tool is best?” you’ll likely get many different answers. But slightly more surprising is that if you ask any one of those data scientists the same question, you’ll likely still get many different...

April 12, 2017 | By Ellen Friedman

Multi-Master Replication for Geo-Distributed Data: It’s More Than You Think

Businesses increasingly feel the need for data that can be shared and updated across data centers, whether on premise or from premise to cloud, at massive scale. And they need this to be done with low latency and high consistency, reliably and conveniently...

April 04, 2017 | By Ellen Friedman

A Strata San Jose 2017 Sampler: Machine Learning, Microservices, Streaming Data, IoT, Cloud & Containers

A couple of weeks ago I had the fun of meeting several hundred of the several thousand people who gathered in the San Jose Convention Center for Strata + Hadoop World 2017. Luck was with me from the start – I not only found a parking place each day, I...

February 17, 2017 | By Ellen Friedman

Fighting Advanced Persistent Security Threats with Anomaly Detection: Sometimes More is More

Does this sound disturbing? You try to reach a particular website only to find the site is down. But it’s not that simple. You try another site – also not reachable. And another and another… You look to social media for in-the-moment reports about what...

September 07, 2016 | By Ellen Friedman

MapR Data Platform: What's Important About "Converged?" | Whiteboard Walkthrough

In this week's Whiteboard Walkthrough, Ellen Friedman, Solutions Consultant at MapR, describes what happens when certain fundamental big data capabilities are engineered together as a part of the same technology. This brief overview compares the converged...

August 16, 2016 | By Ellen Friedman

Getting Past Preconceptions: How to Use Innovative Big Data Technologies

What can you learn from ordering a cup of coffee? Try asking for a cappuccino made with much less espresso than normal… generally, you won’t get what you ask for, even if you ask for it in a variety of different ways. I’ve done this experiment: I love...

July 19, 2016 | By Ellen Friedman

Mid-year Updates for Big Data Trends: Apache Kafka, Spark, Flink, Drill, and More

In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now we’ve reached the mid-and-a-bit-more year, so it’s a good time to check them out and see how well these predictions match what has happened so far in...

June 21, 2016 | By Ellen Friedman

How Apache Kafka and MapR Event Store Handle Topic Partitions

Streaming data can be used as a long-term auditable history when you choose a messaging system with persistence, but is this approach practical in terms of the cost of storing years of data at scale? The answer is “yes”, particularly because of the way...

June 13, 2016 | By Ellen Friedman

SQL Query on Mixed Schema Data Using Apache Drill

You may have heard this statement before: _Apache Drill does schema discovery on-the-fly._ What does that mean, and why should it matter to you? The power of SQL for business analytics is a given, but the challenge in big data settings is that SQL...

June 08, 2016 | By Ellen Friedman

Beyond Real-time Data Applications – Whiteboard Walkthrough

In this week's Whiteboard Walkthrough, Ellen Friedman, a consultant at MapR, talks about how to design a system to handle real-time applications, but also how to take advantage of streaming data beyond those in-the-moment insights. Here's the...

May 16, 2016 | By Ellen Friedman

Apache Kafka and MapR Event Store: Terms, Techniques and New Designs

Streaming data is now a big focus for many big data projects, including real-time applications, so there’s a lot of interest in excellent messaging technologies such as Apache Kafka or MapR Event Store, which uses the Kafka 0.9 API. Terminology can be...

May 11, 2016 | By Ellen Friedman

Ideal Messaging Capabilities for Streaming Data

What capabilities should you look for in a messaging system when you design the architecture for a streaming data project? To answer that question, let’s start with a hypothetical IoT data aggregation example to illustrate specific business goals and...

April 25, 2016 | By Ellen Friedman

Evolution of Big Data Storage: How to Support Real-time Analytics at Scale

Organizations embracing big data are ready to put data to work, including looking for ways to effectively analyze data from a variety of sources in real time or near real time. To be able to do so at scale, and at speed, can make an organization free...

February 15, 2016 | By Ellen Friedman

Beyond Real Time: Getting Value from Streaming Data

Editor's Note: If you're interested in learning more about how streaming data can give you a competitive advantage, be sure to read the free O'Reilly ebook, Streaming Architecture: New Designs Using Apache Kafka and MapR Streams by Ellen Friedman...

January 05, 2016 | By Ellen Friedman

What Will You Do in 2016? Apache Spark, Kafka, Drill and More

Let’s have some fun. It’s the start of a new year -- we’re on the threshold of something new -- so let’s look forward to what you’re likely to be doing in 2016. Now I know the riskiness of making predictions – especially ones on record – but I’m happy...

January 04, 2016 | By Ellen Friedman

Better Fraud Detection: Managing Big Data Security

Banks are among the many businesses taking advantage of big data and IoT opportunities, including for mobile payments, online banking, and smart kiosks, but the huge quantities of personally sensitive data from these activities must be protected at all...

October 30, 2015 | By Ellen Friedman

First In-Hadoop Document Database: MapR Database is A Big Win for Big Data

There’s good news in the world of NoSQL databases that will put a smile on the face of developers and that should also make business leaders happy because it means shorter time-to-value. You can now enjoy the ease and flexibility of a document-style database...

October 22, 2015 | By Ellen Friedman

Walmart: Harvesting Value from Big Data with Hadoop & NoSQL

When it comes to Walmart, big data meets big retail in an impressive way. Not only is Walmart an industry leader in global ecommerce and brick-and-mortar retail, they’re also a leader in the use of Hadoop-based technologies to implement their new data...

September 24, 2015 | By Ellen Friedman

Apache Flink: A New Way to Handle Streaming Data

Editor's Note: If you're interested in learning more about Apache Flink, download the new book for free: _Introduction to Apache Flink: Stream Processing for Realtime and More_ by Ellen Friedman & Kostas Tzoumas. If you’re not already looking...

April 03, 2015 | By Ellen Friedman

Machine Learning at American Express: Benefits and Requirements

Curious to know how American Express uses machine learning successfully, in production, at very large scale? An audience of over 300 recently got a peek into this big data story thanks to a presentation by Chao Yuan, SVP at American Express who heads...

March 04, 2015 | By Ellen Friedman

Winning with Hadoop: Decisions That Drive Successful Projects

Big data challenges and opportunities are rapidly spreading across a huge number of organizations, large and small, in a wide range of verticals. Not surprisingly, people are turning to scalable solutions such as Apache Hadoop and NoSQL-based technologies...

February 17, 2015 | By Ellen Friedman

Real World Lessons: Stories of Hadoop and NoSQL Done Right

One of the best ways to figure out how to succeed with your own large-scale projects is to see what others are doing – what has worked for them and what has not. To help you do that, my co-author Ted Dunning and I have looked at a wide array of real...

October 16, 2014 | By Ellen Friedman

It’s About Time: Time Series Databases

Recording the time at which a measurement was made or an event occurred can make data much more useful for revealing valuable insights, so it’s no wonder that there’s an increasing interest in time series data and in methods and technologies for building...

September 18, 2014 | By Ellen Friedman

The Internet of Cat Toys

One cat, a radio collar, and a night on the town – this little adventure turned into an entertaining article in Wired magazine 8 August 2014 by Andy Greenberg about the creative use of a feline investigator to find weak points in security of the neighborhood...

June 02, 2014 | By Ellen Friedman

Finding the Zebra in a Herd of Ponies - A New Look at Anomaly Detection

The second publication in the O’Reilly Practical Machine Learning series, subtitled A New Look at Anomaly Detection by Ted Dunning and me, is being released this week. In the previous book, which focused on practical approaches to recommendation, we...

March 31, 2014 | By Ellen Friedman

Making Mahout Fast and Easy

Big changes are underway for the open source machine learning project Apache Mahout, and there’s a lot of excitement over this new work. Mahout is a library of very scalable machine learning algorithms that is part of the MapR Distribution for Hadoop...

March 03, 2014 | By Ellen Friedman

An Invitation to Practical Machine Learning

This blog was originally posted on the O'Reilly blog and reposted with permission. Does it make sense for me to have a car? If so, which one is the best choice for my needs: a gasoline, hybrid, or electric? And should I buy or lease? In order to...

February 19, 2014 | By Ellen Friedman

Advances in Apache Mahout: Highlights for the 0.9 Release

Scalable machine learning for Apache Hadoop-based systems got a boost recently when the Apache Mahout PMC approved release of the 0.9 version of Mahout. This release is the second in less than a year, and it’s another step toward a stable, mature scalable...

November 11, 2013 | By Ellen Friedman

Which Algorithms Really Matter? – Guidelines for How to Choose the Right Algorithms

Ted Dunning, MapR's Chief Applications Architect, recently presented an invited talk titled "Which Algorithms Really Matter?" at the CIKM conference in San Francisco on October 30th, and it's generated a lot of discussion. In less than...

October 04, 2013 | By Ellen Friedman

Storm is Gearing Up to Join the Apache Foundation

The well-known, open source project Storm is in the process of moving into the Apache Foundation group of open source software projects. This is a big step for Storm and for the community developing this already well-respected software. What is Storm...

September 11, 2013 | By Ellen Friedman

Building a Simple Recommender

One of the most accessible ways to use machine learning on big data scale is to build a recommender with Apache Mahout. It's one thing to build a recommender but quite another to build one that works really well. With the right approach, however,...

August 06, 2013 | By Ellen Friedman

Drill-ing on Amazon

Over 40 developers gathered recently at OSCON for an Apache Drill hands-on workshop in Portland, OR to learn what Drill is, how it can be used and to jump in and try it out. Jacques Nadeau, Drill committer and MapR engineer, and Ted Dunning, Drill project...

April 01, 2013 | By Ellen Friedman

How to Do It: New Approaches to Big Data Machine Learning and Analytics

It’s not just how you store big data but what you can do with it – and that was apparent as Java developers took part in Devoxx conferences in London and Paris last week. Participants had a lot to say about the international presenters, and among those...

February 19, 2013 | By Ellen Friedman

Apache Drill Making Progress: Boulder/Denver Big Data Meet-up 2-13-2013

A packed room greeted Apache Drill project champion Ted Dunning last Wednesday when he spoke on the CU campus to the Boulder/Denver Big Data group. Apache Drill is a new, truly open source project being developed by an international community. Ted described...

February 06, 2013 | By Ellen Friedman

Where does Apache Drill fit?

It’s been almost six months since the Apache Drill project launched in August 2012. The project is making great strides both in terms of community participation and code writing. In fact, we’re getting to the point where we hope people are starting to...

