A couple of weeks ago I was asked how visualization relates to Big Data. I am far from a Big Data expert, but I know the basics and have been studying and selling visualization solutions for the past seven years. This article describes my thoughts on visualizing Big Data, not as a definitive statement, but as an exploration of ideas.
Defining Big Data
Different people have different definitions of what constitutes Big Data. For the purposes of this article, I’m going to leave the definition vague. For me, Big Data includes all the movie recommendations in Netflix’s database, the shopping cart data for every customer of a grocery store, or the minute-by-minute location of everyone with a GPS mobile phone. It also includes smaller data sets, such as the list of a million tasks in an enterprise project management system or the work history of everyone on LinkedIn.
Uses of Big Data
Proper use of data visualization requires first understanding the problem being solved. Big Data can be used for different types of problems. Roughly, the problems I see include:
- Automating Decisions
Data used to trigger a decision, such as low inventory levels triggering automatic re-orders or automated credit approvals. Big Data determines the decision here, either explicitly through rules or implicitly through inference-based systems.
- Creating Operational Data
Data used as input into algorithms, such as calculating average highway speeds when determining the optimal trucking route. Big Data influences the result here, but other factors may drive the final result.
- Supporting Decisions
Data used for a user-driven decision, such as determining how to arrange products in a store. Big Data informs the result here, but the final result requires human judgement.
- Creating Personalized Views
Data used to create personalized views for an individual, such as which movies to watch based on previous rentals. Big Data creates the result here, but the final result is a small data subset.
Each of these involves a different pattern of interaction between people and Big Data. Automated decisions cut people out entirely, whereas people are central to using Big Data to support decisions. And since visualization is entirely about people, these distinctions become useful.
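To make the first category concrete, here is a minimal sketch of an explicit, rule-based automated decision: low inventory triggering a re-order. The names (`StockLevel`, `reorder_point`, `reorders_needed`) and the quantities are hypothetical, invented purely for illustration. Notice that no person appears anywhere in the loop.

```python
from dataclasses import dataclass


@dataclass
class StockLevel:
    sku: str
    on_hand: int
    reorder_point: int     # hypothetical threshold that triggers a re-order
    reorder_quantity: int  # hypothetical amount to order when triggered


def reorders_needed(stock):
    """Return (sku, quantity) pairs for every item below its reorder point."""
    return [
        (item.sku, item.reorder_quantity)
        for item in stock
        if item.on_hand < item.reorder_point
    ]


stock = [
    StockLevel("milk", on_hand=12, reorder_point=20, reorder_quantity=100),
    StockLevel("eggs", on_hand=45, reorder_point=30, reorder_quantity=60),
]
orders = reorders_needed(stock)
print(orders)  # [('milk', 100)]
```

A decision-support use would look quite different: the same stock data would be handed to a person, who would weigh it against things the data doesn’t capture.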
Visualizing Big Data
Depending on the use of Big Data, the goals and techniques used for visualization will differ.
For instance, Big Data doesn’t need to be visualized when used in automated decisions, except to monitor and improve the algorithms. The actual process of analyzing the Big Data and implementing a decision doesn’t involve a human, so there is nobody to visualize it for.
On the other hand, it’s difficult to use Big Data to support decision-making without visualization. While you could, in theory, reduce Big Data to a single number, the value of Big Data is in the details, which is where data visualization shines.
If we look at the use cases described above, the uses of visualization then become:
- Exploring Data
To help people explore data, either to support decisions or to improve the development of the algorithms used in creating operational data or automating decisions. Exploring Big Data often requires new types of visualization platforms that can support navigating and visualizing huge data sets.
- Monitoring Results
To help people debug and monitor the results of using Big Data, such as the improved purchase rate of a new recommendation algorithm. While the data driving the algorithm may be huge, the result data can often be visualized using existing tools.
For instance, you may be looking at 10 billion transactions, but only be concerned about the effectiveness of recommendations for 50,000 products. While 10 billion transactions require new types of visualization, 50,000 products can easily be shown on a heat map. It’s the result data you’re analyzing, not the source data.
- Finding Insights
To help people make better decisions using insights gained from Big Data. In this case, Big Data gets reduced to a manageable size through data mining and aggregation algorithms. While new visualization techniques may help in analyzing Big Data at the transaction level, most visualization of Big Data today is of summarized views…
…which is often good enough. In supporting decisions, it’s important not to attempt to visualize too much data. Analyze data at the level you’re optimizing, or one level deeper. Visualization can be used to see greater levels of detail, but greater detail can introduce noise that interferes with identifying the broad trends required for higher-level decisions.
- Exploring Results
To help people explore their own personalized view of Big Data. Here Big Data becomes Small Data, and existing visualization tools can be used.
For instance, Netflix might analyze a billion movie ratings to recommend a movie to you, but they’ll only present 10-20 movies to you at once. Visualization can help explore more of these results at once, but you likely won’t be viewing the entire Netflix catalogue.
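As a minimal sketch of the Monitoring Results point above (you analyze the result data, not the source data), the following collapses a list of transactions into one purchase rate per product. The function name `purchase_rate_by_product` and the `(product_id, purchased)` record shape are assumptions made for illustration, and the six sample rows stand in for what would really be billions.

```python
from collections import defaultdict


def purchase_rate_by_product(transactions):
    """Collapse transactions into one purchase rate per product.

    Each transaction is a (product_id, purchased) pair: one recommendation
    that was shown, and whether it led to a purchase.
    """
    shown = defaultdict(int)
    bought = defaultdict(int)
    for product_id, purchased in transactions:
        shown[product_id] += 1
        if purchased:
            bought[product_id] += 1
    # One number per product: this is the small "result data" to visualize.
    return {pid: bought[pid] / shown[pid] for pid in shown}


transactions = [
    ("A", True), ("A", False), ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", False), ("B", True),
]
rates = purchase_rate_by_product(transactions)
print(rates)  # {'A': 0.5, 'B': 0.25}
```

However many transactions go in, the output has only one entry per product, which is the part an existing tool such as a heat map can handle.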
Do you agree with these categories? What did I miss? Do you know of specific visualization techniques well suited for Big Data? Give me your thoughts by leaving a comment below or on Twitter at FastFedora.