Analytics

8 Things You Should Know About Correlations

Correlation
From xkcd, a comic by Randall Munroe

In a recent article, Paul Borsch described how correlations get bandied about without the understanding that a correlation is not a cause. And he’s right…except that, as others have pointed out, “correlation is not causation” often gets used to discount correlations entirely. And correlations often do have causes.

So how do we think critically about correlation and causation, without being either enamored of correlations or dismissive of them?

In this article, the first of a three-part series, I’ll tackle eight things to know about correlation. In the next two articles, I’ll address key things to know about causes, and then present tips for determining whether a correlation has an underlying cause and, if so, what it is.

Continue reading >

To Hypothesize Or Not?

Steve Miller recently wrote an article entitled “Science of Business vs. Evidence-Based Management” in which he contrasts two philosophies. The hypothesis-driven philosophy of what he calls the Science of Business aims to support decisions by discovering business best practices using the scientific method. The hypothesis-less philosophy of Evidence-Based Management aims to support decisions by looking for trends and clusters in historical data, without any need to define why a trend or cluster occurred.

He describes the Science of Business as a top-down approach and Evidence-Based Management as bottom-up, driven only by data and not by hypothesis. Steve asks whether this distinction is important for business intelligence. In this post, I argue that the distinction is important and that it applies to all decisions, not just those supported by business intelligence systems. I also attempt to define when each approach has merit.

Continue reading >

2011 State of the Union Visualizations: Charts, Graphs & Infographics

State of the Union Visualizations

The enhanced version of last night’s State of the Union speech demonstrated the value of quality data visualizations. Chock-full of charts, graphs and infographics, the visualizations reinforced the President’s message with a clarity and lack of chart junk rarely seen in presentations.

Whether or not you agree with the President’s assertions, or with the biases presented in the charts (all charts have biases), there’s no denying the beauty of the visualizations. As the CEO of a data visualization company, I was happy to see good visualizations being used in the public discourse.

Below I’ve collected the charts, graphs and select infographics from the speech last night. I excluded the infographics I felt lacked any data visualization components. Click on the image to see a bigger version. A link below each image will take you to the part of the speech displaying the visualization or you can tweet a link to the image.

Continue reading >

Snapshots Are Key To Good Analysis

Do you make your decisions using real-time reports? Does your dashboard only show you your current metrics? Do your metrics fluctuate from report to report?

Your data analysis may be wrong.

By comparing irregular time periods, reacting to normal fluctuations and not knowing the previous state of your data, you can make bad decisions. Snapshots address these issues by recording the results of a data query at a specific point in time. By taking snapshots on a consistent schedule, you can see what’s changed, compare equal time periods for improved analysis and stop reacting to normal fluctuations in your data.

Snapshots help you see what changed and what didn’t. They allow you to manage by exception, detect trends and, when done right, ignore normal fluctuations. In a world focused on real-time reporting, snapshots give you needed longer-term perspective.
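As a rough illustration of managing by exception, here is a minimal Python sketch that compares two snapshots taken a week apart and flags only the metrics that moved more than a chosen tolerance. The `Snapshot` class, the `exceptions` function, the metric names, the dates and the 10% tolerance are all hypothetical, made up for this example rather than taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """Results of a data query recorded at one point in time (hypothetical structure)."""
    taken_on: date
    metrics: dict  # metric name -> numeric value


def exceptions(previous, current, tolerance=0.10):
    """Return metrics whose relative change between snapshots exceeds the tolerance.

    The tolerance stands in for "normal fluctuation"; 10% is an assumed value.
    """
    flagged = {}
    for name, now in current.metrics.items():
        before = previous.metrics.get(name)
        if before is None or before == 0:
            continue  # no usable baseline to compare against
        change = (now - before) / before
        if abs(change) > tolerance:
            flagged[name] = change
    return flagged


# Two snapshots on a consistent weekly schedule (made-up numbers)
last_week = Snapshot(date(2011, 1, 3), {"open_tickets": 120, "signups": 400})
this_week = Snapshot(date(2011, 1, 10), {"open_tickets": 126, "signups": 520})

flagged = exceptions(last_week, this_week)
for name, change in flagged.items():
    print(f"{name}: {change:+.0%}")
```

With these made-up numbers, the 5% rise in open tickets falls inside the tolerance and is ignored, while the 30% jump in signups is flagged for a closer look.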

Continue reading >

Bayesian Math for Dummies

Steve Miller wrote an article a couple of weeks ago on using Bayesian statistics for risk management. He describes his friend receiving a positive test for a serious medical condition and being worried. He then goes on to show why his friend needn’t be worried, because statistically there was a low probability of actually having the condition, even with the positive test.
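To see how a positive test can still leave a low probability of having the condition, here is a minimal Python sketch of Bayes’ rule. The numbers (1% of people have the condition, the test catches 90% of real cases, and 9% of healthy people test positive anyway) are invented for illustration and are not the figures from Steve’s article. The function name is likewise just a label for this example.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) using Bayes' rule."""
    # People who have the condition and test positive
    true_positives = prevalence * sensitivity
    # People who do not have the condition but test positive anyway
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only to illustrate the effect
p = posterior_given_positive(
    prevalence=0.01,
    sensitivity=0.90,
    false_positive_rate=0.09,
)
print(f"Chance of having the condition after a positive test: {p:.1%}")
```

Under these assumed numbers the answer comes out at roughly 9%. Because the condition is rare, the healthy people who test positive by mistake far outnumber the sick people who test positive correctly.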

Understanding risk is an interest of mine, and while I’ve read articles about Bayesian math in the past, the math is above my head. I never studied statistics, nor do I plan to. But I am interested in the concepts behind statistics, so I can understand probabilities better. And I can do basic math. Steve’s article was dense with math I didn’t quite get, but I was able to translate it into something I could understand.

So now, for statistically challenged individuals, I present my translation of Steve’s calculations, Bayesian math for dummies.

Continue reading >

Visualizing Big Data

100,000 rows - heatmap of Java source code

A couple of weeks ago I was asked about how visualization related to Big Data. I am far from a Big Data expert, but I know the basics and have been studying and selling visualization solutions for the past seven years. This article describes my thoughts on visualizing Big Data, not as a definitive statement, but as an exploration of ideas.

Continue reading >

A Model of User Driven Analytics

Here’s a diagram I threw together a while back breaking out the different parts of user-driven analytics. With automated analytics all the rage right now, I think there’s still a lot of untapped innovation and value on the user-driven side, and this diagram serves as my road map for building that out. Over the next couple months I’ll be talking about parts of this model, where Lab Escape is innovating in it, and where I see other opportunities for companies at the data, visualization and analytics layers.

For now, if you see parts you want me to elaborate on sooner, or have questions, post a comment or e-mail me.

A Model of User Driven Analytics

Use Indeed For Job Searches, Not for Trends

While researching an article requiring detailed employment and hiring data in the software industry, I ran across Indeed.com. For those not yet familiar with Indeed, it is a Google-style search engine for job postings. Type your skills in the search box and get back a list of job postings along with breakdowns by location, job title and estimated salary. Then click over to the “salaries” or “trends” tabs to see a history of the salaries or number of jobs for your skill set over the past couple years.

Fantastic, I thought. A few quick searches and I’ll have just the breakdowns I need to see which technologies dominate, which ones are gaining market share, and which ones are slipping away. In particular, I’m interested in the balance between Java and .Net and how the different web and Rich Internet Platforms compare to each other.

Continue reading >