To Hypothesize Or Not?

Steve Miller recently wrote an article entitled “Science of Business vs. Evidence-Based Management” that contrasts two philosophies. What he calls the Science of Business is hypothesis-driven: it aims to support decisions by discovering business best practices using the scientific method. Evidence-Based Management is hypothesis-less: it aims to support decisions by looking for trends and clusters in historical data, without any need to explain why a trend or cluster occurred.

He describes the Science of Business as a top-down approach and Evidence-Based Management as bottom-up, driven only by data and not by hypothesis. Steve asks whether this distinction is important for business intelligence. In this post, I argue that the distinction is important and that it applies to all decisions, not just those supported by business intelligence systems. I also attempt to define when each approach has merit.

Data-Based vs. Rule-Based Decisions

To apply the distinction beyond business and to decision-making in general, we need new terms. For this article, I’ll define the approaches as:

  • Rule-Based
    Similar to Steve’s “Science of Business (SOB)”, these decisions use a rule as their basis. The rule may be a hypothesis, theory, law or rule of thumb, and may be tested or untested.
  • Data-Based
    Similar to Steve’s “Evidence-Based Management (EBM)”, these decisions use historical data as their basis. No rule or formula exists independent of the data, thus no hypothesis exists.

These don’t map exactly onto Steve’s definitions, but rather highlight a key distinction in his two approaches. I have purposely defined these terms in the context of the decision being made, rather than the supporting analysis, because I think that’s a crucial distinction. In my mind, analyzing historical data to come up with a rule that you apply from that point forward falls into the same category as creating a hypothesis, testing it and then using that rule ever after. Both assume a static or slow-moving system.

Static vs. Dynamic Systems

For every action, there is a reaction.

In static systems, actions produce predictable reactions over time. An action taken 20 years ago would produce a similar reaction to an action taken a week ago. Drop an apple from a tree and it falls. Every time.

In dynamic systems, actions produce differing reactions over time. An action taken 20 years ago might produce the exact opposite reaction when compared to an action taken a week ago. Rules change. Sometimes within days or weeks.

In between lies a spectrum, from systems that change slowly or not at all, to systems that change rapidly. Provided the timescale of the changes dwarfs the timescale of our decisions, we can consider the system static. Systems whose rules change every decade can be considered static when making weekly decisions. Eventually those rules may need to be updated, but not weekly, monthly or even yearly.

Different Approaches For Different Systems

I believe that rule-based decisions are best suited for static systems, while data-based decisions are best suited for dynamic systems.

Rule-based decisions have the advantage of being tested. The greater the number of tests a rule has been through, the higher confidence we assign to that rule. In science, rules start out as hypotheses, then become theories, and finally become laws. Rules help us understand a system better, enabling us to formulate new rules. Rules often validate each other as they are tested, creating a system of rules that we can have confidence in.

Data-based decisions have no such assurance. Correlations and trends might just be flukes. Statistical significance appears randomly all the time when you don’t have a hypothesis. Data-based decisions occur in a black box. You pour data in, a decision comes out. It’s essentially a gut-based decision made by a computer whose “gut” is vastly more analytical, but no less opaque than a human gut.
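
To make the "flukes" point concrete, here is a minimal sketch of what can happen when you scan many variables without a hypothesis. Everything in it is an illustrative assumption rather than anything from Steve's article: the function names, the 1,000 random "features", the sample size of 30, and the cutoff of 0.361 (roughly the two-tailed 5% critical value of Pearson's r for 30 samples). Every feature is pure noise, yet some of them will look significantly correlated with the target.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_features=1000, n_samples=30, critical_r=0.361, seed=0):
    """Count noise features that pass a 5% significance cutoff by chance.

    critical_r is an assumed cutoff: approximately the two-tailed
    p < 0.05 threshold for Pearson's r with 30 samples.
    """
    rng = random.Random(seed)
    # The "outcome" we are trying to explain is itself random noise.
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        # Each candidate feature is independent noise with no real link.
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, target)) > critical_r:
            hits += 1
    return hits


if __name__ == "__main__":
    print("noise features that look significant:", count_spurious())
```

With a 5% cutoff you would expect roughly 5% of the noise features to pass, which is why a single pre-stated hypothesis carries more weight than the best of a thousand unplanned comparisons.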

But given that correlations and trends can be calculated continuously as new data arrives, data-based decisions can adapt to a changing environment. Fine adjustments can be made based on new data. Rule-based decisions are rigid. While new rules can always be developed, it’s usually not until after they’ve been stressed to the point of breaking that anyone notices the rule no longer applies (hence the recent housing crisis).
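
As a rough illustration of that rigidity, the sketch below compares a fixed rule with an estimate that is updated on every new observation. The exponentially weighted moving average is just one assumed way to "calculate continuously", and the numbers (a response rate that shifts from 0.2 to 0.6 at step 50, a smoothing factor of 0.2) are made up for the example.

```python
def ewma_update(estimate, observation, alpha=0.2):
    """Move the running estimate a fraction alpha toward the new observation."""
    return estimate + alpha * (observation - estimate)


def simulate(shift_at=50, steps=100, before=0.2, after=0.6, alpha=0.2):
    """Return (fixed_rule, adaptive_estimate) after the system has shifted.

    The fixed rule was calibrated on the old behaviour and never changes.
    The adaptive estimate is refreshed with every observation.
    """
    fixed_rule = before
    estimate = before
    for t in range(steps):
        # The underlying system changes its behaviour at step shift_at.
        observed = before if t < shift_at else after
        estimate = ewma_update(estimate, observed, alpha)
    return fixed_rule, estimate


if __name__ == "__main__":
    fixed, adaptive = simulate()
    print("fixed rule still predicts:", fixed)
    print("adaptive estimate now predicts:", round(adaptive, 3))
```

The fixed rule keeps predicting the old value long after the shift, while the running estimate converges on the new one. A smaller alpha would adapt more slowly but react less to noise, which is the same confidence-versus-adaptability tension in miniature.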

Data-based decisions can work at a finer granularity than rule-based decisions. The Amazon book recommendation system showcases what a data-based system can do. It provides recommendations tailored to each individual person. A rule-based system would require dividing customers into a few dozen segments and creating recommendation rules based on those segments.
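
To show what per-person granularity can look like, here is a toy co-purchase recommender. It is not Amazon's actual algorithm, only a minimal sketch under assumed names (`build_co_purchase_counts`, `recommend`) and made-up baskets. It has no segments and no hand-written rules: each recommendation comes directly from purchase history.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, owned, k=2):
    """Suggest up to k items most often bought alongside the owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


if __name__ == "__main__":
    # Hypothetical purchase history: each list is one customer's basket.
    baskets = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["A", "D"]]
    co = build_co_purchase_counts(baskets)
    print(recommend(co, {"A"}, k=1))
```

Because the counts are rebuilt from the data, adding new baskets changes the recommendations without anyone rewriting a rule.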

The Tradeoff

The tradeoff between rule-based and data-based decisions can be boiled down to one of confidence vs. resilience. Rule-based decisions, when tested properly, create a high degree of confidence in outcomes, but aren’t reliable long-term in a changing system. Data-based decisions can adapt quickly, but can be more influenced by flukes and inspire less confidence in their predictability.

On the flip side, rule-based systems driven by hypotheses excel when data is scant or non-existent, such as in new or emerging markets. Validated hypotheses give us a paradigm or understanding we can use to create predictions before any data exists. Data-based systems can only predict after enough data has been generated, limiting their use in new systems or those which change too frequently.

The Future

Over time I see data-based decision-making replacing gut-based decision-making for frequent operational decisions. Rule-based decision-making will continue to play a role in long-term strategic decisions and decisions based on structural aspects of a system. However, I think rule-based decision-making will undergo three key changes:

  1. Exploratory data analysis will become the primary driver of new rules and hypotheses.
  2. Rules will be monitored by data-based systems to ensure relevance and to help fine-tune the rules.
  3. Re-formulation of rules will increase as data-based systems are used to detect outdated rules.
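
The second and third changes could be sketched as a small monitor that watches how often a rule's predictions hold up on recent data. This is only an assumed design: the `RuleMonitor` class, the sliding window of 20 outcomes and the 70% accuracy floor are hypothetical choices for illustration.

```python
from collections import deque


class RuleMonitor:
    """Track a rule's recent hit rate and flag it when it drifts too low."""

    def __init__(self, window=20, min_accuracy=0.7):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, rule_was_correct):
        """Store whether the rule's latest prediction matched reality."""
        self.outcomes.append(bool(rule_was_correct))

    def accuracy(self):
        """Share of recent predictions that were correct, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_outdated(self):
        """Flag the rule only once a full window of evidence is available."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


if __name__ == "__main__":
    monitor = RuleMonitor(window=10)
    for _ in range(10):
        monitor.record(True)   # the rule holds while the system is stable
    print("outdated after stable period:", monitor.is_outdated())
    for _ in range(10):
        monitor.record(False)  # the system changes and the rule stops working
    print("outdated after the shift:", monitor.is_outdated())
```

A monitor like this does not replace the rule. It only signals when the rule should be re-examined, ideally before it has been stressed to the point of breaking.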

So does my distinction between rule-based and data-based decision-making make sense? What did I miss?

2 comments

  1. yandoodan says:

    No one’s commented since 2011? How lonely! I hope you don’t mind a very late poster.

    This is a false dilemma. In a robust methodology, data exploration forms a basis for generating hypotheses, which then must be tested aggressively. (“Attempt falsification” is a common formulation.)

    I see you arrive at something close to this conclusion, but don’t quite make it past the 90 yard line. Here’s the full version: You’ve formulated a hypothesis whenever you take action — whether you admit it or not. Best to make it explicit and probe it for problems before you act on it. When you don’t do this you court colossal failure.

    When you probe your hypothesis you will almost certainly find problems in your data. It takes additional probing just to find out which of these are important. Usable data can be consistently mapped backwards to the real world, showing that a set of rules were correctly followed to create the data. When this fails you got garbage. (Hint: it fails more often than not.) Even when it succeeds, the data “mean” what the rules specify, not what the label at the top of the column says. This can make a hash out of most big data exploratory studies.

    FWIW.

    1. trevor says:

      What you describe still falls under my rule-based approach. Hypotheses can come from data exploration, but they are still static rules. Validating those hypotheses only works in a static system.

      Rule-based decisions in dynamic systems perform poorly, especially when operating at low levels of granularity. For that you need a data-driven (aka learning-based) decision-making system.

      In the cognitive computing world, this would be the difference between a static machine learning model that’s tested and validated as you describe, and a dynamic AI agent that reacts to the environment based on certain goals.

      If a business needs to make fine-grained operational decisions, such as which books to recommend to each of a million customers or which prices to set on each of a million products, well-designed rule-based systems perform worse than well-designed data-based systems.

      For longer-term strategic decisions, or for operational decisions in a slowly evolving environment, creating and validating hypotheses as you describe is definitely a better approach.
