Steve Miller wrote an article a couple of weeks ago on using Bayesian statistics for risk management. He describes his friend receiving a positive test for a serious medical condition and being worried. He then goes on to show why his friend needn’t be worried, because statistically there was a low probability of actually having the condition, even with the positive test.
Understanding risk is an interest of mine, and while I’ve read articles about Bayesian math in the past, the math is above my head. I never studied statistics, nor do I plan to. But I am interested in the concepts behind statistics, so I can understand probabilities better. And I can do basic math. Steve’s article was dense with math I didn’t quite get, but I was able to translate it into something I could understand.
So now, for statistically challenged individuals, I present my translation of Steve’s calculations, Bayesian math for dummies.
Steve’s friend received a positive test for a disease. The disease occurs infrequently in the general population. The test catches 99% of the people who have the disease, but it also gives false positives in 1 out of 20 tests on healthy people, or 5% of the time. Should Steve’s friend be worried by his positive result?
In the example, we know four facts:
- Overall Incidence Rate
The disease occurs in 1 in 1,000 people, regardless of the test results.
- True Positive Rate
99% of people with the disease have a positive test.
- False Positive Rate
5% of people without the disease also have a positive test.
- Present Condition
Steve’s friend had a positive test.
The question is, given this information, what is the chance that Steve’s friend has the disease?
Before he had the test, we’d just use the overall incidence rate, since we have no other information. Thus, his chance would be 1 / 1000 = 0.1%. Given that he’s received a positive test result, the True Positive Rate of 99% looks scary and a 5% False Positive Rate sounds too small to matter. But what are his actual chances of having the disease?
The Long Way
Bayesian math presents an elegant way to calculate the chance Steve’s friend has the disease. Steve presents the math in his article. But let’s do it the long way, which is much easier for me to understand.
To gain an intuitive understanding of the problem, I translated from abstract probabilities to actual numbers of people. This allows us to normalize the percentage rates so we can compare them. While it sounds like we can directly compare the Overall Incidence Rate, True Positive Rate and False Positive Rate of 0.1%, 99% and 5%, each of these rates applies to a differently sized group. And, as we’ll see, the size of the group it applies to makes all the difference.
For these calculations, we’re going to look at a population of 100,000 people, all of whom we’re going to assume took the test. Out of those people, how many have the disease and how many don’t?
|100||have the disease (1 in 1,000 or 0.1%)|
|99,900||don’t have the disease|
Okay, 100 people have the disease. How many of these people tested positive or negative? Remember that we know that 99% of the people who have the disease test positive.
|100||have the disease|
|99||test positive (99%)|
|1||test negative (1%)|
Out of the 99,900 people who don’t have the disease, how many tested positive or negative? Remember that 5% of those who don’t have the disease test positive anyway.
|99,900||don’t have the disease|
|4,995||test positive (5%)|
|94,905||test negative (95%)|
Now is where it gets interesting. How many people tested positive versus negative in our entire group?
|5,094||tested positive (99 + 4,995)|
|94,906||tested negative (1 + 94,905)|
So 5,094 people tested positive, but we know only 99 of those actually have the disease. The probability of actually having the disease if you test positive is then:
|99||tested positive, and have the disease|
|5,094||tested positive in total|
|1.94%||chance of having disease if you tested positive|
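The head count above can be checked with a few lines of Python. This is only a sketch of the long way, not something taken from Steve’s article; the variable names are my own, and the population of 100,000 is the same assumption used above.

```python
# Walk through "the long way" with whole numbers of people.
# Variable names are illustrative; the rates come from the example above.
population = 100_000
incidence_rate = 0.001       # 1 in 1,000 people have the disease
true_positive_rate = 0.99    # 99% of people with the disease test positive
false_positive_rate = 0.05   # 5% of people without the disease test positive

# Split the population into people with and without the disease.
sick = round(population * incidence_rate)
healthy = population - sick

# Count the positive tests in each group.
sick_positive = round(sick * true_positive_rate)
healthy_positive = round(healthy * false_positive_rate)

# Compare the true positives to everyone who tested positive.
total_positive = sick_positive + healthy_positive
chance = sick_positive / total_positive

print(f"{sick_positive:,} of {total_positive:,} positives have the disease")
print(f"Chance of disease given a positive test: {chance:.2%}")
```

Changing `false_positive_rate` or `incidence_rate` and rerunning is a quick way to see how much the answer depends on the size of the healthy group.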
This is the same result Steve arrived at, though he got there with the much quicker Bayesian math.
The Short Way
For those who want a shortcut to arriving at this conclusion, I’ve translated Steve’s equation below.
$$
\frac{\text{Incidence\_Rate} \times \text{True\_Positive\_Rate}}{(\text{True\_Positive\_Rate} \times \text{Incidence\_Rate}) + (\text{False\_Positive\_Rate} \times (1 - \text{Incidence\_Rate}))}
$$
Or, with the numbers from this example plugged in:
$$
\frac{0.001 \times 0.99}{(0.99 \times 0.001) + (0.05 \times (1 - 0.001))}
$$
That comes out to the same result: 1.94%.
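For anyone who would rather let a computer do the arithmetic, the shortcut can be written as a small Python function. This is a minimal sketch; the function name `chance_given_evidence` and its parameter names are my own invention, not anything from Steve’s article.

```python
def chance_given_evidence(base_rate, true_positive_rate, false_positive_rate):
    """Chance the condition is real, given that the evidence showed up.

    base_rate:            how common the condition is overall
    true_positive_rate:   how often the evidence appears when the condition is real
    false_positive_rate:  how often the evidence appears when it is not
    """
    hits = base_rate * true_positive_rate
    false_alarms = (1 - base_rate) * false_positive_rate
    return hits / (hits + false_alarms)


# The disease example: 0.1% incidence, 99% true positives, 5% false positives.
result = chance_given_evidence(0.001, 0.99, 0.05)
print(f"{result:.2%}")
```

The top of the fraction is the share of everyone who is both sick and positive; the bottom is the share of everyone who is positive for any reason.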
With this small understanding, what can you do with Bayesian math?
Let’s try a business example. Suppose you’ve been doing sales demos and you’re trying to determine how effective they are at closing business. Let’s say your close rate is 10%. You discover that 80% of buyers received a demo and only 20% of non-buyers received a demo. Clear and convincing evidence that demos work, right?
So what is the chance of someone buying if they see a demo? Let’s plug in the numbers:
$$
\frac{0.10 \times 0.80}{(0.80 \times 0.10) + (0.20 \times (1 - 0.10))}
$$
The result: only a 30.8% chance, meaning slightly fewer than 1 in 3 people who see the demo will buy.
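The same head-count approach from the disease example works for the demo numbers. Here is a rough sketch, assuming a hypothetical pool of 1,000 prospects. The pool size and the variable names are my assumptions; only the 10%, 80% and 20% figures come from the example.

```python
# "The long way" applied to the sales demo example.
# The 1,000-prospect pool is a made-up round number for illustration.
prospects = 1_000
close_rate = 0.10              # 10% of prospects end up buying
demo_rate_buyers = 0.80        # 80% of buyers received a demo
demo_rate_non_buyers = 0.20    # 20% of non-buyers received a demo

# Split prospects into buyers and non-buyers.
buyers = round(prospects * close_rate)
non_buyers = prospects - buyers

# Count who saw a demo in each group.
buyers_with_demo = round(buyers * demo_rate_buyers)
non_buyers_with_demo = round(non_buyers * demo_rate_non_buyers)

# Of everyone who saw a demo, what fraction bought?
total_with_demo = buyers_with_demo + non_buyers_with_demo
chance_of_buying = buyers_with_demo / total_with_demo

print(f"{buyers_with_demo} of {total_with_demo} people who saw a demo bought")
print(f"Chance of buying after a demo: {chance_of_buying:.1%}")
```

Laid out this way, the 180 non-buyers who saw a demo outnumber the 80 buyers who did, which is why the chance lands near 1 in 3 rather than near 80%.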
That’s it for now. Tell me how you’re using Bayesian math in your business, or your ideas on how to apply this in the comments below.