How Testing Can Kill

Doctors are sometimes terrible at statistics, and our biases -- including the hugely prevalent interventionist bias in American medicine -- inform the way we look at numbers and probabilities in really dramatic ways. Here's an article ("Bias in the ER" from Nautilus) I read recently that talks about where doctors fall short in interpreting the numbers, and how they can improve.

One really egregious example of this is in the way we often look at screening tests. In my time as a tutor in medical school and as an educator in residency, I've had a lot of med students ask questions that shed light on how rudimentary our understanding often is when it comes to the risks of running a test. And we need our patients to understand this too! I wrote an illustrative example -- let's pretend there exists a disease called blarg cancer.


Let’s do a math exercise with made-up diseases and numbers. Let’s say we had an organ called the blarg and sometimes people get blarg cancer. Blarg cancer is bad and results in significant morbidity and mortality so we want to screen for it. We find a molecule that shows up in the blood of almost every single person with blarg cancer – what a great screen! We are so pumped. But we soon start to realize that sometimes it shows up in the blood of people that DON’T have blarg cancer. So some people we’re screening are showing up as positives even though they don’t have blarg cancer. But no biggie, right? It’s worth it because we’re catching people with blarg cancer we wouldn’t otherwise catch.

Let’s pause the story to take a terminology break: sensitivity is a quality measure of a test that tells us how often the test turns POSITIVE when someone has the disease you’re looking for. If a test is highly sensitive, this means that if you have the disease, the test is going to turn positive almost every single time. This means you’re catching basically everyone with the disease. So the screen for blarg cancer I described above is highly sensitive, because I said the molecule “shows up in the blood of almost every single person with blarg cancer.” Another way of thinking about it is that a test with very high sensitivity has a very low false negative rate. People who get a negative test result can rest assured they almost certainly do not have the disease.

Another quality measure of a test is the specificity, which measures how many people WITHOUT the disease will have a NEGATIVE test. We call it specificity because the question we’re asking is: is this test specific to the disease we’re looking at? Will it turn negative every time someone doesn’t have the disease, or are there some cases in which it turns positive for a reason other than the disease in question? A highly specific test means if you get a positive test, you can be pretty sure that this person has the disease. In other words, a test with very high specificity has a very low false positive rate. The screen for blarg cancer I described above doesn’t have great specificity, because I said, “sometimes it shows up in the blood of people that DON’T have blarg cancer.”
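If it helps to see those two definitions as arithmetic, here's a small Python sketch. The counts are toy numbers, made up purely for illustration (they just happen to match the sensitivity and specificity we'll use below):

```python
# Toy counts, made up purely for illustration.
true_positives = 95    # have blarg cancer, test positive
false_negatives = 5    # have blarg cancer, test negative
true_negatives = 80    # no blarg cancer, test negative
false_positives = 20   # no blarg cancer, test positive

# Sensitivity: of everyone WITH the disease, what fraction tests positive?
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: of everyone WITHOUT the disease, what fraction tests negative?
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}")  # 95% -> false negative rate of 5%
print(f"Specificity: {specificity:.0%}")  # 80% -> false positive rate of 20%
```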

Sensitivity and specificity are measures that we calculate for EVERY TEST we do in medicine! We often think of our tests as arbiters of truth-in-diagnosis, but that is a really dangerous myth. Unfortunately we just don’t have magic diagnosis-revealer wands we can wave over patients to determine if they have diseases we’re looking for, and we need to be really careful not to think of the diagnostic tests we do as diagnosis-revealer wands, because it can cause a lot of problems, some of which I’m going to outline now.

Alright, back to our blarg cancer screening test. Let’s say that our test has a sensitivity of 95%. Wow! A+! Such a good sensitivity. That means that if 100 people have blarg cancer, 95 of them will test positive. You’re catching almost everyone.

Let’s say the test has a specificity of 80%. That’s not so bad, right? Still a B? But definitely not as good as the sensitivity. It means that of 100 people that don’t have blarg cancer, 80 of them will have a negative test. In other words, 20 people out of 100 without blarg cancer will test positive.

Let’s figure out what all these numbers mean. Blarg cancer has a prevalence of around 1%. This means that if you picked 100 people from a crowd, 1 of them would have blarg cancer. If your hospital has a patient population of 100,000, then 1,000 of them have blarg cancer. The sensitivity of your test means that if you start screening you will catch 1,000*0.95 = 950 of the people with blarg cancer. The specificity of your test means that of the 99,000 people that don’t have blarg cancer, 99,000*0.8 = 79,200 of them will have a negative test. But – uh oh – that means 99,000-79,200 = 19,800 will have a positive test even though they don’t have blarg cancer.
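If you want to check that arithmetic yourself, here's the same calculation as a short Python sketch. The only thing in it that isn't spelled out above is the last step, which works out what fraction of positive screens actually have blarg cancer (the positive predictive value):

```python
population = 100_000
prevalence = 0.01     # 1% of people have blarg cancer
sensitivity = 0.95
specificity = 0.80

with_cancer = round(population * prevalence)          # 1,000 people
without_cancer = population - with_cancer             # 99,000 people

true_positives = round(with_cancer * sensitivity)     # 950 caught by the screen
true_negatives = round(without_cancer * specificity)  # 79,200 correctly negative
false_positives = without_cancer - true_negatives     # 19,800 positive despite no cancer

# Of everyone who screens positive, how many actually have blarg cancer?
ppv = true_positives / (true_positives + false_positives)
print(f"{false_positives:,} false positives, PPV = {ppv:.1%}")
# -> 19,800 false positives, PPV = 4.6%
```

That 4.6% is the heart of the problem: even with a "great" test, the overwhelming majority of positive screens are false alarms, simply because so many more people don't have blarg cancer than do.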

Now most people see that number and say to themselves, “Well, it’s not that big of a deal. There’s some psychological discomfort, but at least we’re catching the people that do have cancer!” But it’s not so simple. What do you do after you have a positive screen for blarg cancer? You have to do something about it! So you remove their blargs. All the positive screens, or 19,800 + 950 = 20,750 people, have surgery to get their blargs removed.

Surgery complication rates for blargectomy are about 20%. That’s any complication, anything from a minor infection at the surgical site to something more serious. Perioperative mortality rates are much lower, around 2%. That’s not too many people (good job, surgeons!), so it doesn’t seem like much to worry about. It includes people who died in surgery, died shortly after surgery, or died for a reason directly related to their surgery (e.g. sepsis from a surgical infection). So let’s do the math. If 19,800 people got surgery they didn’t really need (false positives), then 0.02*19,800 = 396 people are going to die from complications of a surgery they didn’t need.

Well, you think, that really sucks. But we did surgery on 950 people who really needed it! But wait a second – surgery isn’t always effective. Cancer is relentless, and even when we do surgery for blarg cancer, it’s only effective in the long run 30% of the time. Surgery saves the lives of 30% of people with blarg cancer, so you saved 950*0.3 = 285 people.

So think about that. You saved 285 people, but you killed 396. Still want to do that test?
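Here's the whole cascade in one short sketch, using the same made-up numbers, so you can see where the 396 and the 285 come from side by side:

```python
false_positives = 19_800        # blargectomies on people without blarg cancer
true_positives = 950            # blargectomies on people with blarg cancer

perioperative_mortality = 0.02  # 2% die from the surgery itself
surgical_cure_rate = 0.30       # surgery only saves 30% of people with blarg cancer

deaths_from_unneeded_surgery = false_positives * perioperative_mortality
lives_saved = true_positives * surgical_cure_rate

print(f"Killed by unnecessary surgery: {deaths_from_unneeded_surgery:.0f}")  # 396
print(f"Saved by needed surgery:       {lives_saved:.0f}")                   # 285
```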

And that’s not taking into account a whole host of other factors, like lead-time bias, complications of surgery that aren’t fatal but are life-altering, and the mental health implications.

“But Monica,” you’re probably thinking, “Blarg cancer and all those numbers are made up and this is a totally hypothetical scenario.” Yes, I made this example up, but it’s a teaching example that reflects REAL LIFE EXAMPLES. We have learned this lesson time and time again. Read up on PSA screening, CA-125 screening, and dementia screening. This lesson is the reason we only screen smokers for lung cancer. This lesson is the reason mammography for breast cancer screening is undergoing so many changes in recommendations.

“Okay, Monica, I’m convinced. So should we stop screening then?” NO!! That’s not what I’m saying at all! There are plenty of screening tests that have held up under scrutiny and proven themselves to be effective and worth the potential adverse effects. The medical community currently holds up the pap smear as an example of a good screening test (when it's administered appropriately and at appropriate intervals): high sensitivity and high specificity, a high enough prevalence of disease to make screening worth it, the potential to save lives when disease is caught earlier, and a low rate of adverse effects.

Cancer sucks. It takes so many lives from us and results in so much suffering and tragedy. Please don’t think that I’m not taking cancer seriously, or trivializing it. But we can do harm too. And we do. We HAVE to be careful in what we do as a medical community. We have to take that seriously.

So what’s the message then? The message is that as physicians it's our job to understand that everything we do incurs risk and that we need to work with patients to use the information we have to determine when that risk is worth it and when it isn’t.

Basically, use your brain. Do the math. Be a critical thinker. IT’S OUR JOB. Our patients need us!