Two recent COVID outbreaks have made headlines because of the high proportion of "breakthrough" cases, where people test positive for COVID despite being fully vaccinated. San Francisco hospitals in July reported 50 staff members tested positive for COVID, despite 75% of them being vaccinated. In Provincetown, MA, a July outbreak of the delta variant led to several hundred new cases, the majority of whom were fully vaccinated.

How can the majority of new cases be among the vaccinated? Some people wrongly (if understandably) worry this means the vaccines aren't working, particularly against the delta variant.

The truth is the numbers show that the vaccines ARE very effective, even against the delta variant.

The problem here is the reporting. News agencies are going for provocative headlines, and in the process are falling for what is known in statistics as the "low base rate fallacy" or base rate bias. In this case, breakthrough infections have a very low base rate, and all of the other numbers are meaningless without keeping that in mind.

The low base rate of breakthrough infections should be the focus in reporting. For instance, the headline does not mention that the 50 new cases in July among SF hospital staff are out of 7,500 staff members, **a breakthrough base rate of less than 1%**. They don't mention that until later in the article.

__How can positive rates be higher among the vaccinated?__

What does it mean that, in the delta outbreak in Provincetown, there was a higher proportion of positive cases among vaccinated folks compared to unvaccinated? Does it mean the vaccine is ineffective against the delta variant? NO. Because of the low base rate, when vaccination rates are high, more positive cases will be among the vaccinated.

What it really means is just that *most people in MA are vaccinated*! To understand this, a picture is worth 1000 words. I came across this diagram that clearly shows what's going on.

Source: https://i.redd.it/cpjwvqv2s0d71.png

The left half of the diagram shows what happens in a highly vaccinated population, based on real-world data from England. There was a 2% chance of getting symptomatic COVID among the unvaccinated (red dots). For the vaccinated, it was a much smaller percentage who got COVID (only 20% of 2%, aka 0.4%). But since so many more people were vaccinated (over 10x as many), that led to a higher number of vaccinated people who tested positive (blue dots). This explains the counterintuitive result that even though the vaccine is working, more people who test positive are vaccinated. There are just so many more vaccinated people! (Notice there are still way fewer vaccinated who end up *hospitalized*.)
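The arithmetic behind the left half of the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the diagram's actual data: the 2% risk and the "20% of 2%" figure come from the paragraph above, while the group sizes (10,000 vaccinated, 1,000 unvaccinated) and the function name `expected_cases` are assumptions chosen to give a clean 10:1 ratio.

```python
def expected_cases(vaccinated, unvaccinated, unvaccinated_risk=0.02, relative_risk=0.20):
    """Expected symptomatic cases in each group.

    relative_risk is the vaccinated risk as a fraction of the
    unvaccinated risk (the "20% of 2%" figure, i.e. 0.4%).
    """
    vaccinated_cases = vaccinated * unvaccinated_risk * relative_risk
    unvaccinated_cases = unvaccinated * unvaccinated_risk
    return vaccinated_cases, unvaccinated_cases


# Assumed group sizes: 10 vaccinated people for every unvaccinated person.
vax_cases, unvax_cases = expected_cases(vaccinated=10_000, unvaccinated=1_000)
vax_share = vax_cases / (vax_cases + unvax_cases)

print(f"vaccinated cases:   {vax_cases:.0f}")
print(f"unvaccinated cases: {unvax_cases:.0f}")
print(f"share of cases among the vaccinated: {vax_share:.0%}")
```

Under these assumed sizes, the vaccinated group produces 40 cases and the unvaccinated group 20, so two thirds of the cases are vaccinated even though each vaccinated person carries one fifth of the risk.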

The right half of the diagram shows what happens when a lower percentage of the population is vaccinated. First - and most importantly - there are *more overall cases and hospitalizations*. Second, since there are more unvaccinated people, a higher proportion of the people who test positive are unvaccinated. This matches our usual intuitions.

__In summary__: When most people are vaccinated, it brings down the total number of positive cases drastically (yay!), but can end up with a higher proportion of positive cases among vaccinated folks. This can be confusing, because humans aren't great at doing statistics when there's a low base rate involved. It really helps to visualize it.

Remember that breakthrough cases are normal and to be expected, and extremely rare in reality (much less than 1% in MA and nationwide). The vaccines help in multiple ways: they reduce positive cases and also lead to a lower rate of hospitalization among those who test positive (only 1% of vaccinated positives from the recent MA delta outbreak were hospitalized).
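To see both effects at once (fewer total cases, yet a larger vaccinated share), here is a rough sketch that sweeps the vaccination coverage. It reuses the 2% and 0.4% risks quoted earlier for the England data and assumes a hypothetical population of 100,000. It also assumes each person's risk stays fixed regardless of coverage, which ignores any reduction in transmission, so treat it as a simplification.

```python
def outbreak_summary(coverage, population=100_000,
                     unvaccinated_risk=0.02, relative_risk=0.20):
    """Return (total_cases, vaccinated_share_of_cases) for a given coverage.

    coverage is the fraction of the population that is vaccinated.
    Risks are held constant across coverage levels (a simplifying assumption).
    """
    vaccinated = population * coverage
    unvaccinated = population - vaccinated
    vaccinated_cases = vaccinated * unvaccinated_risk * relative_risk
    unvaccinated_cases = unvaccinated * unvaccinated_risk
    total_cases = vaccinated_cases + unvaccinated_cases
    return total_cases, vaccinated_cases / total_cases


for coverage in (0.30, 0.60, 0.90):
    total, share = outbreak_summary(coverage)
    print(f"coverage {coverage:.0%}: {total:,.0f} total cases, "
          f"{share:.0%} of them vaccinated")
```

As coverage rises in this toy model, the total case count falls while the fraction of cases that are vaccinated climbs, which is the pattern the two halves of the diagram contrast.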

**TLDR: Vaccines are working well, even when a higher percentage of positive cases are among the vaccinated!**

__Epilogue: Simpson's Paradox__

The base rate fallacy leads to all sorts of funky, counterintuitive results:

- My favorite comes from baseball - In 1989 Andy Van Slyke had a higher batting average than David Justice (0.237 vs. 0.235). In 1990 Van Slyke again had a higher average than David Justice (0.284 vs. 0.282). But when you look at their combined 2-year batting average for 1989-1990, David Justice is the one with the higher average (0.278, compared with Van Slyke's 0.261). How can this be? David Justice had far fewer at-bats in 1989, so his combined average is skewed closer to his 1990 season.
- Simpson's paradox is what philosophers and statisticians call an association between variables at the population level that reverses when divided into subpopulations. Understanding Simpson's paradox has important ramifications for epidemiology and for equity.
- At the population level, vaccinated people are less likely to get COVID. But when comparing vaccinated to unvaccinated, a higher proportion of new cases can be among the vaccinated. This has the same flavor as Simpson's paradox: unequal group sizes drive the surprising result. Of course, it's not truly a paradox - it is just so counterintuitive you have to think it through carefully. Every. Time.
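The baseball example is easy to check by hand or in code. The sketch below uses hit and at-bat counts that reproduce the averages quoted above. The counts themselves are not given in this post, so treat them as assumptions to verify against an official stats source.

```python
# (hits, at_bats) per season. These counts are assumptions chosen because
# they reproduce the averages quoted in the text.
seasons = {
    "Van Slyke": {1989: (113, 476), 1990: (140, 493)},
    "Justice": {1989: (12, 51), 1990: (124, 439)},
}


def season_average(player, year):
    """Batting average for a single season."""
    hits, at_bats = seasons[player][year]
    return hits / at_bats


def combined_average(player):
    """Batting average over all listed seasons: total hits / total at-bats."""
    total_hits = sum(hits for hits, _ in seasons[player].values())
    total_at_bats = sum(at_bats for _, at_bats in seasons[player].values())
    return total_hits / total_at_bats


for player in seasons:
    per_year = ", ".join(
        f"{year}: {season_average(player, year):.3f}" for year in seasons[player]
    )
    print(f"{player}: {per_year}, combined: {combined_average(player):.3f}")
```

The combined average is a weighted average, with each season weighted by its at-bats. Justice's small 1989 season barely pulls his combined figure down, while Van Slyke's two seasons count about equally, which is how the ranking flips.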