Since the launch of our public COVID-19 cases data set, one of our data science team’s primary objectives has been to explore different ways to capture the severity of outbreaks inside prisons.
Death rates, hospitalization rates, and positive case counts are all somewhat imperfect estimates of the true burden of a disease, as these measurements depend on the availability of testing, and the relative accuracy of these indicators may vary over time. This fluctuation presents challenges both when evaluating the overall impact of the pandemic in prison populations, and when comparing mitigation strategies employed at different facilities. One particular concern for data scientists is whether our results may look different depending on which outbreak marker is selected to evaluate prisons. This blog post captures our initial exploratory analysis around outbreak markers, and assesses prisons on several metrics that we were able to calculate using commonly available public data. Our results highlight which markers appear more consistent and which may be less reliable. These findings can help inform the selection of metrics used in future analyses.
We explored the pros and cons of 6 different outbreak marker metrics.
These metrics were selected based on their use in other publications, expert recommendations, and available public data. Prevailing research supports the use of a combination of case-fatality ratio (mortality risk if infected) and transmission rate (Rt) to measure the likelihood that a disease becomes a pandemic (e.g., this article assesses both metrics for the H1N1 virus). We chose to track the case-fatality ratio, but our past explorations of Rt estimation with the current dataset showed too much statistical uncertainty, driven by inconsistent testing patterns, for the metric to be usable. Hospitalization rates are another common metric that we were unable to use due to data limitations; such a metric may also prove less reliable in detention settings, where hospitalization is often considered a last resort, pursued less readily than it may be among civilian populations. An extensive discussion of the strengths and limitations of different outbreak measures can be found here.
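To make the definitions concrete, here is a minimal sketch of how the count-based markers can be derived from per-facility totals. The column names and toy numbers are hypothetical, not drawn from our actual dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical per-facility counts; real column names in the public
# dataset may differ.
df = pd.DataFrame({
    "facility": ["A", "B", "C"],
    "population": [1200, 2400, 900],
    "cases": [300, 24, 0],
    "deaths": [5, 1, 0],
})

# Population-adjusted versions of the raw counts.
df["cases_per_100"] = 100 * df["cases"] / df["population"]
df["deaths_per_100"] = 100 * df["deaths"] / df["population"]

# Case-fatality ratio: deaths as a share of confirmed cases.
# Left undefined (NaN) for facilities that report no cases.
df["case_fatality_pct"] = 100 * df["deaths"] / df["cases"].replace(0, np.nan)
```

Keeping the case-fatality ratio as NaN for zero-case facilities avoids division-by-zero artifacts in downstream correlation calculations.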
We focused our analysis on 208 state prisons that met our selection criteria.
We look for correlation among the outbreak marker metrics to see whether, on average, higher values on one metric are associated with higher values on the others.
We are also interested in outliers where we saw severe outbreaks, which we defined in the context of this analysis as the facilities with the 10 highest values on one or more metrics. We explored whether facilities that were rated severe on one metric were also likely to be among those rated severe on other metrics. This gives us some idea of whether all of the metrics would identify the same general group of facilities as COVID hotspots.
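Both checks are straightforward to express in pandas. The sketch below uses randomly generated toy data and illustrative column names standing in for our per-facility metrics table:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy stand-in for the per-facility metrics table (208 rows, as in
# the real analysis); the column names are illustrative.
metrics = pd.DataFrame(
    rng.poisson(5, size=(208, 3)),
    columns=["cases_per_100", "deaths_per_100", "case_fatality_pct"],
)

# Pairwise Pearson correlations among the outbreak markers.
corr = metrics.corr()

# "Severe" facilities: the 10 highest values on each metric.
severe = {col: set(metrics[col].nlargest(10).index) for col in metrics}

# Share of each metric's worst 10 facilities that are also among the
# worst 10 on at least one other metric.
overlap = {
    col: sum(
        any(i in severe[other] for other in severe if other != col)
        for i in ids
    ) / 10
    for col, ids in severe.items()
}
```

The `overlap` values correspond to the kind of cross-metric agreement percentages discussed later in the post.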
First, we look at the distribution of each of the 6 metrics across all 208 facilities, which is shown in the histograms below.
Overall, we can see that all of the metrics are skewed toward zero — in other words, most of the facilities have not publicly reported COVID-19 outbreaks, and therefore have few, if any, reported cases or deaths.
In addition to the large number of facilities reporting 0 for each metric, we see a small number of facilities with higher values. The percentage of positive test results shows the most variation across all of the metrics (i.e., more facilities with higher values), although this may be due to differences in testing or reporting practices rather than severe outbreaks. However, if we were to see high percentages of positive tests in facilities with fairly widespread testing, this would be concerning. In certain instances, facilities could be less inclined to report percent positive test results precisely because spread is less controlled, as reporting higher proportions of positive test results can cause alarm. Note that for the percentage testing positive, only 59 facilities are shown, as the remaining 149 facilities evaluated do not report negative test counts.
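Because negative test counts are often unreported, percent positive can only be computed for the subset of facilities that publish them. A small sketch of how we handle this, with hypothetical column names:

```python
import numpy as np
import pandas as pd

tests = pd.DataFrame({
    "facility": ["A", "B", "C"],
    "positive_tests": [30, 12, 45],
    # Facilities that do not report negative counts carry NaN here.
    "negative_tests": [270, np.nan, 105],
})

total_tests = tests["positive_tests"] + tests["negative_tests"]
# NaN propagates through the division, so percent positive stays
# undefined where negative counts are unreported, rather than
# silently wrong.
tests["pct_positive"] = 100 * tests["positive_tests"] / total_tests
```

Dropping the NaN rows before plotting is what reduces the percent-positive histogram to the 59 reporting facilities.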
Next, we examine the correlations between these indicators, as shown in the scatterplots below.
We found the strongest correlation between the absolute and population-adjusted versions of the number of cases (r = .86) and number of deaths (r = .82).
This may be due to the fact that most facilities have reported relatively few cases, which leads to low values on both metrics by default. Population sizes were also relatively similar across facilities — the middle 50% of prisons had 1,015–2,407 inmates.
The next largest correlations were between cases per 100 inmates and deaths per 100 inmates (r = .73) and between total cases and total deaths (r = .65).
Even though death rates typically lag behind case rates by 2–3 weeks during an outbreak, one would expect a correlation between the cumulative cases and deaths once an outbreak has subsided. These metrics may not be as closely tied if we compare cases and deaths during an active outbreak. Further analysis would be required to validate this.
The percentage of positive tests had some correlation with cases and deaths, but there is a lot of variation, which may be attributable to some facilities conducting proactive testing while others only test inmates who are suspected to have COVID-19. This may explain what appears to be a split in the correlation scatterplots: in several of them, one set of points shows a linear relationship between the two variables, while a separate cluster of points shows almost no correlation at all.
Interestingly, the case-fatality ratio has a near-zero correlation with all five of the other metrics. This may reflect different tendencies towards testing (e.g., only testing those who are very sick versus all inmates).
Alternatively, if these did reflect differences in the risk of dying from COVID-19 across facilities, then examining differences in policies or pre-existing inmate health could prove useful in finding ways to reduce the risk of death among inmates who are infected. However, the absence of correlation could also reflect some cases of understated case-fatality ratios, as anecdotal evidence shows that some incarcerated people diagnosed with coronavirus are released from custody through measures like compassionate release, writs, and suspended sentences before their illnesses resolve. This leaves a plausible option of fatalities resulting after an individual is released from custody, meaning that the positive case is counted in the facility’s data set, but the fatality is not.
Of course, we are especially interested in comparing outbreak markers in facilities where severe COVID-19 outbreaks occurred. This prompted us to look at whether each outbreak marker identified the same facilities as worst-affected.
Because these metrics can be misleading for facilities with small numbers of cases (e.g., the four facilities with case-fatality ratios of 100% each report only one positive case and one death), we restricted this analysis to facilities with 10 or more cases.
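Applying the restriction before ranking is a one-line filter; a sketch with hypothetical columns and toy counts:

```python
import pandas as pd

facilities = pd.DataFrame({
    "facility": ["A", "B", "C", "D"],
    "cases": [300, 1, 150, 9],
    "deaths": [5, 1, 2, 0],
})

# A facility with 1 case and 1 death posts a misleading 100%
# case-fatality ratio, so rank only facilities with 10+ cases.
eligible = facilities[facilities["cases"] >= 10].copy()
eligible["case_fatality_pct"] = 100 * eligible["deaths"] / eligible["cases"]
```

The 10-case cutoff is the threshold used in this analysis; other floors could be chosen depending on how much small-denominator noise is acceptable.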
In the chart below, we look at the percentage of facilities with one of the ten highest values on one metric that also had one of the ten highest values on another metric. For example, the 90% bar for “# deaths” indicates that of the 10 facilities with the most COVID-19 inmate deaths, 9 also had one of the ten highest values on at least one other metric (number of cases, cases/100 inmates, etc.). In the chart, we see patterns similar to the overall correlation patterns.
Specifically, facilities with the highest total number of deaths and deaths/100 inmates were very likely to be among those facilities with the highest values on other metrics.
Facilities with the highest number of cases and cases/100 inmates were also likely to have high values on other metrics, but slightly less so: only 6 of the 10 facilities with the highest values on these outbreak markers ranked similarly on other metrics. This is interesting in part because, despite the very high overall correlation between number of cases and cases/100 inmates, the set of facilities with the highest number of cases differs slightly from the set with the highest number of cases/100 inmates. The facilities with the highest percentage of COVID-19 tests with positive results, and facilities with the highest case-fatality ratio, are not necessarily among those with the highest case and death rates.
For example, the Marion Correctional Institute in Ohio has the largest overall number of cases across facilities in our dataset, with 2,061 inmates testing positive as of June 29, 2020. This facility also has the largest number of cases per capita, with 85 cases per 100 inmates. While other facilities had higher total deaths and deaths per 100 inmates (Pickaway Correctional Institute of Ohio had the highest number of deaths; the Adult Diagnostic and Treatment Center of New Jersey had the highest number of deaths per 100 inmates), Marion was still among the 10 facilities with the highest values on these metrics as well, ranking 5th highest for death count and 7th highest for deaths per 100 inmates.
In other words, across all four of these metrics, Marion stands out as having a particularly severe COVID-19 outbreak. Marion Correctional Institute’s case-fatality ratio, however, is near the median for facilities with 10 or more cases, at 0.63%.
We could not calculate the percentage of positive test results for this facility, as only positive results, not total tests, were reported.
Only 4 of the 10 facilities with the highest percentage of positive test results were designated as having severe outbreaks on the other metrics. However, this divergence may be due to the fact that we were unable to calculate this outbreak marker metric for ~70% of the facilities included in this analysis.
The case-fatality ratio has the least amount of overlap with the other metrics, as only 3 of the 10 facilities with the highest case-fatality ratios had high values on the other indicators.
This could reflect variation in testing policies that overshadows any similarity with the other indicators. However, this could also show a true difference in the aspect of outbreak severity captured by this metric vs. the others: case-fatality ratio theoretically shows how deadly an outbreak is regardless of the number of people infected, whereas the other metrics also incorporate an aspect of how much transmission has already occurred. So, assuming perfectly consistent testing policies across prisons, we might still expect some divergence between case-fatality ratios and the other outbreak measures.
Based on publicly available data, we see that case counts, death counts, and population-adjusted case and death counts are fairly convergent: facilities with more cases are likely to have more deaths. When the data is available, population-adjusted values make it easy to compare facilities of different sizes. We suggest that analysts and decision-makers use population-adjusted case and death counts when possible for evaluating COVID-19 outbreaks in correctional institutions. Although there is some uncertainty introduced into these per capita metrics because population data may not be up-to-date, we believe these metrics still provide a measure of outbreak severity that is most comparable across facilities of different sizes. Our results also suggest that absolute death counts are likely to be reliable if total facility population is unavailable.
Our results do suggest caution in using any of the six metrics we explored for facilities with very small absolute numbers of cases.
And because testing policies vary significantly among facilities (ranging from preventative protocols in which asymptomatic individuals are regularly tested, to a triage approach, where a limited supply of tests are allocated only for symptomatic or high-risk individuals), some facilities that report small absolute case numbers may still be experiencing significant undetected transmission.
We suspect that for real-time decision-making, death-based metrics are unlikely to be sufficient because deaths usually occur several weeks after the onset of the disease.
While percent of positive test results is not strongly related to other outbreak markers, it may still be a useful real-time indicator of the current state of an outbreak and testing coverage.
The case-fatality ratio — the percentage of people with COVID-19 that die from it in a given facility — interestingly shows almost no correlation with the other 5 indicators. This is particularly fascinating given that it was a primary measure used by the CDC to rate the severity of a pandemic until the more comprehensive Pandemic Severity Assessment Framework was finalized in 2013. Further investigation is needed to understand why this rate shows such different results for outbreak severity in prisons. It is not clear if this is because of genuine differences in factors that lead to more cases versus deaths, or simply because this metric is biased by testing procedures and reporting.
Though case-fatality ratio is widely accepted in the public health sphere to monitor disease outbreaks, its applicability in detention settings may be limited, as incarcerated populations frequently fluctuate, and some individuals diagnosed with coronavirus may succumb to the disease after being released from custody.
Calculating the case-fatality ratio is further complicated by the relative opacity of facilities’ testing approaches and internal policies, and initial results lead us to believe that it may not be a reliable indicator for coronavirus spread in jails and prisons.
1. Note that recent decarceral policies may have reduced prison populations below the latest numbers we were able to collect.
2. We made use of a public dataset of COVID-19 cases in criminal justice facilities that is updated nightly from various sources, and publicly available data about facility populations.