Preeyanka Shah and Cara Crawford
January 15, 2021

Initial Learnings from the “COVID–19 Cases and Deaths in Criminal Justice Facilities Dataset”

Over the past few weeks, our data science team has been exploring the COVID–19 Cases and Deaths in Criminal Justice Facilities Dataset to find trends in facility outbreaks to date. As a quick recap, we announced this dataset several weeks ago. It was compiled in partnership with Covid Prison Data and the UCLA Law COVID-19 Behind Bars Data Project, and it aggregates data across a number of sources. The dataset provides daily cumulative counts of facility-level positive cases, deaths and tests to date for residents and staff, where available.

Now that we’ve spent some time with it, we want to share some of our initial findings so that researchers and the broader criminal justice community can make better use of this dataset. These findings cover challenges inherent to the dataset, early learnings from a comparison of max rate of spread (max Rt) between facilities and their surrounding counties, and where we are going next.

Data Quality

In an ideal world, we would have COVID-19 outcomes (including active, recovered and asymptomatic cases as well as deaths) for every person in every facility for every day, based on frequent broad testing. Unfortunately, COVID-19 data for corrections does not look like this, which makes analysis harder for everyone working with it.

One of these challenges is data coverage. This dataset is built entirely from publicly available data, so our coverage depends on what states report. Most but not all states publicly report some facility-level numbers for adult prisons, and what is reported varies from state to state. Mississippi, for example, only reports positive cases, while others like California and Alabama provide information on both cases and testing. Inconsistent coverage makes apples-to-apples comparison across state lines challenging.

Another challenge has been inconsistencies in how states report the same information. Some states, like Kansas, report the number of residents who’ve tested positive as a cumulative total, and note separately how many of those have recovered so far. Others, like Tennessee, only include those currently sick in their positive cases and list everyone who’s previously had the illness in the recovered numbers. This challenge is compounded by the fact that we’ve seen some states change reporting patterns over time. We are working to address inconsistencies we’ve found so far in the dataset.
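To make the two reporting conventions above concrete, here is a minimal sketch of how they might be put on a common cumulative basis. The function name and the "cumulative" / "active" labels are our own illustrative choices, not fields in the dataset, and the sketch assumes that deaths are not folded into either number, which may not hold for every state.

```python
def cumulative_positive(reported_positive, reported_recovered, convention):
    """Return a cumulative positive-case count for one facility on one day.

    convention is a hypothetical label for how the state reports:
      "cumulative": positives already include everyone ever infected
                    (the Kansas-style pattern described above)
      "active":     positives only include people currently sick, and
                    recovered holds everyone who previously had the illness
                    (the Tennessee-style pattern described above)
    Assumption: deaths are tracked separately and are not included here.
    """
    if convention == "cumulative":
        return reported_positive
    if convention == "active":
        return reported_positive + reported_recovered
    raise ValueError(f"unknown reporting convention: {convention!r}")


# Two facilities with the same underlying history, reported two ways.
kansas_style = cumulative_positive(120, 80, "cumulative")
tennessee_style = cumulative_positive(40, 80, "active")
```

Both calls describe 120 people who have ever tested positive, 80 of whom have recovered. A state that changes conventions partway through would need the label to change on the date of the switch.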

The area most worth discussing is testing, both in how it is carried out and how it is reported. Without mass testing, facilities are almost certainly undercounting confirmed cases. Though some states do not report testing numbers, we can sometimes guess where mass testing was applied based on the shape (and scale) of a positive cases chart, or from notes on state websites about when mass testing was implemented. It often shows up in the data as sharp spikes in cases, like the one seen at North Carolina Correctional Institute for Women.


On a very positive note, these spikes mean that facilities are mass testing shortly after an outbreak is suspected, which enables them to take appropriate action. However, it still makes it more difficult to understand how the virus spread over time (earlier, more frequent mass testing would produce a smoother curve in the data). Many states, like Indiana and California, provide extensive statistics around both cases and testing. Pennsylvania goes one step further in making the data easy to access by providing it as a downloadable CSV file.
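As an illustration of the spike pattern described above, the following sketch turns a cumulative case series into daily new cases and flags days that jump well above the recent baseline. The thresholds (a 7-day window, a 5x jump, at least 10 cases) and the example numbers are arbitrary assumptions chosen for illustration, not values taken from the dataset or from our analysis.

```python
from statistics import median


def daily_new_cases(cumulative):
    """Difference a cumulative series into daily new cases.

    Downward corrections in the cumulative count are clamped to zero.
    """
    return [max(b - a, 0) for a, b in zip(cumulative, cumulative[1:])]


def flag_spikes(new_cases, window=7, factor=5.0, min_cases=10):
    """Return indices of days whose new cases far exceed the recent median.

    window, factor and min_cases are illustrative defaults (assumptions).
    """
    flagged = []
    for i, count in enumerate(new_cases):
        prior = new_cases[max(0, i - window):i]
        baseline = median(prior) if prior else 0
        # Use a floor of 1 so a quiet baseline of zero still allows a comparison.
        if count >= min_cases and count > factor * max(baseline, 1):
            flagged.append(i)
    return flagged


# Made-up cumulative counts: a slow trickle, then one large jump.
example_cumulative = [0, 1, 2, 2, 3, 3, 90, 92, 93]
example_new = daily_new_cases(example_cumulative)
example_spikes = flag_spikes(example_new)
```

A flagged day is only a hint that mass testing may have occurred. Confirming it still requires testing counts or notes from the reporting state.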

Initial Analysis — Comparing Max Rt within Facilities to surrounding Counties

One question that’s been top of mind for corrections officials and the public alike is, “What is the relationship between the spread of COVID within facilities and in surrounding counties?” We aimed to compare the magnitude and date of maximum effective transmission rates (max Rt) to see if we could draw any conclusions. We used the methodology outlined in Systrom (adapted from Bettencourt & Ribeiro 2008) and leveraged county case data from the New York Times.

As we began this analysis, we found that for the majority of facilities there was too much uncertainty in the max Rt values to draw strong conclusions. Reliably calculating Rt values relies on the reporting of new cases, which is dependent both on the presence of a test result as well as the presence of the virus. An increase in the Rt on a particular day could be due to either an increase in transmission between residents or to an increase in the number of tests administered. Disentangling these two possibilities becomes even more challenging for facilities that only report positive, and not negative, test results.
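A small worked example may help show why testing and transmission are hard to separate. The Bettencourt & Ribeiro model relates expected new cases on one day to the previous day's count through Rt, and solving that relation for Rt gives a simple point estimate. The sketch below is not the full Bayesian method we used. It assumes a 7-day serial interval, and the detection rates and infection counts are invented for illustration.

```python
import math

SERIAL_INTERVAL_DAYS = 7.0  # assumption for illustration


def rt_point_estimate(prev_cases, new_cases, serial_interval=SERIAL_INTERVAL_DAYS):
    """Point estimate of Rt from two consecutive daily case counts.

    Solves new = prev * exp(gamma * (Rt - 1)) for Rt, with
    gamma = 1 / serial_interval. Both counts must be positive.
    """
    if prev_cases <= 0 or new_cases <= 0:
        raise ValueError("both counts must be positive")
    gamma = 1.0 / serial_interval
    return 1.0 + math.log(new_cases / prev_cases) / gamma


# Hypothetical facility: true daily infections are flat (true Rt = 1),
# but mass testing raises the share of infections detected from 10% to 50%.
true_infections = [100, 100]
detection_rate = [0.10, 0.50]
reported = [round(n * d) for n, d in zip(true_infections, detection_rate)]

apparent_rt = rt_point_estimate(reported[0], reported[1])
```

Here reported cases go from 10 to 50 while true infections stay flat, so the apparent Rt is 1 + 7 ln(5), roughly 12.3, even though nothing about transmission changed. Without test counts, this jump looks identical to a real surge.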

Because most facilities (and many counties) implement mass testing only after an outbreak is suspected, we see very spiky increases in new cases. This led to very wide confidence intervals on max Rt values, making it challenging to draw any firm conclusions. While we’d hoped to compare max Rts at scale and draw conclusions in aggregate, this turned out to be infeasible.
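One contributing factor can be seen in a simplified, single-day version of the Rt posterior. It treats new cases as Poisson with mean prev_cases * exp(gamma * (Rt - 1)) and puts a flat prior over a grid of Rt values. The full method carries a prior forward from day to day and smooths the case counts, and this sketch omits both. The 7-day serial interval, the 0 to 12 grid and the example counts are all assumptions for illustration.

```python
import math

GAMMA = 1 / 7  # assumption: reciprocal of a 7-day serial interval
RT_GRID = [i / 100 for i in range(1201)]  # candidate Rt values, 0.00 to 12.00


def rt_posterior(prev_cases, new_cases, rt_grid=RT_GRID):
    """Posterior over Rt for one day, using a flat prior on rt_grid."""
    if prev_cases <= 0:
        raise ValueError("prev_cases must be positive")
    log_like = []
    for rt in rt_grid:
        lam = prev_cases * math.exp(GAMMA * (rt - 1))
        # Poisson log-probability of observing new_cases given mean lam.
        log_like.append(new_cases * math.log(lam) - lam - math.lgamma(new_cases + 1))
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]  # subtract peak for stability
    total = sum(weights)
    return [w / total for w in weights]


def credible_interval(posterior, rt_grid=RT_GRID, mass=0.9):
    """Equal-tailed interval containing roughly `mass` of the posterior."""
    tail = (1 - mass) / 2
    low, high, cdf = None, rt_grid[-1], 0.0
    for rt, p in zip(rt_grid, posterior):
        cdf += p
        if low is None and cdf >= tail:
            low = rt
        if cdf >= 1 - tail:
            high = rt
            break
    return low, high


# Same threefold jump, at very different scales (made-up counts).
small_facility = credible_interval(rt_posterior(2, 6))
large_county = credible_interval(rt_posterior(200, 600))
```

In this sketch the 2 to 6 jump yields a much wider interval than the 200 to 600 jump, even though the ratio is the same. A facility that reports almost no cases until a mass-testing day gives the estimate very little to work with.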

Instead, we’ve started to look at individual counties where the testing data was more complete. When we zoom into individual counties, like Jackson County, Michigan, we see many instances in which spread within the prison does not look the same as spread in the surrounding county. In this case, facility outbreaks appeared in separate waves (late March at Parnell and early May at G. Robert Cotton), and in both facilities the max Rt surpassed the rate of spread in the surrounding county.

This result is less provocative than a strong correlation between county and corrections cases might be, but it still helps us to piece together how the pandemic is spreading. Specifically, it tells us that community spread is not necessarily an indicator of future outbreaks inside of nearby facilities, and that corrections systems can’t assume the risk of a new outbreak is past if the disease has already peaked in the surrounding community.


Next Steps

Recidiviz and our partners are committed to making the COVID–19 Cases and Deaths in Criminal Justice Facilities Dataset useful to researchers and the broader criminal justice community. In that spirit, improving data quality is an ongoing priority. In a recent quality exercise, we identified 93 issues that affect data quality across 344 prisons and made this backlog publicly available. We also welcome feedback: you can submit suggestions and issues as you use the dataset. Our data exploration efforts are still in the early stages, and we’ll continue to post more findings over the coming weeks.
