
COVID-19 Cases and Deaths in Criminal Justice Facilities

To power our model, we need accurate, up-to-date data on the real-world spread of COVID-19 in prisons and jails. A constellation of groups in the criminal justice space has been aggregating this data since the start of the outbreak. In collaboration with them, we’ve tried to round out this dataset through manual data collection and historical checks in areas where we found gaps. Below you can find the full dataset, along with a list of the organizations that have done the heavy lifting.

Download CSV

Kaplan, Jacob; Hoyos-Torres, Sebastian; Gur, Oren; Concannon, Connor; Littman, Aaron; Jones, Nick. (2020). Covid-19 in Prisons in the United States. Covid Prison Data. Retrieved from: https://covidprisondata.com/data.html. See also: Dolovich, Sharon. (2020). UCLA Law Covid-19 Behind Bars Data Project. Retrieved from: https://bit.ly/2xyFfX6. Saunders, Jessica. (2020). Covid Custody Project Working File. Raw unpublished data. Recidiviz. (2020). COVID-19: Prison / Jail Cases. Retrieved from: https://bit.ly/3dgo77t.

This dataset brings together multiple datasets on the COVID-19 pandemic in prisons and jails, contributed by groups and individuals including:

Covid Prison Data: the team behind covidprisondata.com, a public website helping to track the spread of COVID-19 in prisons.
University of California, Los Angeles: the team behind the UCLA Law Covid-19 Behind Bars Data Project.
Council of State Governments Justice Center
Recidiviz COVID Team: the team building recidiviz.org/covid, a tool to help criminal justice agencies make decisions about pandemic response.
Brian Chalmers
Camilla Handley
Emma Humphrey
Hailey Hannigan
Kayla Caracci
Kim Carney
Leslie White-Chalmers
Magen Ashley Young
Nancy Wang
Oscar Chagolla
  • This dataset is aggregated from multiple data sources, and updated nightly.
  • You can find the component data at the sources listed in the citation above.
  • For aggregation, we merge all available data into a single entry per day per facility.
  • We resolve merge conflicts by taking the highest value available across all sources for each column in each row. Resolving conflicts per column and row, rather than per row, has a downside: the columns in a given row may not add up (e.g., positive + negative + pending test counts may not sum to the total tested, because the values may come from different data sources). We chose this approach because it loses the least data: if two sources collected different data points, the merged row includes every field collected across all of them.
  • We use the highest value in the event of a conflict, because most conflicts are due to data collection occurring at different times of day (and counts increase over the course of the day).
  • Each row includes a ‘collections’ field, which lists the data sources that contributed to that row’s data, and a ‘source’ field, which gives the original source of the data (typically a government website).
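
The merge rule described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function name `merge_sources`, the input key `collection`, and the column names `date`, `facility`, `tested`, `positive`, and `negative` are assumptions made for illustration, and the real CSV's columns may differ. The sketch handles numeric count columns only and leaves out the ‘source’ field.

```python
# Hypothetical key columns identifying one row: one entry per day per facility.
KEY_FIELDS = ("date", "facility")


def merge_sources(rows):
    """Merge rows from several data sources into one row per (date, facility).

    Each input row is a dict carrying the key fields, a hypothetical
    "collection" field naming the data source, and numeric count columns.
    For every count column, the highest value reported by any source wins.
    Missing values (None) are skipped, so a column collected by only one
    source still appears in the merged row.
    """
    merged = {}
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        out = merged.setdefault(
            key, {**dict(zip(KEY_FIELDS, key)), "collections": []}
        )
        # Record which sources contributed to this merged row.
        if row["collection"] not in out["collections"]:
            out["collections"].append(row["collection"])
        for column, value in row.items():
            if column in KEY_FIELDS or column == "collection" or value is None:
                continue
            # Per-column conflict resolution: keep the highest value seen.
            if column not in out or value > out[column]:
                out[column] = value
    return list(merged.values())


# Made-up example values, not real data.
rows = [
    {"date": "2020-05-01", "facility": "Facility A", "collection": "source_1",
     "tested": 100, "positive": 10, "negative": None},
    {"date": "2020-05-01", "facility": "Facility A", "collection": "source_2",
     "tested": 90, "positive": 12, "negative": 75},
]
merged = merge_sources(rows)
# merged[0] has tested=100 (from source_1) and positive=12, negative=75
# (from source_2), so positive + negative (87) does not sum to tested (100).
```

The example shows both sides of the trade-off: no field is lost when sources collect different columns, but the merged counts are no longer guaranteed to be internally consistent.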