
COVID-19 Cases and Deaths in Criminal Justice Facilities


The “COVID-19 Cases and Deaths in Criminal Justice Facilities Dataset” is the most comprehensive dataset on the spread of COVID-19 inside prisons in the USA. It contains information on testing, positive cases and deaths on a per-day basis for both inmate and staff populations. The dataset is based entirely on data made publicly available by individual state and federal corrections organizations.

A constellation of groups including CovidPrisonData.com, UCLA COVID-19 Behind Bars Data Project and Recidiviz have been aggregating this data since the start of the outbreak.  Recidiviz’s participation has included manual data collection, historical checks for areas where we’ve found gaps, data aggregation and data quality investments.

If you end up using the data in an interesting way or are interested in volunteering with these data efforts, please reach out at covid@recidiviz.org.

Using the Data

This dataset is publicly available, but if you use it, please cite this work as follows:

Kaplan, Jacob, Hoyos-Torres, Sebastian, Gur, Oren, Concannon, Connor, Littman, Aaron, Jones, Nick. Covid-19 in Prisons in the United States. Covid Prison Data, 2020. Retrieved from https://covidprisondata.com/data.html. See also: Dolovich, Sharon. (2020). UCLA Law Covid-19 Behind Bars Data Project. Retrieved from: https://bit.ly/2xyFfX6. Saunders, Jessica. (2020). Covid Custody Project Working File. Raw unpublished data. Recidiviz. (2020). COVID-19 : Prison / Jail Cases. Retrieved from: https://bit.ly/3dgo77t

Recent Updates

2020-06-18

We’ve updated this page and kicked off internal data quality efforts.  To provide transparency around these efforts, we’ve made our backlog of known issues publicly available.

Major changes to the dataset include:

  • Filling in gaps in death data across a number of states
  • Removing rows without any count data
  • Separating prison and jail data
  • Creating separate columns for cumulative and active cases

Methodology

About the dataset

This dataset is aggregated from multiple data sources, and updated nightly.

Each row includes a ‘collections’ field, which lists the data sources that contributed to that row’s data. It also has a ‘source’ field, which gives the original source of the data (typically a government website). You can also find the component data at the sources listed in the citation above.

Aggregation approach

  • We merge all available data into a single entry per day per facility.
  • We resolve merge conflicts by taking the highest value available across all sources, separately for each column in each row. Resolving conflicts per column and row, rather than per row, has a downside: columns may not add up within a row on a given day (e.g., positive + negative + pending test counts may not sum to the total tested, because the values may come from different data sources). We chose this approach because it loses the least data: if two sources collected different data points about the same facility on the same day, the merged row includes every field collected across all sources.
  • We use the highest value in the event of a conflict, because most conflicts are due to data collection occurring at different times of day (and counts increase over the course of the day).
  • For some rows, we have determined that the most likely correct value is not the highest value across available sources. In those cases, we override the data and add a note.
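The per-column merge rule above can be sketched in Python as follows. This is a minimal illustration, not Recidiviz's actual pipeline: the report format, the column names in `COUNT_FIELDS`, the helper `merge_reports`, and the source and facility names are all hypothetical, and the manual overrides described above are not modeled. The example shows how keeping the highest value per column can produce a row whose parts do not sum to its total.

```python
# Hypothetical count columns; the published CSV may use different names.
COUNT_FIELDS = ("tested", "positive", "negative", "pending")


def merge_reports(reports):
    """Merge source reports into one row per (facility, date).

    Each report is a dict with 'facility', 'date', 'collection', and any
    subset of COUNT_FIELDS. Conflicts are resolved independently for each
    column by keeping the highest value reported by any source.
    """
    merged = {}
    for report in reports:
        key = (report["facility"], report["date"])
        row = merged.setdefault(
            key, {"facility": key[0], "date": key[1], "collections": []}
        )
        # Record which source contributed to this merged row.
        if report["collection"] not in row["collections"]:
            row["collections"].append(report["collection"])
        for field in COUNT_FIELDS:
            value = report.get(field)
            if value is None:
                continue  # this source did not report this column
            row[field] = max(row.get(field, value), value)
    return merged


# Two hypothetical sources reporting on the same facility and day.
reports = [
    {"facility": "Facility A", "date": "2020-06-01", "collection": "source_1",
     "tested": 100, "positive": 10, "negative": 85},
    {"facility": "Facility A", "date": "2020-06-01", "collection": "source_2",
     "tested": 120, "positive": 12, "pending": 3},
]
row = merge_reports(reports)[("Facility A", "2020-06-01")]
print(row["tested"])                                      # 120
print(row["positive"] + row["negative"] + row["pending"])  # 100
```

Here `tested` and `positive` come from the second source while `negative` comes only from the first, so positive + negative + pending (100) falls short of tested (120): the kind of inconsistency the per-column approach accepts in exchange for keeping every field.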

On an ongoing basis, we review facilities for data quality. Through this process, we’ve discovered missing data, improperly labeled data (e.g., active cases reported as cumulative cases to date) and other issues. We track these issues in a public backlog.
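One simple check for the mislabeling described above, offered as an illustrative heuristic rather than the review process actually used, relies on the fact that a cumulative count should never decrease from one day to the next. The function `flag_cumulative_decreases` and the sample series below are hypothetical.

```python
def flag_cumulative_decreases(series):
    """Return dates where a supposedly cumulative count drops.

    `series` is a list of (date, value) pairs for one facility and one
    cumulative column, sorted by date. Cumulative totals should never
    decrease, so a drop suggests the values may actually be active cases
    (or another labeling or entry problem) and is worth a manual look.
    """
    flagged = []
    previous = None
    for date, value in series:
        if value is None:
            continue  # skip days with no report instead of treating them as zero
        if previous is not None and value < previous:
            flagged.append(date)
        previous = value
    return flagged


# Hypothetical cumulative positive counts for one facility.
cumulative_positive = [
    ("2020-05-01", 40),
    ("2020-05-02", 45),
    ("2020-05-03", None),
    ("2020-05-04", 30),  # lower than the last reported value: suspicious
    ("2020-05-05", 52),
]
print(flag_cumulative_decreases(cumulative_positive))  # ['2020-05-04']
```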

Caveats

  1. Facility-level data aggregated to the state level may not always line up with state-level counts provided by other sources. If you are looking for the most accurate state-level data, we recommend the Marshall Project: COVID Cases in Prisons dataset. <https://data.world/associatedpress/marshall-project-covid-cases-in-prisons>
  2. For many facilities, data is sparse. This can be due to limited reporting or limited collection.
  3. We have not invested resources in data quality for staff-related counts or for county jail facilities.
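To see how caveats 1 and 2 interact, consider rolling facility rows up to state totals. The sketch below assumes a hypothetical row layout with `state`, `date` and a `deaths` column, and uses a placeholder state code `XX`; the helper `state_totals` is likewise hypothetical. It skips facilities that did not report on a given day and records how many did. A total built from only some facilities can fall short of a state-reported figure, which is one plausible reason the two may not line up.

```python
from collections import defaultdict


def state_totals(rows, field):
    """Sum one count column across facilities for each (state, date).

    Facilities with no value for `field` on a given day are left out of
    the sum rather than counted as zero, and the number of facilities
    that did report is returned alongside the total.
    """
    totals = defaultdict(lambda: {"total": 0, "facilities_reporting": 0})
    for row in rows:
        value = row.get(field)
        if value is None:
            continue  # sparse data: this facility did not report this column
        entry = totals[(row["state"], row["date"])]
        entry["total"] += value
        entry["facilities_reporting"] += 1
    return dict(totals)


# Hypothetical facility rows for one state and day; Facility B did not report.
rows = [
    {"state": "XX", "facility": "Facility A", "date": "2020-06-01", "deaths": 2},
    {"state": "XX", "facility": "Facility B", "date": "2020-06-01", "deaths": None},
    {"state": "XX", "facility": "Facility C", "date": "2020-06-01", "deaths": 1},
]
print(state_totals(rows, "deaths"))
# {('XX', '2020-06-01'): {'total': 3, 'facilities_reporting': 2}}
```

Keeping `facilities_reporting` next to each total makes it easier to judge whether a low state figure reflects fewer cases or simply fewer facilities reporting.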

Public Bug Tracker

If you discover any issues in the data, let us know. The full list of known issues and their status is available through our public backlog.

Additional datasets

In addition to our prison dataset, we’ve also collected information for a subset of county jails. This data uses the same methodology as our prison dataset, although we have not yet made similar investments in data quality for jail data.


Contributors

This dataset brings together the work of several organizations and individuals who have worked tirelessly to preserve a record of COVID-19 outbreaks in prisons and jails, including:

The team behind covidprisondata.com, a public website helping to track the spread of COVID-19 in prisons.
University of California, Los Angeles
The team behind the UCLA Law Covid-19 Behind Bars Data Project.
Council of State Governments Justice Center
Recidiviz COVID Team
The team building recidiviz.org/covid, a tool to help criminal justice agencies make decisions about pandemic response.
Brian Chalmers
Camilla Handley
Emma Humphrey
Hailey Hannigan
Kayla Caracci
Kim Carney
Leslie White-Chalmers
Magen Ashley Young
Nancy Wang
Oscar Chagolla