COVID-19 Cases and Deaths in Criminal Justice Facilities
To power our model, we need accurate, up-to-date data on the real-world spread of COVID-19 in prisons and jails. A constellation of groups in the criminal justice space has been aggregating this data since the start of the outbreak. In collaboration with them, we’ve tried to round out this dataset with manual data collection and historical checks in areas where we found gaps. Below you can find the full dataset, along with a list of the organizations that have done the heavy lifting.
Kaplan, Jacob, Hoyos-Torres, Sebastian, Gur, Oren, Concannon, Connor, Littman, Aaron, Jones, Nick. (2020). Covid-19 in Prisons in the United States. Covid Prison Data. Retrieved from: https://covidprisondata.com/data.html. See also: Dolovich, Sharon. (2020). UCLA Law Covid-19 Behind Bars Data Project. Retrieved from: https://bit.ly/2xyFfX6. Saunders, Jessica. (2020). Covid Custody Project Working File. Raw unpublished data. Recidiviz. (2020). COVID-19: Prison / Jail Cases. Retrieved from: https://bit.ly/3dgo77t.
This dataset brings together multiple datasets on the COVID-19 pandemic in prisons and jails, including work from:
The team behind covidprisondata.com, a public website helping to track the spread of Covid in prisons.
University of California, Los Angeles
Council of State Governments Justice Center
The team building recidiviz.org/covid, a tool to help criminal justice agencies make decisions about pandemic response.
Magen Ashley Young
- This dataset is aggregated from multiple data sources and updated nightly.
- You can find the component data at the sources listed in the citation above.
- For aggregation, we merge all available data into a single entry per day per facility.
- We resolve merge conflicts by taking the highest value available across all sources for each column in each row. Resolving conflicts per column and row, rather than per row, has a downside: the columns in a given row may not add up on a given day (e.g., positive + negative + pending test counts may not sum to the total tested count, because the values may come from different data sources). We chose this approach because it loses the least data: if two data sources collected different data points, the merged row includes every field collected across all of them.
- We use the highest value in the event of a conflict because most conflicts arise from data being collected at different times of day, and counts increase over the course of the day.
- Each row includes a ‘collections’ field, which lists the data sources that contributed to that row’s data. It also has a ‘source’ field, which gives the original source of the data (typically a government website).
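
The merge rule described in these notes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that produces the dataset: the function name `merge_records`, the count column names in `COUNT_COLUMNS`, the `collection` key on each input record, and the sample values are all hypothetical.

```python
# Hypothetical count columns; the real dataset's schema may differ.
COUNT_COLUMNS = ("pop_tested", "pop_tested_positive", "pop_deaths")


def merge_records(records):
    """Merge per-source records into one row per (date, facility).

    For each count column, keep the highest value reported by any source.
    Each merged row also lists the sources that contributed to it.
    """
    merged = {}
    for rec in records:
        key = (rec["date"], rec["facility"])
        row = merged.setdefault(
            key,
            {"date": rec["date"], "facility": rec["facility"], "collections": []},
        )
        # Record which source contributed, without duplicates.
        if rec["collection"] not in row["collections"]:
            row["collections"].append(rec["collection"])
        # Resolve conflicts per column, not per row: take the maximum.
        for col in COUNT_COLUMNS:
            value = rec.get(col)
            if value is None:
                continue  # this source did not collect this field
            current = row.get(col)
            row[col] = value if current is None else max(current, value)
    return [merged[key] for key in sorted(merged)]


# Made-up sample data: two sources report on the same facility and day.
sample = [
    {"date": "2020-05-01", "facility": "Facility A", "collection": "source_1",
     "pop_tested": 100, "pop_tested_positive": 12},
    {"date": "2020-05-01", "facility": "Facility A", "collection": "source_2",
     "pop_tested": 90, "pop_tested_positive": 15, "pop_deaths": 1},
]
rows = merge_records(sample)
print(rows)
```

In this sample the merged row takes the tested count from the first source (100) and the positive and death counts from the second (15 and 1). No field is lost, but the columns in the merged row no longer come from a single snapshot, which is why they may not add up.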