Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice

Article Source: New York University Law Review Online, Vol. 94, pp. 1-15, 2019
Written By:

 Jason Schultz

 Rashida Richardson



In many jurisdictions, police data is tainted by a history of racial bias, planted evidence, and distorted reporting. Predictive policing systems trained on this data may perpetuate bias.


Policy Relevance:

Data from periods troubled by bias should not be used to train predictive policing systems.


Key Takeaways:
  • Increasingly, law enforcement uses predictive policing systems to predict when and where crime will occur, and to allocate police resources.
  • Predictive policing systems in at least 13 jurisdictions may have been trained using “dirty data,” data built during periods when policing practices were flawed, racially biased, and unlawful.
    • During such periods, police may persuade victims not to file reports.
    • Evidence may be planted on innocent people to meet quotas.
    • Ordinarily, fewer than half of violent crimes and property crimes are reported.
  • Systems trained on dirty data are more likely to be inaccurate or systematically biased, and may perpetuate unlawful or biased policing.
  • Currently, the developers of predictive policing systems have failed to give sufficient assurances that they have screened their data to eliminate bias, although some eliminate obviously biased arrest and stop data.
  • In Chicago, “dirty data,” including many unlawful police stops, was put directly into the city’s predictive system.
    • More than half of Black men under the age of thirty were given a high crime risk score; this is the same demographic that police had unlawfully targeted.
    • The predictive system failed to reduce violence.
    • Developers have no plan to address biased practices.
  • In New Orleans, tainted data was probably used in predictive policing; the system’s predictions mirror the unlawful and disproportionate representation of certain populations in the data.
  • In Maricopa County, despite extensive evidence of dirty policing, public information concerning the use of data or the operation of predictive policing systems is lacking, and the risk cannot be assessed.
  • To avoid the harms of using biased data, jurisdictions should restrict the use of data generated during periods of unlawful and biased police practices.
  • Because of the lack of transparency and oversight mechanisms, it would be difficult for any predictive policing system relying on police data to operate in a fair and just manner.



Kate Crawford

About Kate Crawford

Kate Crawford is a Research Professor of Communication and Science and Technology Studies at USC’s Annenberg School for Communication and Journalism and a Senior Principal Researcher at Microsoft Research in New York. Professor Crawford is a leading scholar of the social and political implications of artificial intelligence. Over her 20-year career, her work has focused on understanding large-scale data systems, machine learning and AI in the wider contexts of history, politics, labor, and the environment.