Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice

Artificial Intelligence and Privacy and Security

Article Snapshot


Kate Crawford, Rashida Richardson and Jason Schultz


New York University Law Review Online, Vol. 94, pp. 1-15, 2019


In many jurisdictions, police data is tainted by a history of racial bias, planted evidence, and distorted reporting. Predictive policing systems trained on this data may perpetuate bias.

Policy Relevance

Data from periods troubled by bias should not be used to train predictive policing systems.

Main Points

  • Increasingly, law enforcement uses predictive policing systems to predict when and where crime will occur, and to allocate police resources.
  • Predictive policing systems in at least 13 jurisdictions may have been trained using “dirty data,” meaning data produced during periods when policing practices were flawed, racially biased, and unlawful.
    • During such periods, police may persuade victims not to file reports.
    • Evidence may be planted on innocent people to meet quotas.
    • Even in ordinary circumstances, fewer than half of violent crimes and property crimes are reported.
  • Systems trained on dirty data are more likely to be inaccurate or systematically biased, and may perpetuate unlawful or biased policing.
  • Developers of predictive policing systems have not given sufficient assurances that they screen their data to eliminate bias, although some exclude obviously biased arrest and stop data.
  • In Chicago, “dirty data,” including many unlawful police stops, was put directly into the city’s predictive system.
    • More than half of Black men under the age of thirty were given a high crime risk score; this is the same demographic that police had unlawfully targeted.
    • The predictive system failed to reduce violence.
    • Developers have no plan to address biased practices.
  • In New Orleans, tainted data was probably used in predictive policing; the system’s predictions mirror the unlawful and disproportionate representation of certain populations in the data.
  • In Maricopa County, despite extensive evidence of dirty policing, public information concerning the use of data or the operation of predictive policing systems is lacking, and the risk cannot be assessed.
  • To avoid the harms of using biased data, jurisdictions should restrict the use of data generated during periods of unlawful and biased police practices.
  • Because of the lack of transparency and oversight mechanisms, it would be difficult for any predictive policing system relying on police data to operate in a fair and just manner.
