Author(s)
Source
In Research Handbook on Big Data Law, Roland Vogl editor, Edward Elgar Publishing Ltd., 2020 (forthcoming)
Summary
Machine learned-based software increasingly plays a role in key decision-making systems. The software is “trained” using large datasets; designing effective, fair systems that make accurate predictions when applied to new data is challenging.
Policy Relevance
Humans are responsible for designing effective and fair decision tools.
Main Points
- Machine learning-based tools are increasingly used in important systems that make decisions about credit, university admissions, criminal sentencing, and employment.
- Most "automated" decisions systems rely on humans to make the final decisions; humans are responsible for ensuring that the system is ethical, effective, and unbiased.
- An "automated decision tool" is a computer program that collects data about a particular case, and generates a prediction based on the data for use in a decision; some automated decision tools are based on machine learning.
- A model is “generalizable” to the extent that it performs well beyond the dataset on which it was trained; models can display “generalizability” in many different ways.
- A model derived from New Yorkers’ 2007 loan repayment data is generalizable between sample and population if it performs equally well on the sample and the entire population of New York.
- The model is generalizable over time if it performs equally well at predicting loan repayments in 2007 and 2010.
- An accurate decision tool can perpetuate bias if trained using biased data; because historical discrimination created a wealth disparity between African-Americans and whites, a model that uses wealth to make lending decisions will not help alleviate wealth disparities even if the model is accurate.
- A machine learning-based process focussed too single-mindedly on optimizing accuracy based on the training data, will perform well in training but generalize poorly to the overall population; one algorithm, designed to distinguish wolves from dogs, “learned” to rely solely on the presence of snow in a photograph.
- Some observers prefer human adjudicators because humans do not need large training datasets to generalize; humans use analogies and analyze causation to make generalizations, and craft exceptions to rules as needed in light of the system's overall goals.