Generalizability: Machine Learning and Humans-in-the-Loop

Innovation and Economic Growth; Artificial Intelligence

Article Snapshot


John Nay and Katherine Strandburg


In Research Handbook on Big Data Law, Roland Vogl, editor, Edward Elgar Publishing Ltd., 2020 (forthcoming)


Machine learning-based software increasingly plays a role in key decision-making systems. The software is “trained” using large datasets; designing effective, fair systems that make accurate predictions when applied to new data is challenging.

Policy Relevance

Humans are responsible for designing effective and fair decision tools.

Main Points

  • Machine learning-based tools are increasingly used in important systems that make decisions about credit, university admissions, criminal sentencing, and employment.
  • Most "automated" decision systems rely on humans to make the final decisions; humans are responsible for ensuring that the system is ethical, effective, and unbiased.
  • An "automated decision tool" is a computer program that collects data about a particular case and generates a prediction based on that data for use in a decision; some automated decision tools are based on machine learning.
  • A model is “generalizable” to the extent that it performs well beyond the dataset on which it was trained; models can display “generalizability” in many different ways.
    • A model derived from New Yorkers’ 2007 loan repayment data is generalizable between sample and population if it performs equally well on the sample and the entire population of New York.
    • The model is generalizable over time if it performs equally well at predicting loan repayments in 2007 and 2010.
  • An accurate decision tool can perpetuate bias if trained using biased data; because historical discrimination created a wealth disparity between African-Americans and whites, a model that uses wealth to make lending decisions will not help alleviate wealth disparities even if the model is accurate.
  • A machine learning-based process focused too single-mindedly on optimizing accuracy on the training data will perform well in training but generalize poorly to the overall population; one algorithm, designed to distinguish wolves from dogs, “learned” to rely solely on the presence of snow in a photograph.
  • Some observers prefer human adjudicators because humans do not need large training datasets to generalize; humans use analogies and analyze causation to make generalizations, and craft exceptions to rules as needed in light of the system's overall goals.
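
The wolves-and-dogs point above can be illustrated with a small synthetic sketch. It is not the authors' method or the original image classifier: the data generator, the two binary features (`snow` and a noisy "genuine" cue), the probabilities, and every function name below are assumptions made up for illustration. The "learner" simply picks whichever single feature best matches the labels in training, which is enough to show how a shortcut that works in the training data can fail on new data where snow no longer tracks the label.

```python
import random


def make_photos(n, snow_agreement, rng):
    """Build n synthetic photos as ((snow, genuine_cue), label) pairs.

    label: 1 = wolf, 0 = dog (hypothetical encoding).
    genuine_cue: a noisy but real signal, equal to the label 80% of the time.
    snow: equals the label with probability `snow_agreement`.
    """
    photos = []
    for _ in range(n):
        label = rng.randint(0, 1)
        genuine_cue = label if rng.random() < 0.8 else 1 - label
        snow = label if rng.random() < snow_agreement else 1 - label
        photos.append(((snow, genuine_cue), label))
    return photos


def accuracy(feature_index, photos):
    """Share of photos where the chosen feature equals the label."""
    hits = sum(1 for features, label in photos if features[feature_index] == label)
    return hits / len(photos)


def fit_single_feature_rule(photos):
    """Pick the one feature that maximizes accuracy on the given photos."""
    return max(range(2), key=lambda index: accuracy(index, photos))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Training photos: wolves almost always appear in snow (assumed 98%).
train = make_photos(2000, snow_agreement=0.98, rng=rng)
# New photos: snow is unrelated to the animal (assumed 50%).
deploy = make_photos(2000, snow_agreement=0.5, rng=rng)

chosen = fit_single_feature_rule(train)
train_acc = accuracy(chosen, train)
deploy_acc = accuracy(chosen, deploy)

feature_names = ("snow", "genuine_cue")
print(f"rule relies on: {feature_names[chosen]}")
print(f"training accuracy:   {train_acc:.2f}")
print(f"deployment accuracy: {deploy_acc:.2f}")
```

Under these assumptions the rule chosen in training relies on snow, scores very well on the training photos, and does little better than a coin flip on the new photos. The less accurate but genuine cue would have generalized better, which is the trade-off the summary describes.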
