ACADEMIC ARTICLE SUMMARY

Ten Simple Rules for Responsible Big Data Research

Article Source: PLOS Computational Biology, Vol. 13, No. 3, e1005399, 2017
Publication Date:
Time to Read: 2 minute read
Written By:

Alondra Nelson

Alyssa Goodman

Arvind Narayanan

Barbara Koenig

Emily Keller

Jacob Metcalf

Matthew Zook

Rachelle Hollander

Seeta Peña Gangadharan

Solon Barocas

Summary:

The use of big data in academic and industry research is growing. Studies of human psychology, biology, and behavior must be conducted ethically, and researchers should start by recognizing that careless use of data can be harmful.

Policy Relevance:

Researchers’ use of big data should be sound and accurate, and should maximize good while minimizing harm.

Key Takeaways:
  • Over the past five years, the use of big data in research by academia and industry has grown; tools of big data research include:
    • Mining medical records for scientific and economic information.
    • Mapping relationships using social media.
    • Using sensors to record speech and actions.
    • Tracking individuals' movements.
  • Complex ethical issues arise in conducting big data research; all big data research on social, medical, psychological, and economic phenomena involves human subjects, and researchers ought to minimize harm.
  • Researchers should acknowledge that data points represent people and that data can harm people; even seemingly neutral datasets used to determine credit risk or shape criminal justice decisions can produce unfair outcomes.
  • Researchers should recognize that privacy is contextual, not simple; just because something has been shared publicly does not mean that use of it in research is unproblematic.
  • Researchers should guard against reidentification of anonymized data; many seemingly nonspecific factors such as battery usage, spatial location, birthdate, gender, zip code, and facial images can be used to identify individuals, especially when combined (see the first sketch following this list).
  • Researchers should debate tough ethical choices with colleagues and those in other disciplines on an ongoing basis.
  • Researchers should develop a code of conduct for their organization, research community, or industry.
  • Researchers should design datasets and systems to be audited, developing automated testing procedures and clearly documenting when decisions are made (see the second sketch following this list).
  • Researchers should know when to break the rules; in times of natural disaster or public health emergency, one might put aside questions of individual privacy for the greater good.
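
To make the reidentification risk concrete, here is a minimal Python sketch using toy data and hypothetical column names (not drawn from the article): it checks how many records in a supposedly anonymized table become unique once a few quasi-identifiers such as birthdate, gender, and zip code are combined.

    # Minimal sketch: measure how often combining "harmless" attributes
    # pins down a single record (a simple k-anonymity check on toy data).
    import pandas as pd

    records = pd.DataFrame({
        "birthdate": ["1984-03-02", "1984-03-02", "1991-07-19", "1975-11-30"],
        "gender":    ["F", "M", "F", "M"],
        "zip_code":  ["40502", "40502", "10027", "94110"],
        "diagnosis": ["asthma", "flu", "asthma", "diabetes"],
    })

    quasi_identifiers = ["birthdate", "gender", "zip_code"]

    # Size of each group sharing the same quasi-identifier combination;
    # a group of size 1 means that combination matches exactly one person.
    group_sizes = records.groupby(quasi_identifiers).size()
    unique_share = (group_sizes == 1).sum() / len(records)

    print(f"Smallest group size (k): {group_sizes.min()}")
    print(f"Share of records made unique by {quasi_identifiers}: {unique_share:.0%}")

In this toy table every record becomes unique once the three attributes are combined, which is the pattern that enables linkage attacks on real "anonymized" releases.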

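Similarly, the brief sketch below illustrates one way to design for auditability; the function and file names are hypothetical, and the article does not prescribe any particular implementation. Each automated decision is appended to a structured log with its inputs, outcome, and model version so the pipeline can be reviewed and tested later.

    # Minimal sketch: wrap a decision function so every call leaves an audit trail.
    import json
    from datetime import datetime, timezone

    AUDIT_LOG = "decisions.jsonl"  # assumed append-only log file

    def audited(decision_fn, model_version="demo-1"):
        """Record inputs, outcome, and timestamp for every decision."""
        def wrapper(features):
            outcome = decision_fn(features)
            with open(AUDIT_LOG, "a") as log:
                log.write(json.dumps({
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "model_version": model_version,
                    "inputs": features,
                    "outcome": outcome,
                }) + "\n")
            return outcome
        return wrapper

    # Toy credit-screening rule wrapped for auditability.
    @audited
    def approve_credit(features):
        return features["income"] > 3 * features["requested_amount"]

    print(approve_credit({"income": 60000, "requested_amount": 15000}))

A log of this kind can feed the automated tests and documentation the rule calls for, whatever tooling a research group actually uses.
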
About Frank Pasquale

Frank Pasquale is Professor of Law at Brooklyn Law School. He is a noted expert on the law of artificial intelligence (AI), algorithms, and machine learning. His work focuses on how information is used across a number of areas, including health law, commerce, and technology. His wide-ranging expertise spans the rapid pace of technological change, the unintended consequences of the interaction of privacy, intellectual property, and antitrust law, and the power of private-sector intermediaries to influence healthcare and education finance policy.

About Kate Crawford

Kate Crawford is a Research Professor of Communication and Science and Technology Studies at USC’s Annenberg School for Communication and Journalism and a Senior Principal Researcher at Microsoft Research in New York. Professor Crawford is a leading scholar of the social and political implications of artificial intelligence. Over her 20-year career, her work has focused on understanding large-scale data systems, machine learning, and AI in the wider contexts of history, politics, labor, and the environment.

About danah boyd

danah boyd is a Partner Researcher at Microsoft Research and the founder of Data & Society. Dr. boyd's research focuses on the intersection of technology and society, with an eye to how structural inequities shape and are shaped by technologies. She is currently conducting a multi-year ethnographic study of the U.S. census to understand how data are made legitimate. Her previous studies have focused on media manipulation, algorithmic bias, privacy practices, social media, and teen culture.