danah boyd Discusses Big Data as a Tool for Inclusion and Exclusion at FTC Workshop

By TAP Staff Blogger

Posted on September 16, 2014


Yesterday, the Federal Trade Commission (FTC) hosted a public workshop entitled “Big Data: A Tool for Inclusion or Exclusion?” Academics, business and industry representatives, and consumer advocates gathered to explore the use of “big data” and its impact on American consumers, including low-income and underserved consumers.

FTC Chairwoman Edith Ramirez noted that “a growing number of companies are increasingly using big data analytics techniques to categorize consumers and make predictions about their behavior. As part of the FTC’s ongoing work to shed light on the full scope of big data practices, our workshop will examine the potentially positive and negative effects of big data on low income and underserved populations.”

Several TAP scholars participated in the workshop. This post highlights the expertise danah boyd brought to the first panel, Assessing the Current Environment.


Dr. boyd points out that “Personalization is only made possible because you actually can position somebody in relation statistically to a whole variety of other actors through networks.” Creators of personalization systems look for correlations or probabilistic connections within these networks, but this draws on data about people who have no say in how the information is being used.

I think about this for example with Facebook. And keep in mind that all of these businesses have different reasons why they’re doing different things. Facebook wants to give you a service that if you have not signed up to their site before, they want [to ensure that] you don’t end up in this weird desert of no friends, no content, no nothing. Because that’s miserable.

One of the things that they have gotten much better at doing is determining before you’ve even shown up, what is the likelihood that you sit within a particular network. They can do this because of the fact that your friends have most likely added your email address to their system. So your friends made decisions to give information about you to Facebook. [Facebook] can assume that once they have that basic information, they can make decisions about people within the network, what do people like, what are they interested in. And they can start to say, ‘hey, might you be interested in this,’ and give you some channel to start engaging.
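The inference boyd describes can be sketched in a few lines. The following is a minimal, hypothetical illustration — the names, data, and structure are assumptions for the example, not Facebook’s actual system — of how contact lists uploaded by existing users let a service position someone who has never signed up:

```python
from collections import defaultdict

# Hypothetical contact uploads: each existing user shares their address book.
# The service now knows about "carol@example.com" even though Carol has
# never created an account -- her friends gave her information away.
contact_uploads = {
    "alice": ["carol@example.com", "dave@example.com"],
    "bob":   ["carol@example.com"],
}

# Invert the uploads: map each email address to the users who listed it.
known_by = defaultdict(set)
for user, contacts in contact_uploads.items():
    for email in contacts:
        known_by[email].add(user)

def likely_network(email):
    """When someone finally signs up with this email, the uploaders are
    probable friends, and their interests can seed recommendations before
    the new user has shared anything themselves."""
    return sorted(known_by.get(email, set()))

print(likely_network("carol@example.com"))  # Carol arrives to a ready-made network
```

The point of the sketch is that the new user contributed nothing: the network position was interpolated entirely from decisions made by others.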

Dr. boyd highlights the challenges with data interpolated from networks:

What kinds of data are we talking about? That individual never gave over their information; they didn’t give over their list of friends. Their friends gave them away. And the site was able to interpolate.

We’re not talking about the tradeoff between a known individual and a data analyst. We’re talking about the way an individual is positioned intentionally or unintentionally within this network based on what they have or have not given over or what’s been given over about them without their realization of it.

Ethical Questions

Dr. boyd discussed some of the ethical challenges that arise as companies start to see a trend within the data they are working with.

Example 1: Bing Researcher Predicts Hospitalization

A researcher at Bing is at a point where he can predict, with a high level of probability, based on somebody’s searches, whether or not they’re going to be hospitalized within the next 48 hours. That’s a really interesting puzzle.

Now the question is, what do you do with that information? If you were Microsoft and you were running Bing, does that mean you send up warning signs, ‘You are about to be hospitalized.’ That’s creepy, right? Does that mean you figure out a subtler way, a type of advertisement as a way of suggesting that they might think about it? Or do you not do anything because you don’t want to deal with the liability?

Example 2: JP Morgan Chase Predicts Human Trafficking

JP Morgan Chase does amazing analytic work to predict, with high levels of probability, whether or not somebody is engaged in trafficking of humans, particularly for sex. They can do this based on a whole set of financial patterns that become obvious.

Because they’re a company, they don’t know how to intervene in trafficking; why would they? So of course, they work with law enforcement. That is sometimes a good idea, and sometimes not. A lot of people who have worked on trafficking issues have identified why law enforcement is often not the best intervention point, and social services is. How then do we think about the ethics of those responses?

These are examples of ethical questions companies struggle with when they are doing data analytics. They start to see a trend or they start to realize a correlation; and they ask, ‘how do we intervene in an appropriate way?’ Dr. boyd asked, “Are they choosing to do it in a way that we deem to be ethical or appropriate? What do they do with the information that they get? And when and where did they or should they make this information public? It’s not easy to work things out.”

Data Supply Chains

The data being gathered, analyzed, and held about all of us is compiled from a variety of sources: a friend sharing your email address with their favorite social site, Best Buy capturing information about you when you walk into their store, location and data use from smartphones, and search and clicking behavior when researching vacation plans, to name a few.

Dr. boyd emphasized that we often discuss companies with recognizable brands that we can hold accountable. “But we’re also dealing with data brokers—data brokers whose names nobody recognizes who are holding on to data, who are buying data at bankruptcy situations, who are capturing things and pulling together data sources that we don’t even know about. And this is one of the reasons why the space gets very murky.”

Big data is often discussed within specific silos rather than in terms of the complexity and breadth of the issues. “Washington has been talking a lot about data supply chains,” said Dr. boyd, “which is a really interesting metaphor. How do we start thinking about holding supply chains accountable when we’re thinking about these data issues? And not just in terms of the data brokers the FTC is looking at, but in terms of all of our own behaviors around this.”

Probabilistic Systems Are Not Facts

Dr. boyd’s closing remarks included:

Finally, I want to end with a philosophical point, which I think is also about the state of being. The notion of a fact in a legal sense emerged in the 1890s; it’s a really modern concept. ... For better or worse, one of the things that’s coming up as a new equivalent of fact is rethinking probabilistic understandings. This is the big data element. This stuff is here to stay.

Part of it is understanding what probabilistic systems mean for our whole ecosystem. Understanding probabilistic systems, you realize it’s not cleanly fact. How do you hold probabilistic systems accountable? How you think about their role in things like rule of law is going to be very, very messy. I say this because a lot of what we’re dealing with in terms of the systems that we’re trying to hold accountable are probabilistic systems which are not intended or designed to be discriminatory in a traditional sense, in the narrative of a fact. But they’re done in a way which ends up unintentionally doing so.

danah boyd is President and Founder of the Data & Society Research Institute. Additionally, she is a Senior Researcher at Microsoft Research; a Visiting Researcher at Harvard Law School; and an Adjunct Associate Professor at the University of New South Wales. She is also a Research Assistant Professor in Media, Culture, and Communication at New York University.