Ed Felten Explains the Power of Metadata

By TAP Staff Blogger

Posted on November 8, 2013


This summer, many Americans learned for the first time that the National Security Agency (NSA) is collecting the telephone records of millions of US customers and is sifting through large quantities of Americans' online data. It is not the content of a phone conversation or email exchange that the NSA is collecting; it is the ‘metadata’ of these digital communications that is being gathered.

Metadata is information generated as you use technology. For phone records, this includes the telephone numbers placing and receiving a call, the time and date the call is placed, and the length of the call; for an email communication, metadata includes the location from which the email was sent and the size of the message.

Professor Ed Felten explained the power of metadata in his recent testimony at the Senate Judiciary Committee hearing on the “Continued Oversight of the Foreign Intelligence Surveillance Act.”

Below are excerpts from Professor Felten’s testimony.

Advances in technology have transformed the role and importance of metadata. When focused on intelligence targets, metadata collection can be a valuable tool. At the same time, unfocused collection of metadata on the American population gives government access to many of the same sensitive facts about the lives of ordinary Americans that have traditionally been protected by limits on content collection. Metadata might once have seemed much less informative than content, but this gap has narrowed dramatically and will continue to close.

Metadata Is Easy to Analyze

Telephony metadata is easy to aggregate and analyze because it is, by its nature, structured data. Telephone numbers are standardized, and are expressed in a predictable format: in the United States, a three-digit area code, followed by a three-digit central office exchange code, and then a four-digit subscriber number.
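
To make the point about structure concrete, here is a minimal Python sketch that splits a US number into the three fields described above. The regular expression, the function name split_us_number, and the sample numbers are illustrative assumptions, not part of the testimony; real-world number validation has more rules than this.

```python
import re

# Assumed pattern: optional "+1" or "1" country prefix, then a three-digit area code,
# a three-digit exchange code, and a four-digit subscriber number,
# with optional spaces, dots, hyphens, or parentheses as separators.
US_NUMBER = re.compile(
    r"^(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$"
)


def split_us_number(raw):
    """Return the three structured fields of a US number, or None if it does not fit."""
    match = US_NUMBER.match(raw.strip())
    if match is None:
        return None
    area_code, exchange, subscriber = match.groups()
    return {"area_code": area_code, "exchange": exchange, "subscriber": subscriber}


# Differently formatted inputs reduce to the same structured record.
print(split_us_number("(609) 555-0123"))
print(split_us_number("+1 609.555.0123"))
```

Because every record reduces to the same few fields, millions of them can be sorted, grouped, and joined with very little effort.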

Further, the massive increases in electronic storage permit us to maintain, cheaply and efficiently, vast amounts of data. This newfound data storage capacity has led to new ways of exploiting the digital record.

Sophisticated computing tools permit the analysis of large datasets to identify embedded patterns and relationships, including personal details, habits, and behaviors. As a result, individual pieces of data that previously carried less potential to expose private information may now, in the aggregate, reveal sensitive details about our everyday lives—details that we had no intent or expectation of sharing.

Americans Inevitably Create Metadata That Can Reveal Sensitive Details of Their Lives

As a general matter, it is practically impossible for individuals to avoid leaving a metadata trail when engaging in real-time communications, such as telephone calls or Internet voice chats.

Freely available software can be used to encrypt email messages and instant messages sent between computers. However, most of these secure communication technologies protect only the content of the conversation and do not protect the metadata. Government agents that intercept an encrypted email may not know what was said, but they will be able to learn the email address that sent the message and the address that received it as well as the size of the message and when it was sent.
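
The following Python sketch illustrates this under simple assumptions: a made-up message whose body is an opaque placeholder standing in for ciphertext, parsed with the standard library's email module. The addresses, date, and the helper name visible_metadata are hypothetical. The body stays unreadable, yet the sender, recipient, timestamp, and size are all in plain view.

```python
from email import message_from_string

# A hypothetical message: the body is a placeholder for encrypted content,
# but the headers travel in the clear.
RAW_MESSAGE = """\
From: alice@example.org
To: bob@example.net
Date: Fri, 08 Nov 2013 09:30:00 -0500
Subject: (encrypted)
Content-Type: text/plain

-----BEGIN PGP MESSAGE-----
(ciphertext placeholder, not real data)
-----END PGP MESSAGE-----
"""


def visible_metadata(raw):
    """Collect what an observer can read without decrypting the body."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "date": msg["Date"],
        "size_bytes": len(raw.encode("utf-8")),
    }


print(visible_metadata(RAW_MESSAGE))
```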

Telephony Metadata Reveals Content

In the simplest example, certain telephone numbers are used for a single purpose, such that any contact reveals basic and often sensitive information about the caller. Examples include support hotlines for victims of domestic violence and rape. Similarly, numerous hotlines exist for people considering suicide … for sufferers of various forms of addiction, such as alcohol, drugs, and gambling. Hotlines have also been established to report hate crimes, arson, illegal firearms and child abuse. In all these cases, the metadata alone conveys a great deal about the content of the call, even without any further information.

Today, wireless subscribers can use text messages to donate to churches, to support breast cancer research, and to support organizations such as Planned Parenthood. The metadata alone reveals the fact that the sender was donating money to their church, to Planned Parenthood, or to a particular political campaign.

Metadata can expose an extraordinary amount about our habits and activities. Calling patterns can reveal when we are awake and asleep; our religion, if a person regularly makes no calls on the Sabbath, or makes a large number of calls on Christmas Day; our work habits and our social attitudes; the number of friends we have; and even our civil and political affiliations.
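
As a rough illustration of how little is needed to draw such inferences, the Python sketch below looks only at call timestamps. The sample data and the function names quiet_hours and silent_weekdays are invented for this example; a real analysis would need far more data, and the absence of calls is only suggestive, never proof.

```python
from datetime import datetime


def quiet_hours(call_times):
    """Hours of the day (0-23) in which no call was ever placed."""
    active = {t.hour for t in call_times}
    return [hour for hour in range(24) if hour not in active]


def silent_weekdays(call_times):
    """Weekdays (Monday=0 ... Sunday=6) on which no call was ever placed."""
    seen = {t.weekday() for t in call_times}
    return [day for day in range(7) if day not in seen]


# Hypothetical week of calls in November 2013: three calls a day,
# every day except Saturday, November 9.
SAMPLE_CALLS = [
    datetime(2013, 11, day, hour)
    for day in (3, 4, 5, 6, 7, 8, 10)
    for hour in (8, 12, 21)
]

print(silent_weekdays(SAMPLE_CALLS))  # weekday 5 is Saturday
print(quiet_hours(SAMPLE_CALLS))
```

Run over months of records instead of one invented week, the same two functions would begin to outline a person's sleep schedule and weekly routine.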

Aggregated Telephony Metadata Reveals Our Relationships

When call metadata is aggregated and mined for information across time, it can be an even richer repository of personal and associational details.

Metadata can identify our closest relationships. Two people in an intimate relationship may regularly call each other, often late in the evening. If those calls become less frequent or end altogether, metadata will tell us that the relationship has likely ended as well—and it will tell us when a new relationship gets underway. Analysis of metadata on this scale can reveal the network of individuals with whom we communicate—commonly called a social graph.

In short, aggregated telephony metadata allows the NSA to construct social graphs and to study their evolution and communications patterns over days, weeks, months, or even years. Metadata analysis can reveal the rise and fall of intimate relationships, the diagnosis of a life-threatening disease, the telltale signs of a corporate merger or acquisition, or the social dynamics of a group of associates.
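
A social graph of this kind takes only a few lines to build. The Python sketch below is a simplified illustration, not a description of any agency's actual tooling: the call records are fictional 555 numbers, and build_social_graph and closest_contact are hypothetical helper names. Each record holds nothing but a caller and a callee.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee). No content, only who called whom.
CALLS = [
    ("609-555-0101", "609-555-0102"),
    ("609-555-0102", "609-555-0101"),
    ("609-555-0101", "609-555-0102"),
    ("609-555-0101", "609-555-0103"),
    ("609-555-0102", "609-555-0103"),
]


def build_social_graph(calls):
    """Map each number to its contacts, weighted by how often the two were in touch."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in calls:
        graph[caller][callee] += 1
        graph[callee][caller] += 1
    return graph


def closest_contact(graph, number):
    """Return the contact with the most calls to or from the given number, if any."""
    contacts = graph.get(number)
    if not contacts:
        return None
    return max(contacts, key=contacts.get)


graph = build_social_graph(CALLS)
print(closest_contact(graph, "609-555-0101"))
```

Rebuilding the graph for successive time windows and comparing the edge weights is one simple way the evolution of relationships described above could be tracked.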

Read Professor Felten’s full testimony online.

Ed Felten is Professor of Computer Science and Public Affairs and the Director of the Center for Information Technology Policy (CITP)—both at Princeton University. He served as the Chief Technology Officer with the Federal Trade Commission from January 2011 to August 2012. Professor Felten’s research interests include computer security and privacy, and public policy issues relating to information technology.