Flash Cookies and Privacy II: Now with HTML5 and ETag Respawning

By Chris Hoofnagle

Posted on August 1, 2011

In 2009, my team at Berkeley showed that many top websites were tracking users through Flash cookies, and that some advertising networks were "respawning" or reinstantiating HTTP cookies that the user deleted.  Over the past two years, a chorus of advocates, regulators, and businesses condemned the practice of using Flash for unique user tracking.

This chorus was heard by many.  In our follow-up survey of Flash cookie practices, we found that fewer websites were using Flash cookies.  Thirty-seven of the top 100 websites were doing so, down from 54 in 2009.

However, we found two sites respawning HTTP cookies with Flash.  One--hulu.com--deserves particular attention, because we also identified that site as respawning using a third-party service (QuantCast) in 2009.  While QuantCast contacted us and turned off respawning within days, Hulu.com has moved their Flash respawning in-house.

Hulu is also worth mentioning because it was using a different, more persistent tracking technique to respawn user IDs as well.  Using KISSmetrics, Hulu was able to respawn all persistent storage on a user's computer, including HTTP, HTML5, Cache, and ETag cookies.  Even if users disable first- and third-party cookies and enable do-not-track and private browsing mode, this tracking method is still effective.
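
To make the ETag technique concrete, here is a minimal sketch of how a server could respawn an identifier from the browser cache.  It illustrates the general idea only and is not the code KISSmetrics or Hulu ran; the function names, the "uid" cookie name, and the dictionary-based request format are all assumptions made for this example.

```python
import uuid


def parse_cookies(header):
    """Turn a Cookie header such as 'a=1; b=2' into a dict."""
    cookies = {}
    for part in header.split(";"):
        if "=" in part:
            name, value = part.strip().split("=", 1)
            cookies[name] = value
    return cookies


def respond(request_headers):
    """Hypothetical tracking endpoint: returns (status, response headers).

    The identifier is taken from the cookie if present, otherwise from the
    ETag the browser echoes back in If-None-Match, otherwise freshly minted.
    """
    cookies = parse_cookies(request_headers.get("Cookie", ""))
    cached_etag = request_headers.get("If-None-Match", "").strip('"')
    user_id = cookies.get("uid") or cached_etag or uuid.uuid4().hex
    # A matching ETag normally yields 304 Not Modified, so the browser keeps
    # its cached copy (and the identifier stored with it).
    status = 304 if cached_etag == user_id else 200
    return status, {"ETag": f'"{user_id}"', "Set-Cookie": f"uid={user_id}"}


# First visit: no cookie and no cached ETag, so a new ID is issued.
first_status, first = respond({})

# The user deletes cookies, but the browser cache still holds the ETag
# and sends it back as If-None-Match on the next request.
second_status, second = respond({"If-None-Match": first["ETag"]})
```

In this sketch the second response sets the same "uid" cookie the user deleted, which is why clearing cookies alone does not stop the tracking; the cache has to be cleared as well.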

Since the method sets first-party cookies, other sites that use the KISSmetrics service could match up their customers' ID numbers and share information.  For instance, Hulu could go to an information aggregator, such as Spokeo.com (which also uses KISSmetrics), and acquire information about its users that those users were unwilling to share themselves.

This is why all those "trust" arguments about privacy fall apart--even if you choose to not trust a given site and only provide minimal or even fake information, that site could go elsewhere and buy the information you were unwilling to share.  This is referred to in the industry as data enhancement, data appends, or "bumping up."

The good news is that Hulu.com learned of our paper Thursday, and it appears as though they have disabled the KISSmetrics tracking.  Additionally, KISSmetrics has moved quickly in response to the paper, and has posted a new privacy policy directed to consumers.  Previously, the policy only spoke to business users of the service.

We also focused upon HTML5 local storage.  HTML5 local storage is important because it is much more flexible than standard HTTP cookies.  Because it allows for a large amount of offline storage (5 MB by default), it will play a big role in the mobile web.  Like Flash, HTML5 local storage can be used to store content that the user wants, but it can also be used to store unique identifiers and mirror HTTP cookies.  We found 17 sites using HTML5 local storage, 7 of which were using it to mirror HTTP cookies.  This is a signal that HTML5 may, like Flash cookies, become a method of backing up tracking identifiers.
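
One way to spot this kind of mirroring is to look for cookie values that reappear in local storage.  The sketch below is an illustration under stated assumptions, not a description of the method used in our paper: the function name, the dictionary inputs, and the minimum-length cutoff (meant to skip short values such as "1" or "true" that match by coincidence) are all hypothetical.

```python
def find_mirrored_values(cookies, local_storage, min_length=8):
    """Return (cookie name, storage key) pairs where a cookie value
    also appears inside a local storage value.

    cookies and local_storage are plain name -> value dicts captured
    from one site visit (an assumed input format).
    """
    mirrored = []
    for cookie_name, cookie_value in cookies.items():
        if len(cookie_value) < min_length:
            continue  # too short to be a meaningful identifier
        for key, stored in local_storage.items():
            if cookie_value in stored:
                mirrored.append((cookie_name, key))
    return mirrored


# Made-up data for illustration only.
cookies = {"uid": "a1b2c3d4e5f6", "lang": "en"}
local_storage = {"backup_uid": "a1b2c3d4e5f6", "theme": "dark"}
matches = find_mirrored_values(cookies, local_storage)
```

A match does not prove respawning by itself; it only shows that the identifier is stored in two places and could be restored from one to the other.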

Balachander Krishnamurthy and Craig Wills have pointed out that there is more intense user tracking on the web, by an increasingly concentrated group of actors.  Our results lend support to their findings.  We found over 5,600 HTTP cookies on the top 100 sites, up from 3,600 in 2009.  Twenty sites placed 100 or more cookies, and two more than 200.

Over 4,900 of the HTTP cookies we detected were from third-party hosts.  We counted over 600 such third-party hosts in our crawl.  Ghostery, a wonderful plugin provided by BetterAdvertising, currently blocks over 590 third-party trackers.  This is important because the universe of tracking companies is still being enumerated, and because it helps us understand how much coverage the self-regulatory groups really have over tracking companies.
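
For readers who want to try a similar count on their own crawl data, here is a rough sketch of how cookie hosts can be split into first and third party.  It is not the tooling used in our study.  The helper names are hypothetical, and the "last two labels" rule is a deliberate simplification; a careful analysis would consult the Public Suffix List so that domains such as example.co.uk are handled correctly.

```python
from collections import Counter


def base_domain(host):
    """Naive registrable domain: the last two dot-separated labels."""
    labels = host.lstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def third_party_counts(site, cookie_hosts):
    """Count cookies per third-party base domain for one visited site.

    site is the host that was visited; cookie_hosts lists the domain of
    every cookie set during the visit (one entry per cookie).
    """
    site_base = base_domain(site)
    return Counter(
        base_domain(host)
        for host in cookie_hosts
        if base_domain(host) != site_base
    )


# Made-up cookie list for a visit to a placeholder site.
counts = third_party_counts(
    "www.example.com",
    [".example.com", "b.scorecardresearch.com", "ad.atdmt.com", "view.atdmt.com"],
)
```

Summing these counters across all visited sites gives a total of third-party cookies, and the number of distinct keys gives the number of third-party hosts at the base-domain level.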

[Figure: Three trackers detected by Ghostery on irs.gov]

We found Google-controlled cookies on 97 of the top 100 websites, including popular government domains.  Here, Ghostery displays three different trackers on irs.gov.  (We do not use Ghostery for testing, but it does a neat job of visualizing the tracking web.)  Other third-party trackers with a strong presence in the top 100 included scorecardresearch.com (61) and atdmt.com (56).  Among top 100 sites, wikia.com, legacy.com, foxnews.com, drudgereport.com, and bizrate.com hosted the most cookies from third-party domains.

This work was performed by my Research Experiences for Undergraduates (REU) program students, Mika Ayenson and Dietrich Wambach.  They were supervised by Dr. Nathan Good, Ashkan Soltani, and me.  Our work is supported exclusively by NSF-TRUST (Team for Research in Ubiquitous Secure Technology).