An Analysis of Google Log Retention Policies

Article Snapshot

Author(s)

Helen Nissenbaum and Vincent Toubiana

Source

Journal of Privacy and Confidentiality, Vol. 3, No. 1, Article 2, 2011

Summary

Web search data stored by Google is not sufficiently anonymized and could be used by third parties.

Policy Relevance

Policymakers should consider mandating strong anonymization practices that fully de-identify all stored search information, and should push for external audits of search engines’ data-logging practices.

Main Points

  • Like other search engines, Google has come under increasing pressure to explain how and why it keeps records of its users’ web searches. Google claims that it “anonymizes” and “obfuscates” individuals’ search logs after a set period of time (between nine and eighteen months).
     
  • However, an assessment of Google’s public statements, online videos and technical papers indicates that the steps Google takes to de-identify search logs do not provide strong privacy protection, and leave open the possibility that this search log data could later be pooled and re-identified.
     
  • In order to prevent the identification of particular web search logs with individuals, Google deletes certain portions of the data it has on hand, including the last digits of the browsing computer’s IP address, and a portion of the number identifying the tracking “cookie” within a user’s browser that sends information to Google. However, a number of other pieces of information are not deleted: these include the date and time a search was made, the browser version, and much of the identifying information within the tracking “cookie.”
     
  • Through a variety of tests, the authors determine that these pieces of data are sufficiently unique that, when combined, they could be used to build a profile of an individual search user. Because these “sanitized” search logs are kept indefinitely by Google and can be shared with third parties under the terms of Google’s privacy policies, there is a risk that personally identifying information might be used to track or profile users without their knowledge.
     
  • Google and other search engines should adopt more comprehensive anonymization techniques for their search log data, and be more transparent about their data retention and use procedures.
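
The kind of sanitization described in the main points can be illustrated with a minimal Python sketch. It is not Google's actual procedure: the record layout (LogEntry), the choice to zero the last IPv4 octet, the number of cookie characters dropped (cookie_chars_dropped), and the sample log are all hypothetical, chosen only to show how the fields left in place can still single out a user.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical search log record; field names are illustrative only."""
    ip: str          # IPv4 address of the searching machine
    cookie_id: str   # identifier carried by the tracking cookie
    browser: str     # browser version string
    timestamp: str   # date and time of the search
    query: str       # the search terms


def sanitize(entry: LogEntry, cookie_chars_dropped: int = 4) -> LogEntry:
    """Zero the last IP octet and drop the tail of the cookie ID.

    Both amounts are assumptions made for this sketch; everything else
    in the record (browser, timestamp, query) is kept unchanged.
    """
    octets = entry.ip.split(".")
    truncated_ip = ".".join(octets[:3] + ["0"])
    truncated_cookie = entry.cookie_id[:-cookie_chars_dropped]
    return entry._replace(ip=truncated_ip, cookie_id=truncated_cookie)


def fingerprint(entry: LogEntry) -> tuple:
    """Combine the fields that survive sanitization into one key."""
    return (entry.ip, entry.cookie_id, entry.browser)


def users_per_fingerprint(entries) -> dict:
    """Count how many distinct original users share each sanitized fingerprint."""
    groups = {}
    for e in entries:
        key = fingerprint(sanitize(e))
        groups.setdefault(key, set()).add((e.ip, e.cookie_id))
    return {key: len(users) for key, users in groups.items()}


# Invented sample data: two users on the same /24 network.
SAMPLE_LOG = [
    LogEntry("192.0.2.17", "a1b2c3d4e5f60718", "Firefox/3.6",
             "2010-05-01 09:14:02", "flu symptoms"),
    LogEntry("192.0.2.17", "a1b2c3d4e5f60718", "Firefox/3.6",
             "2010-05-01 09:20:45", "pharmacy near me"),
    LogEntry("192.0.2.99", "0f9e8d7c6b5a4321", "Chrome/5.0",
             "2010-05-01 09:15:30", "cheap flights"),
]

if __name__ == "__main__":
    for key, count in users_per_fingerprint(SAMPLE_LOG).items():
        print(key, "->", count, "user(s)")
```

In this invented sample, both users end up with the same truncated IP address, yet the remaining cookie prefix and browser version still place each of them in a group of one, and the first user's two queries stay linked to each other. A fingerprint shared by many users would offer more protection; a group of size one offers essentially none.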
