Jonathan Zittrain Explains Why the Internet Is Rotting

By TAP Staff Blogger

Posted on July 12, 2021



Society can’t understand itself if it can’t be honest with itself, and it can’t be honest with itself if it can only live in the present moment. It’s long overdue to affirm and enact the policies and technologies that will let us see where we’ve been, including and especially where we’ve erred, so we might have a coherent sense of where we are and where we want to go.
- Jonathan Zittrain from his article, “The Internet Is Rotting”

 

Jonathan Zittrain, professor of law and of computer science at Harvard, discusses the ways that online content can change or disappear without warning, problems commonly referred to as link rot and content drift. In “The Internet Is Rotting” (The Atlantic, June 2021), Professor Zittrain emphasizes: “Too much has been lost already. The glue that holds humanity’s knowledge together is coming undone.”

 
  • Link rot refers to hyperlinks that are broken. At the time a hyperlink was embedded in an article or other web content, the link was valid and let readers delve deeper into a piece of information with a single click. But at some point, the website hosting the sourced content removed it, set up a redirect, or went offline entirely, eliminating the reader’s means of accessing the referenced information.
     
  • Content drift refers to changes made to the content at a specific URL, in the form of retractions, additions, or replacements. These changes are often unannounced and unnoted.
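The two failure modes above can be told apart mechanically: a rotted link no longer resolves at all, while a drifted link still resolves but to different content than was cited. A minimal sketch, using an injected stand-in fetcher instead of real HTTP (all URLs and names here are hypothetical):

```python
import hashlib

def classify_link(url, saved_hash, fetch):
    """Compare the page now at `url` against the hash recorded at citation time."""
    body = fetch(url)
    if body is None:
        return "link rot"        # page (or whole site) no longer resolves
    current = hashlib.sha256(body.encode()).hexdigest()
    if current != saved_hash:
        return "content drift"   # URL still works, but the content changed
    return "intact"

# Demo with an in-memory "web" instead of real requests:
snapshot = hashlib.sha256("original text".encode()).hexdigest()
pages = {"https://example.com/a": "original text",
         "https://example.com/b": "quietly edited text"}
fake_fetch = pages.get  # returns None for unknown URLs, like a 404

print(classify_link("https://example.com/a", snapshot, fake_fetch))  # intact
print(classify_link("https://example.com/b", snapshot, fake_fetch))  # content drift
print(classify_link("https://example.com/c", snapshot, fake_fetch))  # link rot
```

In practice a real checker would also have to treat redirects, soft-404 pages, and boilerplate changes (ads, timestamps) more carefully than a raw content hash does.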
     

In “The Internet Is Rotting,” Professor Zittrain explains how the very nature of the World Wide Web led to a design that omitted any form of centralized management or control. He provides startling examples of both link rot and content drift, and shares some of the tools being developed to combat this knowledge decay.

 

Below are a few excerpts from “The Internet Is Rotting”:

 

The Internet’s Beginnings

 

Rather than a single centralized network modeled after the legacy telephone system, operated by a government or a few massive utilities, the internet was designed to allow any device anywhere to interoperate with any other device, allowing any provider able to bring whatever networking capacity it had to the growing party. And because the network’s creators did not mean to monetize, much less monopolize, any of it, the key was for desirable content to be provided naturally by the network’s users, some of whom would act as content producers or hosts, setting up watering holes for others to frequent.

 

Unlike the briefly ascendant proprietary networks such as CompuServe, AOL, and Prodigy, content and network would be separated. Indeed, the internet had and has no main menu, no CEO, no public stock offering, no formal organization at all. There are only engineers who meet every so often to refine its suggested communications protocols that hardware and software makers, and network builders, are then free to take up as they please.

 

This absence of central control, or even easy central monitoring, has long been celebrated as an instrument of grassroots democracy and freedom. It’s not trivial to censor a network as organic and decentralized as the internet. But more recently, these features have been understood to facilitate vectors for individual harassment and societal destabilization, with no easy gating points through which to remove or label malicious work not under the umbrellas of the major social-media platforms, or to quickly identify their sources. While both assessments have power to them, they each gloss over a key feature of the distributed web and internet: Their designs naturally create gaps of responsibility for maintaining valuable content that others rely on. Links work seamlessly until they don’t. And as tangible counterparts to online work fade, these gaps represent actual holes in humanity’s knowledge.

 

Link Rot and Content Drift

 

As Google puts it, “The web is like an ever-growing library with billions of books and no central filing system.”

 

Now, I just quoted from Google’s corporate website, and I used a hyperlink so you can see my source. Sourcing is the glue that holds humanity’s knowledge together. It’s what allows you to learn more about what’s only briefly mentioned in an article like this one, and for others to double-check the facts as I represent them to be. The link I used points to https://www.google.com/search/howsearchworks/crawling-indexing/. Suppose Google were to change what’s on that page, or reorganize its website anytime between when I’m writing this article and when you’re reading it, eliminating it entirely. Changing what’s there would be an example of content drift; eliminating it entirely is known as link rot.

 

Link Rot Example:
In 2010, Justice Samuel Alito wrote a concurring opinion in a case before the Supreme Court, and his opinion linked to a website as part of the explanation of his reasoning. Shortly after the opinion was released, anyone following the link wouldn’t see whatever it was Alito had in mind when writing the opinion. Instead, they would find this message: “Aren’t you glad you didn’t cite to this webpage … If you had, like Justice Alito did, the original content would have long since disappeared and someone else might have come along and purchased the domain in order to make a comment about the transience of linked information in the internet age.”

 

The first study [investigating link rot], with Kendra Albert and Larry Lessig, focused on documents meant to endure indefinitely: links within scholarly papers, as found in the Harvard Law Review, and judicial opinions of the Supreme Court. We found that 50 percent of the links embedded in Court opinions since 1996, when the first hyperlink was used, no longer worked. And 75 percent of the links in the Harvard Law Review no longer worked.

 

Content Drift Example:
This month, the best-selling author Elin Hilderbrand published a new novel. The novel, widely praised by critics, included a snippet of dialogue in which one character makes a wry joke to another about spending the summer in an attic on Nantucket, “like Anne Frank.” Some readers took to social media to criticize this moment between characters as anti-Semitic. The author sought to explain the character’s use of the analogy before offering an apology and saying that she had asked her publisher to remove the passage from digital versions of the book immediately.

 

Preserving Humanity’s Knowledge

 

The project of preserving and building on our intellectual track, including all its meanderings and false starts, is thus falling victim to the catastrophic success of the digital revolution that should have bolstered it. Tools that could have made humanity’s knowledge production available to all instead have, for completely understandable reasons, militated toward an ever-changing “now,” where there’s no easy way to cite many sources for posterity, and those that are citable are all too mutable.

 

The Wayback Machine:
What are we going to do about the crisis we’re in? No one is more keenly aware of the problem of the internet’s ephemerality than Brewster Kahle, a technologist who founded the Internet Archive in 1996 as a nonprofit effort to preserve humanity’s knowledge, especially and including the web. Brewster had developed a precursor to the web called WAIS, and then a web-traffic-measurement platform called Alexa, eventually bought by Amazon. That sale put Brewster in a position personally to help fund the Internet Archive’s initial operations, including the Wayback Machine, specifically designed to collect, save, and make available webpages even after they’ve gone away. It did this by picking multiple entry points to start “scraping” pages—saving their contents rather than merely displaying them in a browser for a moment—and then following as many successive links as possible on those pages, and those pages’ linked pages.

 

Amberlink:
A complementary approach to “save everything” through independent scraping is for whoever is creating a link to make sure that a copy is saved at the time the link is made. Researchers at the Berkman Klein Center for Internet & Society, which I co-founded, designed such a system with an open-source package called Amberlink. The internet and the web invite any form of additional building on them, since no one formally approves new additions. Amberlink can run on some web servers to make it so that what’s at the end of a link can be captured when a webpage on an Amberlink-empowered server first includes that link. Then, when someone clicks on a link on an Amber-tuned site, there’s an opportunity to see what the site had captured at that link, should the original destination no longer be available. (Search engines such as Google have this feature, too—you can often ask to see the search engine’s “cached” copy of a webpage linked from a search-results page, rather than just following the link to try to see the site yourself.)

 

Perma:
Taking inspiration from Brewster’s work, and indeed partnering with the Internet Archive, I worked with researchers at Harvard’s Library Innovation Lab to start Perma. Perma is an alliance of more than 150 libraries. Authors of enduring documents—including scholarly papers, newspaper articles, and judicial opinions—can ask Perma to convert the links included within them into permanent ones archived at http://perma.cc; participating libraries treat snapshots of what’s found at those links as accessions to their collections, and undertake to preserve them indefinitely.

 

Robustify:
In turn, the researchers Martin Klein, Shawn Jones, Herbert Van de Sompel, and Michael Nelson have honed a service called Robustify to allow archives of links from whatever source, including Perma, to be incorporated into new “dual-purpose” links so that they can point to a page that works in the moment, while also offering an archived alternative if the original page fails. That could allow for a rolling directory of snapshots of links from a variety of archives—a networked history that is both prudently distributed, internet-style, while shepherded by the long-standing institutions that have existed for this vital public-interest purpose: libraries.

 

Possible Legal Solutions

 

A technical infrastructure through which authors and publishers can preserve the links they draw on is a necessary start. But the problem of digital malleability extends beyond the technical. The law should hesitate before allowing the scope of remedies for claimed infringements of rights—whether economic ones such as copyright or more personal, dignitary ones such as defamation—to expand naturally as the ease of changing what’s already been published increases.

 

Compensation for harm, or the addition of corrective material, should be favored over quiet retroactive alteration. And publishers should establish clear and principled policies against undertaking such changes under public pressure that falls short of a legal finding of infringement. (And, in plenty of cases, publishers should stand up against legal pressure, too.)

 

For those times when censorship is deemed the right course, meticulous records should be kept of what has been changed. …

 

In those cases, there should be a means of record-keeping that, while unavailable to the public in just a few clicks, should be available to researchers wanting to understand the dynamics of online censorship. John Bowers, Elaine Sedenberg, and I have described how that might work, suggesting that libraries can again serve as semi-closed archives of both public and private censorial actions online. We can build what the Germans used to call a giftschrank, a “poison cabinet” containing dangerous works that nonetheless should be preserved and accessible in certain circumstances. (Art imitates life: There is a “restricted section” in Harry Potter’s universe, and an aptly named “poison room” in the television adaptation of The Magicians.)

 

It is really tempting to cover for mistakes by pretending they never happened. Our technology now makes that alarmingly simple, and we should build in a little less efficiency, a little more inertia that previously provided for itself in ample quantities because of the nature of printed texts. Even the Supreme Court hasn’t been above a few retroactive tweaks to inaccuracies in its edicts. As the law professor Jeffrey Fisher said after our colleague Richard Lazarus discovered changes, “In Supreme Court opinions, every word matters … When they’re changing the wording of opinions, they’re basically rewriting the law.”

 

Read the full article, “The Internet Is Rotting” by Jonathan Zittrain (The Atlantic, June 30, 2021).

 

Jonathan Zittrain is the George Bemis Professor of International Law at Harvard Law School. He is also a professor at the Harvard Kennedy School of Government, a professor of computer science at the Harvard School of Engineering and Applied Sciences, director of the Harvard Law School Library, and co-founder and director of Harvard’s Berkman Klein Center for Internet & Society.

 

Professor Zittrain’s research interests include the ethics and governance of artificial intelligence; battles for control of digital property; the regulation of cryptography; new privacy frameworks for loyalty to users of online services; the roles of intermediaries within Internet architecture; and the useful and unobtrusive deployment of technology in education.

