Inside Story

What is a library?

Targeted by hackers and sued by publishers, the Internet Archive continues to push boundaries

Kieran Hegarty, 6 November 2024

On a mission: an Internet Archive staff member at the organisation’s twentieth anniversary celebration. Carlos Avila Gonzalez/San Francisco Chronicle via Getty Images


Richard Brautigan’s 1971 novel The Abortion tells the story of a librarian in charge of a library with a voracious and unusual collection development policy: it accepts, at any time of the day or night, self-authored books in any form. Such volumes include It’s the Queen of Darkness, Pal, a book of science fiction by a worker for the city’s sewers, and Love Always Beautiful, a book rejected by publishers 459 times. “This book has set the world’s record for rejections,” its author tells the librarian. Because he has to be on hand at all hours to receive these “unwanted… and haunted volumes of American writing,” the librarian can never leave.

While parts of the novel haven’t dated very well, The Abortion helpfully questions and reimagines the idea of what a library is, and what it could or should be.

Before I wrote a PhD on how library collections are being reimagined in an era of digital and social media, I worked in libraries for around a decade. I arrived at my PhD topic partly because I was fascinated by the anxiety I sensed among many library administrators and some of my colleagues about the library’s role in an age of online information.

When I started my first library job in the late 2000s, Google had recently become the most visited website in the world. Managers, abuzz with the liberatory promise of “Web 2.0” and “tech” in general, declared “we need to be more like Google!” I shuddered. “No,” I thought, “libraries need to be less like Google.” The future of the library, as a conceptual space and physical place, was wide open, it seemed.

What would it mean to operate a library of, and for, the internet? Many libraries have struggled with the transition to a digital information environment. They were established and gained legal protection during the nineteenth and twentieth centuries in a media landscape centred on a particular media artefact: the book. The word library comes from the Latin librarium meaning “book-case, chest for books.” When some tech boosters framed the popular uptake of digital technologies and the internet as marking the end of the book, they predicted the library’s days were numbered too.

Over the past thirty years, libraries have significantly amended their programming, collections and spaces to show their relevance in a “digital age.” In doing so, they have had to apply and adapt existing collecting frameworks to a context that operates on different principles and brings in a more diverse array of actors with different interests and intentions.

The British Library, for example, provides access only in its reading rooms to web pages it has preserved. What was accessible yesterday to anyone with an internet connection becomes today, in its archived form, accessible only within the walls of the library. Until our archaic legal deposit laws were changed in 2016, the National Library of Australia had to contact creators of websites to ask permission to collect their page, an enormously labour-intensive process. If the creator wasn’t forthcoming, the library couldn’t proceed. Who knows what we lost in the meantime?

If the internet does have a central library, it may well be the Internet Archive. Launched in 1996, the Archive comes from the same social and cultural milieu that produced Google and other tech behemoths, the San Francisco Bay Area (which was also where Brautigan established himself as a writer in the 1960s). Yet it operates on radically different principles. Instead of being “more like Google,” the organisation considers — and is creating — what a library of, and for, the internet would be.

The Archive’s innovations are considerable. It is constantly “crawling” the web, collecting content as it goes, and making it publicly available at a scale few libraries could fathom. Its Wayback Machine, which contains over 800 billion web pages, is the definitive record of the web’s past, with content dating back to the early web.

Like Brautigan’s library, the Archive’s collection development policy is voracious, collecting and providing access to all manner of born-digital and digitised content in line with its utopic mission to “provide Universal Access to All Knowledge.” In pursuit of its founder Brewster Kahle’s goal of building the Library of Alexandria anew, it has gathered and provides controlled access to millions of digitised books, audio recordings, videos and software programs.

The Internet Archive was “born digital.” It was also born Californian, in all its confidence and (sometimes) naivety about the liberatory power of technology. Through its operations, it has pushed the boundaries of what a library could or should be. As it faces lawsuits and cyber-attacks, the question is: has it pushed these boundaries to breaking point?

The past five years in particular have not been easy for the Archive. Reports have emerged since 2018 that it is being used by Islamic State, far-right groups and Covid-19 conspiracists to host extremist propaganda long after it has been taken down from other platforms. Unfettered openness to information has consequences.

Its critics are now disputing the idea that it is a library at all. In 2020 a consortium of major publishers sued the Archive over its National Emergency Library, which saw the Archive suspend its online lending restrictions on 1.4 million digitised books during the Covid-19 pandemic. The publishers claim the Archive “badly misleads the public and boldly misappropriates the goodwill that libraries enjoy and have legitimately earned.” The Archive, in their view, is a “pirate site” branding “itself as a library.” In September this year, the US Court of Appeals upheld a ruling against the Archive, forcing it to reconsider the scope of its digital lending. Then, last month, the Archive was hit with a massive cyber-attack that took its services offline for days. Stabilising the site was a mammoth task.

Yet, in pushing the idea of the library, the Archive has become a key piece of infrastructure that underpins much of the contemporary information environment. Libraries and other collecting organisations around the world rely on its Archive-It service to curate their own web archives. Wikipedia has an “InternetArchiveBot” that finds broken citations in Wikipedia articles and replaces them with links to the Wayback Machine. The Archive builds software and expertise that underpin and sustain web archives globally. It runs an interlibrary loan system, lending out-of-print and rare works to library users around the world. It is cultural infrastructure for a digital age.

The thing with infrastructure is that you often don’t notice it until things go wrong. As the sociologist of science Susan Leigh Star wrote, infrastructure “becomes visible upon breakdown.” Things sitting in the background (the mundane, the ordinary) become taken for granted, and our reliance on them becomes starkly apparent only when they are no longer there. We had a glimpse of this when the Archive was taken offline last month, and the lawsuits from major publishers continue to threaten its very future.

The Internet Archive takes risks that few libraries would dare contemplate. Sometimes those risks have negative consequences. But it takes those risks on behalf of all libraries and their users. It needs our support. •