The Backrooms of the Internet Archive

Like many bits of Internet Culture, this simple image of an empty series of rooms represents a deeply repressed or recently remembered memory of a common Internet Legend, or it’s just a shot of nothing.

If the answer is that it’s a shot of nothing, let’s get you up to speed.

This image floated around message boards in the 2010s, posted with commentary or used as a general-purpose, slightly off-putting photograph of a less-than-well-maintained location, and was, by most standards, rather indistinct. The internet, after all, is filled with odd images and weird drawings that cause a reaction, often after many different attempts to achieve the effect. Survivorship Bias for memes, one might say. So if one more image of an indistinct indoor landscape was out there, not much was going to come of it.

That changed in 2019, when the image was given a legend and history, made up out of the air, that it was a rare photograph of The Backrooms. The phrasing of the original declaration speaks for itself:

“If you’re not careful and you noclip out of reality in the wrong areas, you’ll end up in the Backrooms, where it’s nothing but the stink of old moist carpet, the madness of mono-yellow, the endless background noise of fluorescent lights at maximum hum-buzz, and approximately six hundred million square miles of randomly segmented empty rooms to be trapped in
God save you if you hear something wandering around nearby, because it sure as hell has heard you”

If this writing strikes you as some sort of odd, rather dramatic addition to the image of a room, then you’re being introduced to creepypasta, or as some might call them, urban legends and campfire stories. It’s part of the overwhelming need for humans to tell tales that excite and frighten, to compose meaning or horror out of the darkness, and even the mundane.

The concept of the “Backrooms” also touches on a very frequent theme of many different horror and science-fiction movies – that there are service tunnels and hard-to-access areas woven throughout life, known only to a special few. Movies such as The Matrix, The Adjustment Bureau, Us, Beyond the Walls, Dark City, Cube, and many more have explored this theme – or used it as a jumping-off point to tell another story.

The difference, here, is nobody really knew where that very first image came from. For a very long time.

This extended period of not finding the original source of the image left an unfinished tune, a half-written poem, about where it came from and what it meant. And the lack of information in the image as it showed up on these image boards seemed to ensure the mystery would never be solved.

So people filled in the blanks.

A Subreddit called /r/backrooms, an extended web video series called Backrooms, and endless CGI models and creations meant to extend the legend and the origin story added up to years of effort by thousands to draw the missing pieces of a puzzle that was never a puzzle.

A constantly shifting set of games with titles based off The Backrooms were created and presented for a willing and happy audience; it’d be unfair to choose one or even a few to highlight – there are dozens.

All of them represent efforts to bring you into a state of heightened fear or paranoia as you lurk through a series of dark hallways, overlit carpeted spaces, and a growing dread. There’s no question there was a huge audience for this, and it is sometimes thought that this entire legend brought mainstream attention to liminal spaces, a perception of the in-between geographies of less unsettling locations. It is now enjoying life as an aesthetic movement.

Supporting this explosion of creativity and storytelling was the continued fact that nobody knew where the photograph came from. This situation, of a core image having a completely shadowy and unexplained origin, is arguably the foundation of its power.

That changed, recently.

This appears to be the origin of the Backrooms Photograph.

In March 2003, there was a former furniture store called Rohner’s Home Furnishings in Oshkosh, Wisconsin whose second floor was being renovated by the (somewhat) new tenants, HobbyTown.

Renovating the space from the sale of furniture to a new remote-controlled racing car track (among other aspects) meant pulling down partitions and ripping out carpet. This inspired taking photographs of the process, one of which, DSC001561.JPG, was the legendary “Back Rooms” image.

Eighteen times in the last 20 years, crawlers affiliated with the Internet Archive moved through this page and grabbed portions of it, speculatively, to store for future research and reference. Because the whole image was grabbed, reading the metadata of the original image reveals the date it was taken (June 12, 2002), and the camera used (a Sony Cyber-Shot model). The great unknown image, the unsettling photo of a mysterious place and time, was revealed.
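For readers who want to see how such a list of captures can be looked up, here is a minimal sketch in Python. It assumes the Wayback Machine’s public CDX endpoint and its JSON output, in which the first row names the fields. The page address below is a placeholder, since the address of the hobby store page is not given here, and the function names are made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine's public CDX capture index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, year_from=None, year_to=None):
    """Build a query URL asking for every capture of page_url as JSON."""
    params = {"url": page_url, "output": "json"}
    if year_from is not None:
        params["from"] = str(year_from)
    if year_to is not None:
        params["to"] = str(year_to)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn the JSON rows (first row = field names) into a list of dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(capture):
    """Address of one preserved copy, built from its timestamp and original URL."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


def list_captures(page_url, **years):
    """Fetch and parse the capture list (needs network access)."""
    with urlopen(build_cdx_query(page_url, **years), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return parse_cdx_rows(json.loads(body) if body.strip() else [])


if __name__ == "__main__":
    # "example.com" is a placeholder, not the page discussed in this post.
    for capture in list_captures("example.com", year_from=2003, year_to=2004):
        print(capture["timestamp"], snapshot_url(capture))
```

Each row stands for one visit by a crawler, so counting the rows for a page gives the number of times it was captured.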

However the original, anonymous user stumbled onto this photograph, it appears that either it was taken directly from the Wayback Machine, or the Wayback Machine crawled the same site the user had found and has kept that webpage preserved for over 20 years.

Emerging, Blinking, Into the Light

Naturally, as news of the Backrooms being “found” travels throughout the world, responses have ranged widely.

For some, this is a proof that “with enough eyeballs, all problems are shallow”. While we might argue about the relative worth of a given effort, the fact that it is possible for word to travel about a mystery to the point of being solved means that the world is a hair less intimidating and scary. Our shared efforts and cooperation can find the answer to a seemingly impossible-to-answer question. The fact that an image with basically no information and a blurry set of components could be tracked down and revealed is a miracle.

For others, the mystery being solved removes a little bit of magic and wonder from the world. It says that there’s no kayfabe, no holding of mystery in our hands without peeking further to tear out the secret. In this perspective, something special has been lost.

But there’s another lesson as well.

The Internet Archive’s crawlers moved through the pages of a hobby store multiple times over the years, capturing HTML and photographs and time-stamping the process, with the same care given to an at-risk website, a politician on the national stage, or a legendary and obvious moment in history provided via a PDF file.

This agnostic, wide-ranging crawl likely represented both the original source of the image, and a persistent, dependable URL to reference back to it, as thousands are doing at this very moment.

This is the mission of the Wayback Machine – be the dependable, accessible connection to web history, and therefore all history. Give the Internet its Memory, which would otherwise be lost.

If you mourn the loss of legend and mystery in our quest to keep the truth transparent, available and persistent, don’t worry – the process of internalizing and analyzing the image to give the Backrooms history its full and complete story has already begun:

Here’s to the next mystery, and the next unsettling information being brought into the light and presented for the education, research and entertainment of the Internet, courtesy of the Wayback Machine.

Internet Archive and the Wayback Machine under DDoS cyber-attack

The Internet Archive, the nonprofit research library that’s home to millions of historical documents, preserved websites, and media content, is currently in its third day of warding off an intermittent DDoS (distributed denial-of-service) cyber-attack. According to library staff, the collections are safe, though service remains inconsistent. Access to the Internet Archive Wayback Machine – which preserves the history of more than 866 billion web pages – has also been impacted.

Since the attacks began on Sunday, the DDoS intrusion has been launching tens of thousands of fake information requests per second. The source of the attack is unknown.

“Thankfully the collections are safe, but we are sorry that the denial-of-service attack has knocked us offline intermittently during these last three days,” explained Brewster Kahle, founder and digital librarian of the Internet Archive. “With the support from others and the hard work of staff we are hardening our defenses to provide more reliable access to our library. What is new is this attack has been sustained, impactful, targeted, adaptive, and importantly, mean.”

Cyber-attacks are increasingly frequent against libraries and other knowledge institutions, with the British Library, the Solano County Public Library (California), the Berlin Natural History Museum, and Ontario’s London Public Library all being recent victims.

In addition to a wave of recent cyber-attacks, the Internet Archive is also being sued by the U.S. book publishing and recording industry associations, which are claiming copyright infringement and demanding combined damages of hundreds of millions of dollars and diminished services from all libraries.

“If our patrons around the globe think this latest situation is upsetting, then they should be very worried about what the publishing and recording industries have in mind,” added Kahle. “I think they are trying to destroy this library entirely and hobble all libraries everywhere. But just as we’re resisting the DDoS attack, we appreciate all the support in pushing back on this unjust litigation against our library and others.”

Book Talk: Attack from Within by Barbara McQuade

Join us for a VIRTUAL book talk with legal scholar BARBARA McQUADE on her New York Times bestseller, ATTACK FROM WITHIN, about disinformation’s impact on democracy. NYU professor and author CHARLTON McILWAIN will facilitate our discussion.


“A comprehensive guide to the dynamics of disinformation and a necessary call to the ethical commitment to truth that all democracies require.”

Timothy Snyder, author of the New York Times bestseller On Tyranny

American society is more polarized than ever before. We are strategically being pushed apart by disinformation—the deliberate spreading of lies disguised as truth—and it comes at us from all sides: opportunists on the far right, Russian misinformed social media influencers, among others. It’s endangering our democracy and causing havoc in our electoral system, schools, hospitals, workplaces, and in our Capitol. Advances in technology including rapid developments in artificial intelligence threaten to make the problems even worse by amplifying false claims and manufacturing credibility.

In Attack from Within, legal scholar and analyst Barbara McQuade shows us how to identify the ways disinformation is seeping into all facets of our society and how we can fight against it. The book includes:

  • The authoritarian playbook: a brief history of disinformation, from Mussolini and Hitler to Bolsonaro and Trump, that chronicles the ways in which authoritarians have used disinformation to seize and retain power.
  • Disinformation tactics—like demonizing the other, seducing with nostalgia, silencing critics, muzzling the media, condemning the courts, and stoking violence—and why they work.
  • An explanation of why America is particularly vulnerable to disinformation and how it exploits our First Amendment Freedoms, sparks threats and violence, and destabilizes social structures.
  • Real, accessible solutions for countering disinformation and maintaining the rule of law such as making domestic terrorism a federal crime, increasing media literacy in schools, criminalizing doxxing, and much more.

Disinformation is designed to evoke a strong emotional response to push us toward more extreme views, unable to find common ground with others. The false claims that led to the breathtaking attack on our Capitol in 2021 may have been only a dress rehearsal. Attack from Within shows us how to prevent it from happening again, thus preserving our country’s hard-won democracy.


BARBARA McQUADE is a professor at the University of Michigan Law School, where she teaches criminal law and national security law. She is also a legal analyst for NBC News and MSNBC. From 2010 to 2017, McQuade served as the U.S. Attorney for the Eastern District of Michigan. She was appointed by President Barack Obama, and was the first woman to serve in her position. McQuade also served as vice chair of the Attorney General’s Advisory Committee and co-chaired its Terrorism and National Security Subcommittee.

Before her appointment as U.S. Attorney, McQuade served as an Assistant U.S. Attorney in Detroit for 12 years, including service as Deputy Chief of the National Security Unit. In that role, she prosecuted cases involving terrorism financing, foreign agents, threats, and export violations. McQuade serves on a number of non-profit boards, and served on the Biden-Harris Transition Team in 2020-2021. She has been recognized by The Detroit News with the Michiganian of the Year Award, the Detroit Free Press with the Neal Shine Award for Exemplary Regional Leadership, Crain’s Detroit Business as a Newsmaker of the Year and one of Detroit’s Most Influential Women, and the Detroit Branch NAACP and Arab American Civil Rights League with their Tribute to Justice Award. McQuade is a graduate of the University of Michigan and its law school. She and her husband live in Ann Arbor, Michigan, and have four children.

Author of the recent book, Black Software: The Internet & Racial Justice, From the Afronet to Black Lives Matter, Dr. Charlton McIlwain is Vice Provost for Faculty Development, Pathways & Public Interest Technology at New York University, where he is also Professor of Media, Culture, and Communication at NYU Steinhardt. He works at the intersections of computing technology, race, inequality, and racial justice activism. He has served as an expert witness in landmark U.S. Federal Court cases on reverse redlining/racial targeting in mortgage lending and recently testified before the U.S. House Committee on Financial Services about the impacts of automation and artificial intelligence on the financial services sector. He is the author of the recent PolicyLink report Algorithmic Discrimination: A Framework and Approach to Auditing & Measuring the Impact of Race-Targeted Digital Advertising. He writes regularly for outlets such as The Guardian, Slate’s Future Tense, MIT Technology Review and other outlets about the intersection of race and technology. McIlwain is the founder of the Center for Critical Race & Digital Studies, and is Board President at Data & Society Research Institute. He leads NYU’s Alliance for Public Interest Technology, is NYU’s Designee to the Public Interest Technology University Network, and serves on the executive committee as co-chair of the ethics panel for the International Panel on the Information Environment.

Book Talk: Attack from Within by Barbara McQuade
Thursday, June 6 @ 10am PT / 1pm ET
Register now for the virtual event!

DWeb Fellows: Where Are They Now? (Part 1)

Guest blog by ngọc triệu from the DWeb Camp Core Organizing team. 

Since the program kicked off in 2019, the DWeb Fellowship has welcomed 62 fellows from more than 20 countries across North America, South America, Asia, Europe, Africa, and Oceania.

Recently, I had the opportunity to reconnect with some of the DWeb Fellows from previous cohorts. We caught up on how we’ve been since our last encounter, delved into our current projects, and reminisced about our shared experiences at DWeb Camp.

In this post, let’s join Stacco (Fellow 2023), Remy (Fellow 2022), and me in our conversations below!

*Please note that the conversations have been edited for length and clarity.


Q1 ngọc: Thank you for taking the time to chat with me today! Can you start by introducing yourself and sharing what you’re working on right now?

Stacco: Hi, yes! I’m Stacco from DisCO, which is the project I represented at DWeb Camp 2023. DisCO is a cooperative intersection of feminist and anticapitalist responses to a lot of things like DAOs, what we want to do in the workplace, and how we want to spend our time productively. And it’s also a critical approach to designing technology. DisCO was founded in 2018, but it came out of the experience of a cooperative that we founded on distributed principles, called Guerrilla Media Collective. And a lot of the stuff that we prototyped in Guerrilla Media Collective with economics and governance has led into DisCO.

Remy: Hey there, I’m Remy, from the 2022 cohort. I’m currently working at the Open Technology Fund as a programme specialist. I’ve worked on kind of an array of projects, but we get a series of applications that focus on circumventing Internet censorship in authoritarian contexts, whether that be research projects, community convenings, and so on.

Q2 ngọc: What’s one thing from DWeb Camp that you’ve taken with you into your current work?

Remy: When reflecting on my experience at DWeb Camp, I find two significant takeaways:

The first one being that, at that time, I was coming from a very academic space. So, most of the work that I was doing was really focused and consolidated within academia, which was a fairly small realm of people. It was mainly archivists that I was working with, so we had kind of a narrow lane and scope. 

However, upon engaging with the broader web community, I was exposed to a diverse array of individuals working on directly related projects, each with intersecting identities and roles. I remember meeting a speculative fiction author, and I was like: “Wow, this is kind of an interesting addition to this cohort of people that I don’t think I would have naturally included in a conference!”. I also got to unlearn what a conference is and looks like, you know, I’m going to show up with my little briefcase and give my presentation, because that’s what academia looks like. And then coming in, it was a much more kind of relaxed vibe and open conversation with an array of different people. So I thought that was really interesting and opened my eyes like, wow, we do need to include more people in these conferences that we’re at, because designers play just as big a role as researchers and developers. 

And then I would say, the second thing that I learned was really what it takes from the ground up to develop a mesh network. I always kind of come back and think about that — all the love and time that it takes, and the patience to care for these systems. It really got me on a whole journey about thinking of systems of care, and what those look like in technical spaces.

Stacco: Following DWeb Camp 2023, I invited brandon (Fellow 2022, 2023) and mai (DWeb Fellowship Director 2019-2023) to Spain for a meeting called “DisCO Remastered”, which mai covered in an article. From this experience, we developed two prototypes, including one called “community supported digital commons,” inspired by the principles of community-supported agriculture. We have people who are more conscious about the food they eat and where it comes from. So how about we have that type of consciousness for the digital tools that mediate our daily lives? Having community funding and accessibility for digital commons is very important to ensure fair compensation for labor and improve accessibility to technology. 

Additionally, collaborating with brandon, we aim to explore cooperative alternatives to platforms like Spotify, but going much further. What if the musicians could develop their own technology with torrents? What if they could take full control of their work and earnings? 

Also for me, I really love the diverse age ranges, genders, sexual orientations, and provenances of DWeb Camp, especially among the Fellows. The Fellows were a super varied group and it was really fascinating to engage with people whose experiences differed from mine. I’m like, “Oh, your background is totally different from mine, let me find out about it!” There was like this commonly held space, and that really inspired me. When I was writing the introduction for our newly released website, I was actually thinking of the Fellows!

Q3 ngọc: We’re gonna get a little bit retrospective here, what motivated you to apply (or reapply) for the Fellowship? Did the program meet your expectations and were you able to accomplish what you set out to do as a Fellow? 

Stacco: Yeah, absolutely. So the first time I couldn’t go. The second time, I applied again because I wanted to get a taste of what the decentralized community is like. More than the projects, I wanted to see what the humans behind them are like. There were a lot of contradictions which I also saw at Camp that were very interesting: There were projects which I had no interest in whatsoever, and there were other projects that I found really interesting. There’s also humans that I wanted to meet. I had been collaborating with brandon from Resonate Coop for four years and it was a great chance to meet him in person. It really was maybe like the best week I had last year. I was really, really happy. And I was really happy because of the human connections. 

With brandon king, I did a presentation that was quite successful. It was very great because we spoke about technology in a critical way and we mixed it with music, the audio, and the video. Then we left all the devices behind and we walked into the forest. That was really special. Some of the human connections that were fostered have carried on. That’s the quality time that you can only get, especially post-pandemic, by sharing a physical space. 

We were also really privileged. If you think about it, at least for the Fellows, for a week, we didn’t have to think about money or anything. We ate, we slept, we walked, we rested, we played guitar, and we danced. And that took money to do. That money didn’t come from Mars, and it is like a pittance compared to some of the budgets that are being handled. So it makes you think about the distribution of value: what would life be like if it was more like DWeb all over?

Remy: I remember, I found the Fellowship through a mutual colleague who worked at the Internet Archive. And at the time, I was really interested in the Internet Archive because I was working at a small human rights organization. We were using the Internet Archive all the time and I thought it was a really cool project. I was interested in finding out who these people that run it are and what does it look like?

And then the Fellowship popped up. At that time, I had been inhabiting a tiny little bubble that no one else really understood: I was a master’s student caught between an archivist school and public policy and people were kind of looking at me cross-eyed for talking about distributed archives or decentralised archives. So when I found out about the Fellowship, I was like, wow, here’s a group that I really like and admire, and they are talking about the same thing I’ve been talking about. That’s kind of what motivated me: maybe I can learn from a lot of these people who are probably much more developed in the work than I am, and I can share this small use case that I’ve been doing and working on.

When I read the blurb about DWeb Camp, I was like, it’s a group of people going to the woods and talking about tech. I thought it didn’t even seem real. I was wondering, like, is this real? I didn’t have much of an expectation other than a feeling that I was going to meet really interesting people who are really smart and working on interesting projects. And then I was pleasantly surprised by how many of the projects I had already been aware of. There were projects I’d written about in my papers as things to look at, and then I was able to meet the people behind them at DWeb Camp naturally.

I mean, you’re just chatting, and then you were like: “Your project sounds really familiar. What’s it called?” And then you were like, “Whoa, that’s crazy. I was writing about your project!” I was just shocked that I was naturally coming across those people in the space, it felt like a very surreal moment. I got to meet Mark, who’s the director of the Wayback Machine. And subsequently, I’ve seen him so many times at other conferences that we’ve been to. And it’s always like, I just get so excited and happy and like, want to give him a big hug. It takes me back to that special time that we all spent together. 

Q4 ngọc: What’s one piece of advice or recommendation you’d like to share with the future cohort?

Remy: Well, that’s a good question! I’d say, be confident in your ability and skills that you’re bringing and know that it’s a space of people that want to collaborate and work with you. It can be incredibly intimidating, walking into a space where you don’t know anyone and sometimes it feels like maybe there’s pre-existing communities of people that already know each other, but have the confidence to just walk up and start talking to them and know that it’s a very open community and everyone is really welcoming.

It just sometimes takes the courage within you to make that first step forward and just walk into a circle of people and say: “Hi, this is who I am.” I know it’s always easier said than done, but I have found that that is when the most natural conversations happen. And you know, be kind to yourself. A lot of these conferences can feel like a marathon sometimes and it can feel like you’re missing out on this or that, but the experience is always there, so if you’re feeling a little bit overwhelmed or burnt out, just step out. Some of my favourite moments from Camp were sitting with the Fellows and making buttons and just giving ourselves a second to breathe outside of everything else that was going on.

So be confident, be courageous, and be kind to yourself when you’re there experiencing it. Another thing that I found really helpful was journalling. I journalled two or three times a day to help remember how I was feeling and what I was doing. And that was a really interesting experience to look back on and read. Those would be my tips for people going to DWeb.

Stacco: I’d advise people to not go crazy and try to join every talk or session at Camp. Just be where you are and you’ll find interesting people to talk to and interesting projects to collaborate on. While you’re there, make yourself known. In addition, don’t be shy and don’t be afraid to challenge people in a friendly way. The most special thing is the Build Days, when you’re setting up camp and getting to know people. Don’t miss them, that’s my recommendation!


Thanks to Remy and Stacco for joining the conversation and sharing their experiences as DWeb Fellows.

We’re currently at the final stage of reviewing all 2024 Fellowship applications. Stay tuned to meet our new cohort in June! 

End of Term Web Archive – Preserving the Transition of a Nation

It’s that time again. The 2024 End of Term crawl has officially begun! The End of Term Web Archive #EOTArchive hosts an initiative named the End of Term crawl to archive U.S. government websites in the .gov and .mil web domains — as well as those harder-to-find government websites hosted on .org, .edu, and other top level domains (TLDs) — as one administrative term ends and a new term begins. 

End of Term crawls have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020. The results of these efforts are preserved in the End of Term Web Archive. In total, over 500 terabytes of government websites and data have been archived through the End of Term Web Archive efforts. These archives can be searched full-text via the Internet Archive’s collections search and also downloaded as bulk data for machine-assisted analysis.

The purpose of the End of Term Web Archive is to preserve a record of government websites for historical and research purposes. It is important to capture these websites because they can provide a snapshot of government messaging before and after the transition of terms. The End of Term Web Archive preserves information that may no longer be available on the live web for open access.

The End of Term Archive is a collaborative effort by the Internet Archive along with the University of North Texas (UNT), Stanford University, Library of Congress (LC), U.S. Government Publishing Office (GPO), and National Archives and Records Administration (NARA). Past partners include the University of California’s California Digital Library (CDL), George Washington University, and the Environmental Data and Governance Initiative (EDGI).

Four captures, dated 2008 Sept. 15; 2013 Mar. 21; 2017 Feb. 3; and 2021 Feb. 25

We are committed to preserving a record of U.S. government websites. But we need your help to complete the 2024 End of Term crawl. 

How can you help?! 

We have a list of top level domains from the General Services Administration (GSA) and from previous End of Term crawls. But we need volunteers to help us out. We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive.

Submit a url nomination by going to
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. Nominating URLs deep within .gov/.mil websites helps to make our web crawls as thorough and complete as possible.
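If you are gathering a batch of candidate addresses before nominating them, a small script can help you see which ones sit under .gov or .mil and how deep within a site each one points. The sketch below is only an illustration in Python: the function name, the returned fields, and the example addresses are all made up for this purpose, and it is not part of the nomination tool.

```python
from urllib.parse import urlsplit


def classify_nomination(url):
    """Summarize a candidate URL: host, top level domain, and path depth."""
    # Accept bare hostnames such as "example.gov" by assuming https.
    parts = urlsplit(url if "://" in url else "https://" + url)
    host = (parts.hostname or "").lower()
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    # Count non-empty path segments: /reports/2024/summary.html has depth 3.
    depth = len([segment for segment in parts.path.split("/") if segment])
    return {
        "host": host,
        "tld": tld,
        "is_gov_or_mil": tld in {"gov", "mil"},
        "path_depth": depth,
    }


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    for candidate in ["https://www.example.gov/reports/2024/summary.html", "example.org"]:
        print(candidate, classify_nomination(candidate))
```

Addresses with a larger path depth are the "deep" pages that a crawl starting from a home page is more likely to miss.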

Individuals and institutions nominating seed URLs are recognized on the individual contributors leaderboard and the institutions leaderboard!

Explore the End of Term Web Archive with full text search and download the data!

Eyeing the Future: Harkness Eye Institute’s Ophthalmology Journals Preserved at Internet Archive

When the decision was made to move the Harkness Eye Institute in New York City from its home of nearly 90 years, no one knew what to do with its vast collection of academic journals. Dr. Daniel Casper, Columbia University professor emeritus of ophthalmology, found himself tasked with the job.


The Columbia University Irving Medical Center’s Department of Ophthalmology had operated the Institute on Manhattan’s 165th Street in Washington Heights since 1933. Its stately brick building was possible thanks to a $5 million gift from philanthropist Edward Harkness. In 2022, NY-Presbyterian Hospital announced that the current location would be demolished to create a new cancer center, and the Eye Institute would be relocated to other locations on the Medical Center campus.

The move meant emptying the 9-floor Institute, including the John M. Wheeler Library. The collection consisted of a rare book collection; more than 160 ophthalmology journals (7,000 volumes) published in English, French, Japanese, German, and Spanish, dating back to the 1800s; ophthalmic textbooks; and a collection of ophthalmic and medical memorabilia. For many years, the library maintained a small museum with antique ophthalmic instruments and other memorabilia on the first floor of the Eye Institute. In the 1950s the space was converted to clinical use so most of the museum artifacts were placed in storage. With its recent move, the department could accommodate the rare books and memorabilia, but not the large collection of journals and some textbooks—leaving the fate of the remaining items in the air.  

E. S. Harkness Eye Institute, circa 1933.

It was the end of an era for Casper, who has worked at the Institute since 1986 and was a frequent user of the library’s resources. He said he felt somewhat responsible for saving as much of the library contents as possible. “The Wheeler Collection really was on the brink of a landfill,” said Casper. 

He spent his first year of retirement looking for a suitable home for the library contents. Recognizing the unique historic value of many of the journals, he approached the National Library of Medicine, the National Eye Institute, and the American Academy of Ophthalmology Museum, among others, all of whom replied in a similar manner—they had neither the space nor the resources to maintain the collection. 

Casper had no luck finding a place to rehouse the sizable donation, until he reached out to the Internet Archive. Soon after making contact, an Archive staffer in New York came to take measurements to ship the remaining Wheeler Collection to the Archive. A few days later, a truck arrived and 23 pallets of journals and books were loaded. The items will be safely stored in a physical archive and scanned so the public can have digital access online.   

“The preservation and electronic dissemination of this collection is truly a dream come true,” said Casper, who appreciates that the donation process was seamless, with no charge to the university, and that the journals will live on for future generations in a more accessible format.

Tracking older print articles that have never been digitized can be time-consuming for researchers, and many previous studies are overlooked because they can be difficult to identify and locate, Casper said. With digital access to journals, researchers can avoid reinventing the wheel in their research and build on past scholarly evidence more easily, he said.

“I did not realize the Internet Archive would take a collection like this,” Casper said. “People spent huge amounts of effort putting these works together. It would have been unfortunate to just throw it all away. That would imply the collection is worthless, but it has value.”

Casper hopes the digitization of the Wheeler Collection leads to an acceleration of advances in science as researchers will eventually have free, online access to this invaluable collection of knowledge.

“I’ve become an Internet Archive booster. It saved us,” he said. “The Internet Archive is an incredible resource.”

New Audiobook Anthology Highlights Public Domain Folktales from 1928

After Laura Gibbs retired from teaching mythology and folklore at the University of Oklahoma, she wanted to continue sharing her love of storytelling with digital learners everywhere. Following her own passion for making folk stories as accessible to all as possible, she began volunteering with a nonprofit that produces free audio books for the public.

Gibbs, who now lives in Austin, devotes one to two hours each day to recording and reviewing audio for LibriVox, a volunteer community of readers who record free public domain audiobooks. Her most recent project involved finding folktales, fairy tales and mythology in the Internet Archive that were recently released into the public domain to compile an anthology, “Tales from 1928,” available to read at Internet Archive or listen via LibriVox.

Tales from 1928: Listen | Read

Gibbs selected short stories from 20 books that were published in 1928, as those works are now in the public domain in the U.S. and can be shared, remixed and reused without copyright restrictions. In curating her collection, she was thoughtful about how to remix the creative works in a package that would appeal to listeners. 

“The variety of folktales and fairy tales in the world is just enormous. So many think it begins and ends with the Brothers Grimm,” said Gibbs, of the German folklorists. “My number one goal was to have worldwide coverage—stories not just from Europe, but also from Africa, the Middle East, South Asia, East Asia, and the Americas.”

Overall, Gibbs has recorded nine books of African folktales with more than 200 stories available for listening here.

Gibbs also wanted stories with accessible language—not too many old fashioned “thee” or “thou” references. Once she decided on the line up, she invited people to record each story, and was pleased with the response from new and experienced readers to volunteer for the project.

In addition to producing the anthology, Gibbs “proof listens” to book chapters by other readers before they are shared with the LibriVox community. The work involves careful attention to detail—listening for background noise (a car honking, phone ringing, etc.) or misspoken words. Gibbs flags the noise by marking the exact time, which she then reports back to the readers for re-recording.

Gibbs said she’s enjoyed the range of materials she gets to review. “It’s fun discovering weird, random stuff in the public domain,” she said. Her proof listening projects are listed here.

Bambi: A Life in the Woods: Listen

Recently, Gibbs proof listened to the English translation of the 1928 classic, “Bambi: A Life in the Woods,” by Felix Salten, translated by Whittaker Chambers. “The book is fantastic, and the reader is the best…she performed all the different voices of the animals and even the individual fawns,” she said. “If anybody wants something beautiful and inspiring to listen to, it’s now available at LibriVox and also at the Internet Archive, where LibriVox hosts all its audio files.”

Gibbs plans to continue creating audio folktale anthologies by year. She’s already started on works from 1927. She added: “For the rest of my life, we are going to have new content entering the public domain, year by year, so I’ll keep going.”

For more on Gibbs’s curation of African folk tales see: Library as Laboratory Recap: Curating the African Folktales in the Internet Archive’s Collection | Internet Archive Blogs

For more on the public domain works from 1928, see: Public Domain Day Celebrates Creative Works from 1928 | Internet Archive Blogs

Taking the Words Out of My Mouth (With AI)

Everyone has a different idea of what they’d do with a time machine. Mine’s pretty simple: Head back to 2012, find myself working on a side project to film a documentary, and grab my younger self by the lapels and shout, “A 1099 IS NOT A REIMBURSEMENT! GET AN ACCOUNTANT!” before whatever energy sending me into the past gave out.

That simple mistake on my part had truly stunning financial consequences. When the tax bills and penalties started hitting my mailbox around 2014, it became a mass of stress. The IRS is undefeated in the award for Most Intense Collection Letters, and they were coming on a regular basis, even as I started directing more and more of my paycheck towards paying the debt.

A few friends generously sent me money to help beat back the interest and costs, but the combination of this and other debts had me backed into a corner, so I decided on a simple plan: Run a Patreon campaign where I told stories and opinions in short episodes, which were then supported by the audience, with all the proceeds going into debt repayment. The resulting podcast, Jason Scott Talks His Way Out of It, helped dig me out of that hole.

But it had an interesting side effect – coming up with new topics of discussion and consideration, once a week, meant that I had to mine deeper and deeper into my own outlook and relevant stories. After the first fifty episodes, I turned further inward. After one hundred, it became equal parts emotional and technical. And as I’m heading towards three hundred episodes, I’m surprised I have anything left to say at all. But I apparently do, and having now been doing this podcast weekly for six years, its episodes span a truly panoramic spectrum of topics.

(This is what a standing recording booth looks like – putting your head in an Audio Mailbox to maintain quiet background noise while speaking. It gets very hot in there.)

The Patreon gave patrons early access to the episodes, but the episodes are all open and uploaded some months later to the general world, including this collection at Internet Archive. Download, listen, remix, whatever you’d like – you have my complete permission and blessing.

Episodes have been uploaded to the Internet Archive since 2019, but in 2022, an opportunity to try out new technology came up – the Whisper project, open sourced and instantly downloadable, could be implemented for transcription, either as part of video or just a basic audio file. And thanks to the project, I had many audio files, and began experimenting with using Whisper against them.

Speech recognition, the process of turning spoken words in a microphone or pre-recorded audio files into written words or issued commands, has been around a very long time – decades and decades. The Internet Archive is excellent for doing a dive into historical citations; a fast “text contents” search found these points of discussion in a 1979 issue of the Silicon Gulch Gazette newsletter:

What has changed is the combination of much faster computers, much more analysis of speech, and advances in cross-referencing the resulting training to make chips and, in this case, a program that is using other disciplines within computer science to pattern-match audio, to the point of adding capitalization and punctuation from the implications in the words. Turning this against my growing collection of podcasts, it wasn’t long before I’d say what has continued to be a theme: when it works, it’s shockingly good, and when it doesn’t, it’s shockingly bad.

As an experiment and exploration, it was very useful to let the program run, shoot out a block of text, and generate the resulting timing blocks for the purposes of subtitles or transcription:

…but it would have been foolhardy to 100% walk away and let it do transcription without a second human-driven scan through the results to find mistakes. I’ve been that human, and I’ve seen things.
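For anyone curious what "timing blocks" amount to in practice, here is a minimal sketch of turning timed transcript segments into SRT-style subtitle blocks. It assumes each segment is a dictionary with `start` and `end` times in seconds plus a `text` field, which is the general shape Whisper's Python package returns under `result["segments"]`; the sample data and the function names are made up for illustration, and this is not the script that was actually used.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build numbered subtitle blocks from a list of timed segments."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical segments standing in for real transcription output.
sample_segments = [
    {"start": 0.0, "end": 3.5, "text": " Welcome back to the podcast."},
    {"start": 3.5, "end": 7.25, "text": " Today: compilers and summer camp."},
]

if __name__ == "__main__":
    print(segments_to_srt(sample_segments))
```

The human review pass still matters: a block like this will faithfully timestamp a wrong word just as neatly as a right one.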

I’ve seen the resulting transcriptions do great jobs with proper name capitalization, odd and challenging punctuation, and paragraph breaks. I’ve also seen it knock itself silly on my New York accent and non-obscure phrasing, and definitely make a poor guess on my made-up word “Cowicature”. The algorithm works great, except when it doesn’t.

And here we get to a turn of phrase I’ve come to adopt, which is an alternate term for AI: “Algorithmic Intensity”. The human need to give life and will to machinery is a very long-lived one; but most who look at the code behind this mechanism would agree – it’s just code. The only difference is that the amount of computing power and data to derive the outcome dwarfs numbers considered unattainable a decade or two ago.

And the speed can’t be beat – 5 years of weekly podcasts took roughly 4 hours for Whisper to transcribe completely, and it took that long only because the job was set as a lower-priority action in the queueing system. Working out the total running time of all the uploaded episodes sitting in that collection is a little involved, but my back-of-napkin estimate is two and a half solid days of me talking about technology and emotion, with subjects ranging from programming and compilers to summer camp and family. I can assure you – I was not going to transcribe these podcasts anytime soon, and I was not going to ask someone to do it. While I’m proud of my work, I’m in no position to be able to record four podcasts in a month and create paragraphs of text from them.
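To put rough numbers on that, here is the napkin math as a few lines of Python. The inputs are the estimates given above (about two and a half days of audio, about four hours of processing); the episode count of 52 per year for five years is an assumption for illustration, not a count of the actual collection.

```python
# Estimates taken from the text above.
audio_hours = 2.5 * 24          # "two and a half solid days" of talking
transcribe_hours = 4            # rough wall-clock time for transcription

# Assumption for illustration: one episode a week for five years.
episodes = 5 * 52

# How many hours of audio were handled per hour of processing.
speedup = audio_hours / transcribe_hours

# Average episode length in minutes implied by these estimates.
avg_minutes = audio_hours * 60 / episodes

if __name__ == "__main__":
    print(f"Audio: {audio_hours:.0f} hours across {episodes} episodes")
    print(f"Speed: about {speedup:.0f}x faster than real time")
    print(f"Average episode: about {avg_minutes:.1f} minutes")
```

Even as a low-priority job, that works out to well over ten times faster than listening, before counting the typing.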

Or, for that matter, descriptive summaries.

Which brings me to the latest, continual tinkering with the tools and environment available to analyze materials with algorithms. I began asking a large language model to look at the generated transcriptions and create a summary of a given podcast episode.

Two-plus years into generalized algorithmic intensity access, it’s still very much a lumpy and oddly spell-casting endeavor. Instead of asking it to “summarize this transcription”, my request reads like a headmaster at a school or a Dungeons and Dragons game text:

You read transcripts of a podcast and carefully write out descriptions, in the form of narrative paragraphs, to accurately describe the content of the podcast. Longer and more complete descriptions are better, and encouraged. You describe the main subjects, conclusions by the participants, and provide helpful context for the subjects. The podcast you listen to is:

…followed by the transcription of the podcast, time-codes and all.
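As a sketch of how that request is put together, the snippet below simply joins the instructions quoted above with a transcript. It deliberately stops short of calling any model, since which language model or API sits on the other end isn't specified here; the function name and the sample transcript lines are hypothetical.

```python
# Instruction text quoted from the post above.
SUMMARY_INSTRUCTIONS = (
    "You read transcripts of a podcast and carefully write out descriptions, "
    "in the form of narrative paragraphs, to accurately describe the content "
    "of the podcast. Longer and more complete descriptions are better, and "
    "encouraged. You describe the main subjects, conclusions by the "
    "participants, and provide helpful context for the subjects. "
    "The podcast you listen to is:"
)


def build_summary_prompt(transcript: str) -> str:
    """Return the instructions followed by the transcript, time-codes and all."""
    return f"{SUMMARY_INSTRUCTIONS}\n\n{transcript.strip()}\n"


# Hypothetical transcript excerpt, standing in for a real episode.
sample_transcript = """
[00:00:00.000 --> 00:00:03.500] Welcome back to the podcast.
[00:00:03.500 --> 00:00:07.250] Today: compilers and summer camp.
"""

if __name__ == "__main__":
    print(build_summary_prompt(sample_transcript))
```

Keeping the instructions in one constant makes it easy to tinker with the wording and rerun the whole batch, which is most of what this kind of spell-casting turns out to be.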

A matrix of calculation, fast beyond my reckoning but no less mysterious-and-not-mysterious than CPUs and networking itself, begins analyzing the language in the transcriptions, cross-connecting ideas mentioned, occasionally volunteering more information based on matches to terms, and within a few minutes, language comes out.

…in this language, it’s easy to find places where there’s an algorithm in the mix, a machine putting out syllables and phrases based on what I said. Like a kaleidoscope or a magnifying glass, there are occasional strange distortions and what approaches funhouse-mirror reflections of what I put in.

And oddly enough, it brings up memories.

It reminds me of being in my first neighborhood and all the kids circled around a fun and weird toy, a tape recorder, where you pressed two buttons and the … cassette, it was called? Would turn slowly and you would shout whatever came to the top of your head, press STOP, and then rewind and hear your own voice. It was distorted and weird, but it was my own voice, and I’d not heard it before from outside my own head. The world shifted, a little bit.

It reminds me of being in a computer lab at MIT in my twenties, playing games on a lunch break during a temp job I had on campus, and finding out that if I went to this obscure IP number and downloaded this very large binary file and ran it, this gargantuan program would pop up and begin asking me what “address” to go to, and if I knew some of the addresses, they would turn into full-color, fully-laid out, beautiful print-like presentations along a grey background. And, to my utter disbelief, looking at a menu option called “view source” would let me see the bizarre “code” behind these lovely looking pages. It was called Hypertext Mark-Up Language, HTML, and messing with that odd and ever-changing set of written directives inside textfiles would be a significant part of my next quarter century.

There have been all these times that computers (on my desk or lab or out in some nebulous “net” that girdled the Earth) provided me with a new set of buttons and knobs to press. To play around. To see what hums and clicks, and to do my best to understand what is happening underneath. In many ways, these computers have done very well by me; and they have also brought stress, sleepless nights and confusion; even anger.

Put another way, these newest concepts and tools, these Intense Algorithms, sweeping the news and the discussion around virtual watercoolers, are the latest in a long line of topics about what computers will bring to human lives. It’s a deep and varied discussion and it deserves consideration and time, and respect of the opinions of others.

And while (hopefully) the world processes this newest use of computers, we’ll see where, if anywhere, they’ll help with the mass of information and digital materials that sites like the Internet Archive provide.

Asking an LLM to look at the full text of a nearly 200-page book about the raising of ducks, such as the legendary DUCKS; AND HOW TO MAKE THEM PAY (1924), provides a paradise of questions you can ask the book as if it was able to answer in full sentences:

Q: How, exactly, do you make ducks pay?

A: The book “Ducks: and how to make them pay” by William Cook outlines various methods for making duck farming profitable. It covers the selection of suitable duck breeds, breeding techniques, and the proper management of feeding and housing. Additionally, it provides insights into marketing strategies for selling ducks and their eggs. The goal is to guide readers on how to efficiently manage duck farms to maximize income from their operations.

Q: What is, in your opinion, the best duck?

A: Choosing the “best” duck depends on what you’re looking for! For farming purposes, the Pekin duck is often considered ideal because it grows quickly and has a high yield of meat. If egg production is your priority, the Khaki Campbell duck is highly recommended, as it is one of the most prolific egg layers. For those interested in ornamental ducks, breeds like the Mandarin or Wood Ducks are known for their striking appearances. Each breed has its own advantages depending on your goals and the specific requirements of your duck farming or hobbyist endeavors.

(…as an aside, the book has one of the best bookplates inside the front cover, one which only a human being would truly appreciate.)

And that’s how I see the near future with this newest use of computers – an implementation of these tools to make materials easier to find, to summarize and help point researchers and students, and allowing new ways to work with a staggering amount of information in the online stacks. Hiding from it won’t be a solution; but asking hard questions of it might be.

Meanwhile, six years of my spoken-word memories of the ways computers have affected my life are sitting in a collection, open to all and welcome for anyone to use to tinker with. Have a great time.

A Happy Ending For Some Manual Labor (And a Call for Support)

A great saga of rescue and preservation is coming towards its end, and there’s a chance to bask in the victory, and help push towards its conclusion.

I got word in 2015 of a collection of manuals inside a business that was getting out of the manuals business, and while a lot of well-meaning people talked a good game, they wanted to cherry-pick (people getting rid of stuff hate cherry-pickers), and I drove down to show I was serious, and after a week of work with MANY volunteers and contributors, we ended up with pallets of documentation inside boxes, tens of thousands of unique manuals, many nowhere else.

Then they were stored in a storage unit. Then they were stored in a closed coffee house. Then they were transported to Internet Archive’s Physical Archive. Then they were stored until last year, 2023.

Last year, a group called DLARC, doing digitizing and indexing projects around ham radio and radio technology, worked with me and the archive to sort four pallets of the manuals for products related to the history of radio/network technology, and off they went overseas to be scanned. And as of this month, the evaluated, professionally-scanned and available-to-the-world manuals are finished, except for a few stragglers. The loop has closed!

You can browse the collection of thousands of scanned manuals here:

The Manuals Plus Collection

And now, the pitch.

The company doing the digitizing does lots of digitizing for the Internet Archive. They are well-paid and legitimate professional contractors who are sent the items, and who do careful scanning to the best of the materials’ ability to provide access to the information, and then do quality checks, and then upload them. When they’re humming, they’re processing a pallet every couple of weeks (with lots of mitigating factors).

I’ve negotiated a situation where, if money is sent in, the remaining pallets that should be scanned can just be sent along without sorting them for DLARC funds. DLARC will fund any that happen to overlap with their mission, and the rest will just be done.

That’s if money is sent in.

How much money? The number approaches hundreds of thousands of dollars. So I’m looking for both big-ticket supporters (who can mail me at [email protected]) and individuals.

Here is a specialized donation link:

If we make less than we need to scan them all, then we’ll only scan up to where it’s paid for. I believe we can close it out, but if the interest/money isn’t there, then it isn’t there – fair enough. Browse the collection as it grows into thousands of manuals as it is and consider if you want to be part of all that. That’s definitely happened.

But what a happy ending it would be to push all these manuals through the process, and close it up. That’s why I’m popping up to talk about it, and why I hope you would consider contributing towards it, for a non-profit that deserves your support generally.

Internet Archive Stands Firm on Library Digital Rights in Final Brief of Hachette v. Internet Archive Lawsuit

Today, the Internet Archive has taken a decisive final step in our ongoing battle for libraries’ digital rights by submitting the final appellate reply brief [PDF] in Hachette v. Internet Archive, the publishers’ lawsuit against our library. This move reaffirms Internet Archive’s unwavering commitment to fulfilling our mission of providing universal access to all knowledge, even in the face of steep legal challenges.


Statement from Brewster Kahle, founder and digital librarian of the Internet Archive:
“Resolving this should be easy—just sell ebooks to libraries so we can own, preserve and lend them to one person at a time. This is a battle for the soul of libraries in the digital age.”

This process has taken nearly four years to work through the legal system, and in that time we’ve often fielded the question, “Why should I care about this lawsuit?” By restricting libraries’ ability to lend the books they own digitally, the publishers’ license-only business model and litigation strategies perpetuate inequality in access to knowledge.

Throughout this legal battle, Internet Archive has remained steadfast in our mission to defend the core values of libraries—preservation, access, and education. This fight is not just about protecting the Internet Archive’s digital lending program; it’s about standing up for the digital rights of all libraries and ensuring that future generations have equal access to the wealth of knowledge contained within them.