
Internet Archive will ignore robots.txt files to keep historical record accurate

The Internet Archive has announced that going forward, it will no longer conform to directives given by robots.txt files. These files are predominantly used to advise search engines on which portions of a site should be crawled and indexed to help facilitate search queries.
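For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler might consult robots.txt directives before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all invented for illustration and do not come from the Internet Archive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A path no rule mentions is allowed; a path under /private/ is not.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/articles/1"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/notes"))
```

Compliance is voluntary: the file only advises, and it is up to each crawler to run a check like this and respect the answer.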

In the past, the Internet Archive has complied with instructions laid out by robots.txt files, according to a report from Boing Boing. However, the organization has decided that the way these files are written is often at odds with the service it sets out to provide.

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”

Robots.txt files are increasingly being used to remove entire domains from search engines following their transition from a live, accessible site to a parked domain. If a site goes out of business, and is rendered inaccessible in this way, it also becomes unavailable for viewing via the Internet Archive’s Wayback Machine. The organization apparently receives queries about these sites on a daily basis.
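To make the parked-domain scenario concrete, the sketch below shows how a blanket robots.txt rule shuts out every URL on a domain for any crawler that honors it. The rule set, the `ExampleArchiveBot` user agent, the URLs, and the `blocked_urls` helper are hypothetical. The article does not describe how the Wayback Machine itself applies such rules, so this only illustrates the general effect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a parked domain might serve:
# it asks every crawler to stay away from every path.
PARKED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the subset of urls that the rules forbid user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical pages from the site's earlier, live incarnation.
old_pages = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/products/widget",
]

# Every page is reported as blocked under the blanket rule.
print(blocked_urls(PARKED_ROBOTS_TXT, "ExampleArchiveBot", old_pages))
```

A service that treats the current robots.txt as applying to the whole domain would therefore hide the site's older pages too, even though the rule was written with the parked page in mind.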

The Internet Archive hopes that disregarding robots.txt files will help contribute to an accurate representation of prior points in the web’s history, removing their capacity to muddy the waters with instructions intended for search engines.

The organization has already ceased referring to robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to domains between one administration and the next. This decision has caused no major problems, so there are high hopes that discontinuing the use of the files more broadly will be helpful.

Brad Jones
Former Digital Trends Contributor
Brad is an English-born writer currently splitting his time between Edinburgh and Pennsylvania. You can find him on Twitter…