Wayback Machine

{{Short description|Digital archive by the Internet Archive}}
{{For|the time machine from Peabody's Improbable History and its namesake|Wayback Machine (Peabody's Improbable History)}}
{{self-reference|For help citing the Wayback Machine in the English Wikipedia, see [[:Help:Using the Wayback Machine]]}}
{{Use mdy dates|date=March 2022}}
| logo_alt = Stylized text saying: "INTERNET ARCHIVE WAYBACK MACHINE". The text is in black, except for "WAYBACK", which is in red.
| screenshot =
| url = {{Plainlist|
* {{URL|https://web.archive.org}}
* {{Onion URL|web.archive6zg5vrdwm4ljllgxleekeoj43lqayscd4d4kmhnyblq4h3ead}}<ref>{{Cite web|url=https://github.com/yt-dlp/yt-dlp/issues/9707|title=wayback machine onion support · Issue #9707 · yt-dlp/yt-dlp|website=GitHub}}</ref>
}}
| type = Archive
| commercial = No
| owner = [[Internet Archive]]
| founded = {{ubl|{{start date and age|1996|05|10}} (private)|{{start date and age|2001|10|24}} (public)}}
| current_status = Active<!-- Do not change status unless the outage is multiple days long. isitdownrightnow.com is useful to track short-duration outages. We are an encyclopedia, not a website health check with up-to-the-minute changes. -->
}}
 
The '''Wayback Machine''' is a [[Web archiving|digital archive]] of the [[World Wide Web]] founded by the [[Internet Archive]], an [[501(c)(3) organization|American nonprofit organization]] based in [[San Francisco, California]]. Created in 1996 and launched to the public in 2001, it allows the user to go "back in time" to see how websites looked in the past. Its founders, [[Brewster Kahle]] and [[Bruce Gilliat]], developed the Wayback Machine to provide "universal access to all knowledge" by preserving archived copies of defunct web pages.<ref>{{Cite web |last=Kahle |first=Brewster |date=2005-11-23 |title=Universal Access to all Knowledge |url=https://archive.org/details/SDForumBK |archive-date=2022-08-14 |archive-url=https://web.archive.org/web/20220814164546/https://archive.org/details/SDForumBK |access-date=2022-06-05 |website=[[Internet Archive]]}}</ref>
 
Founded on May 10, 1996, the Wayback Machine had saved more than 38.2 billion web pages by the end of 2009. As of January 3, 2024, it has archived more than 860 billion web pages and well over 99 petabytes of data.<ref name="auto1">{{Cite web |date= |title=Internet Archive: Wayback Machine |url=https://web.archive.org/ |archive-url=https://web.archive.org/web/20230313021854/https://archive.org/web/ |archive-date=2023-03-13 |access-date= |website=web.archive.org}} The current number of archived pages can be seen at the archive's [https://web.archive.org/ home page].</ref><ref name="auto">{{cite web |last1=Kahle |first1=Brewster |title=A Message from Internet Archive Founder, Brewster Kahle |url=https://archive.org/donate |website=Internet Archive |access-date=January 10, 2024}}</ref>
 
==History==
As of December 2020, the Wayback Machine contained over 70 petabytes of data.<ref>{{cite web |url=https://blog.adafruit.com/2020/12/01/donate-to-the-internet-archive-digital-library-of-free-borrowable-books-movies-music-wayback-machine-internetarchive/ |title=Donate to the Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine |publisher=adafruit |access-date=December 2, 2020 |archive-date=December 2, 2020 |archive-url=https://web.archive.org/web/20201202065323/https://blog.adafruit.com/2020/12/01/donate-to-the-internet-archive-digital-library-of-free-borrowable-books-movies-music-wayback-machine-internetarchive/ |url-status=live }}</ref>
 
As of January 2024, the Internet Archive reports having stored well over 99 petabytes of data.<ref name="auto1"/><ref name="auto"/>
 
{{Bar chart
 
===Website exclusion policy===
Historically, the Wayback Machine respected the [[robots exclusion standard]] (robots.txt) in determining whether a website would be crawled and, if it had already been crawled, whether its archives would be publicly viewable. Website owners could opt out of the Wayback Machine through robots.txt. The Wayback Machine applied robots.txt rules retroactively: if a site blocked the Internet Archive, any previously archived pages from the domain immediately became unavailable as well. The Internet Archive also stated that "Sometimes, a website owner will contact us directly and ask us to stop crawling or archiving a site. We comply with these requests."<ref>{{cite web|url=https://web.archive.org/collections/web/faqs.html#exclusions |title=FAQs - Some sites are not available because of Robots.txt or other exclusions. What does that mean? |website=Internet Archive Wayback Machine |archive-url=https://web.archive.org/web/20110415130934/https://web.archive.org/collections/web/faqs.html#exclusions |archive-date=April 15, 2011}}</ref> In addition, the website says: "The Internet Archive is not interested in preserving or offering access to Web sites or other internet documents of persons who do not want their materials in the collection."<ref>{{cite web|url=https://www.archive.org/about/faqs.php#2 |title= Frequently Asked Questions |website=Internet Archive |archive-url=https://web.archive.org/web/20140417122600/https://archive.org/about/faqs.php |archive-date=April 17, 2014|url-status=dead}}</ref><ref>{{cite news |url=https://motherboard.vice.com/en_us/article/nekzzq/wayback-machine-deleting-evidence-flexispy |website=Vice |title=The Wayback Machine Is Deleting Evidence of Malware Sold to Stalkers |last=Cox |first=Joseph |date=May 22, 2018 |access-date=May 23, 2018 |archive-url=https://archive.today/20180522192132/https://motherboard.vice.com/en_us/article/nekzzq/wayback-machine-deleting-evidence-flexispy |archive-date=May 22, 2018 |url-status=live}}{{cbignore}}</ref>
 
On April 17, 2017, reports surfaced that defunct sites which had become [[parked domain]]s were using robots.txt to exclude themselves from search engines, and were therefore being inadvertently excluded from the Wayback Machine as well.<ref>{{cite web |title=Robots.txt meant for search engines don't work well for web archives |url=https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |website=Internet Archive |date=April 17, 2017 |access-date=June 29, 2019}}</ref> The Internet Archive then changed its policy to require an explicit exclusion request for a site to be removed from the Wayback Machine.<ref name="using" />
When the Wayback Machine archives a page, it usually preserves most of the page's hyperlinks, keeping them usable when they could just as easily have been broken by the Internet's instability. Researchers in India studied how effectively the Wayback Machine preserves hyperlinks cited in online scholarly publications and found that it had saved slightly more than half of them.<ref>{{cite journal |last1=Sampath Kumar |first1=B.T. |last2=Prithviraj |first2=K.R. |date=October 21, 2014 |title=Bringing life to dead: Role of Wayback Machine in retrieving vanished URLs |journal=Journal of Information Science |volume=41 |issue=1 |pages=71–81 |doi=10.1177/0165551514552752 |s2cid=28320982 |issn=0165-5515}}</ref>

"Journalists use the Wayback Machine to view dead websites, dated news reports, and changes to website contents. Its content has been used to hold politicians accountable and expose battlefield lies."<ref name="usn1">{{cite web |url=https://www.usnews.com/news/articles/2016-08-17/wayback-machine-wont-censor-archive-for-taste-director-says-after-olympics-article-scrubbed |first1=Steven |last1=Nelson |date=Aug 17, 2016 |website=U.S. News & World Report |title=Wayback Machine Won't Censor Archive for Taste, Director Says After Olympics Article Scrubbed |archive-url=https://web.archive.org/web/20170106151933/http://www.usnews.com/news/articles/2016-08-17/wayback-machine-wont-censor-archive-for-taste-director-says-after-olympics-article-scrubbed |archive-date=January 6, 2017 |url-status=live |access-date=May 14, 2017}}</ref> In 2014, an archived social media page of [[Igor Girkin]], a separatist rebel leader in Ukraine, showed him boasting that his troops had shot down a suspected Ukrainian military airplane. After it became known that the plane was actually a civilian Malaysia Airlines jet ([[Malaysia Airlines Flight 17]]), he deleted the post and blamed Ukraine's military for downing the plane.<ref name="usn1"/><ref name="NewYorker-2015-01-26">{{cite magazine |title=What the Web Said Yesterday |url=https://www.newyorker.com/magazine/2015/01/26/cobweb |url-access=limited |magazine=[[The New Yorker]] |access-date=May 14, 2017 |url-status=live |archive-url=https://web.archive.org/web/20150125141230/http://www.newyorker.com/magazine/2015/01/26/cobweb |archive-date=January 25, 2015
|first=Jill |last=Lepore | author-link=Jill Lepore | date=January 26, 2015
}}</ref> In 2017, the [[March for Science]] originated from a [[Reddit]] discussion indicating that someone had visited Archive.org and discovered that all references to [[climate change]] had been deleted from the White House website. In response, a user commented, "There needs to be a Scientists' March on Washington".<ref>{{cite news |title=The March for Science began with this person's 'throwaway line' on Reddit |url=https://www.washingtonpost.com/news/speaking-of-science/wp/2017/04/21/the-march-for-science-began-with-this-persons-throwaway-line-on-reddit/ |date=April 21, 2017 |first1= Ben |last1=Guarino |newspaper=Washington Post |access-date=April 23, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170423081417/https://www.washingtonpost.com/news/speaking-of-science/wp/2017/04/21/the-march-for-science-began-with-this-persons-throwaway-line-on-reddit/ |archive-date=April 23, 2017}}</ref><ref name=":1">{{cite news |url=https://www.washingtonpost.com/news/speaking-of-science/wp/2017/01/24/are-scientists-going-to-march-on-washington/ |url-access=subscription |date=Jan 25, 2017 |first1=Sarah |last1=Kaplan |title=Are scientists going to march on Washington? |newspaper=The Washington Post |access-date=January 31, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170131152535/https://www.washingtonpost.com/news/speaking-of-science/wp/2017/01/24/are-scientists-going-to-march-on-washington/ |archive-date=January 31, 2017}}</ref><ref>{{cite news |last1=Foley |first1=Katherine Ellen |title=The global March for Science started with a single Reddit thread |url=https://qz.com/965485/the-global-march-for-science-started-with-a-single-reddit-thread/ |date=April 22, 2017 |work=Quartz |access-date=April 23, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170424004314/https://qz.com/965485/the-global-march-for-science-started-with-a-single-reddit-thread/ |archive-date=April 24, 2017}}</ref>
 
The site is also used heavily by [[Wikipedia community|Wikipedia editors]] for verification, for access to cited references, and for content creation.<ref name=9mil>{{Cite web|url=http://blog.archive.org/2018/10/01/more-than-9-million-broken-links-on-wikipedia-are-now-rescued/|title=More than 9 million broken links on Wikipedia are now rescued|first=Mark|last=Graham|date=October 1, 2018 |website=Internet Archive Blogs |url-status=live |archive-url= https://archive.today/20230408194542/http://blog.archive.org/2018/10/01/more-than-9-million-broken-links-on-wikipedia-are-now-rescued/ |archive-date=April 8, 2023 }}</ref> In addition, the Internet Archive has been archiving new URLs as they are added to Wikipedia.{{r|9mil}}
 
In September 2020, the Internet Archive announced a partnership with [[Cloudflare]] to automatically archive websites served through Cloudflare's "Always Online" service, which also allows Cloudflare to direct users to the archived copy of a site if it cannot reach the original host.<ref>{{Cite web |last=Graham |first=Mark |date=September 17, 2020 |title=Cloudflare and the Wayback Machine, joining forces for a more reliable Web |url= http://blog.archive.org/2020/09/17/internet-archive-partners-with-cloudflare-to-help-make-the-web-more-useful-and-reliable/ |access-date=September 17, 2020 |website= Internet Archive Blogs}}</ref>
Alexander Rose, executive director of the [[Long Now Foundation]], suspects that over the span of multiple generations "next to nothing" will survive in a useful way, stating, "If we have continuity in our technological civilization, I suspect a lot of the bare data will remain findable and searchable. But I suspect almost nothing of the format in which it was delivered will be recognizable" because sites "with deep back-ends of content-management systems like Drupal and Ruby and Django" are harder to archive.<ref>{{cite journal |last1=LaFrance |first1=Adrienne |title=The Internet's Dark Ages |url=https://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/ |journal=The Atlantic |date=October 14, 2015 |access-date=May 14, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170507173716/https://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/ |archive-date=May 7, 2017}}</ref>
 
In a 2016 article reflecting on the preservation of human knowledge, ''[[The Atlantic]]'' commented that the Internet Archive, which describes itself as built for the long term,<ref>{{cite web|title=The Entire Internet Will Be Archived In Canada to Protect It From Trump |url=https://motherboard.vice.com/en_us/article/the-entire-internet-will-be-archived-in-canada-to-protect-it-from-trump |publisher=Motherboard |access-date=May 14, 2017 |url-status=live|archive-url=https://web.archive.org/web/20170516221604/https://motherboard.vice.com/en_us/article/the-entire-internet-will-be-archived-in-canada-to-protect-it-from-trump |archive-date=May 16, 2017 |date=November 29, 2016 }}</ref> "is working furiously to capture data before it disappears without any long-term infrastructure to speak of."<ref>{{cite web|last1=LaFrance |first1=Adrienne |title=The Human Fear of Total Knowledge |url=https://www.theatlantic.com/technology/archive/2016/06/knowledge-compendia/485507/ |website=The Atlantic |access-date=May 14, 2017 |url-status=live |archive-url=https://web.archive.org/web/20161202040113/http://www.theatlantic.com/technology/archive/2016/06/knowledge-compendia/485507/ |archive-date=December 2, 2016 |date=June 3, 2016 }}</ref>
 
==See also==
* [[Time capsule]]
* [[Z-Library]]
* [[archive.today]]
{{div col end}}