This page is part of an ongoing effort by the Snopes newsroom to teach the public the ins and outs of online fact-checking and, as a result, strengthen people's media literacy skills. Misinformation is everyone's problem. The more we can all get involved, the better job we can do combating it. Have a question about how we do what we do? Let us know.
Here at Snopes, archiving web links is key to our fact-checking practice, and thanks to numerous archival resources on the internet, that practice has become easier than ever. Keeping records of the internet is essential not just to understanding the history of the web, but also to tracking whether a tweet was ever deleted or whether someone amended a statement on a web page.
But this practice is not unique to fact-checkers. Governments also archive the websites of each administration in the interests of transparency and public access. Former U.S. President Donald Trump's White House website is archived at trumpwhitehouse.archives.gov, while Barack Obama's can be found at obamawhitehouse.archives.gov. The Clinton administration established the first White House website in 1994. These sites are labeled as historical material "frozen in time." Some federal sites are also "harvested" and saved by the Federal Depository Library Program Web Archive, which aims to "provide permanent public access to Federal Agency Web content."
Estimates of the average lifespan of a web page have varied over time. In 1997, Scientific American estimated it at 44 days, while the New Yorker suggested in 2015 that it could be 100 days. But some web pages can be deleted in a matter of hours, especially if they are politically sensitive.
In 2014, when Malaysia Airlines Flight 17 was shot down over Ukraine, pro-Russian separatist leader Igor Girkin, also known as Strelkov, reportedly wrote, "We just downed a plane, an AN-26." An AN-26 is a Soviet-built military cargo plane, but the photographs in the post appeared to show a Boeing 777. The Wayback Machine saved the post, which was deleted from Strelkov's page only a couple of hours later. By the time a journalist tweeted a picture of the saved page, writing, "Grab of Donetsk militant Strelkov's claim of downing what appears to have been MH17," Strelkov's page had been edited and the claim deleted. The only proof of the post was the copy saved on archive.org. Although the post itself may have been misleading, the incident revealed the Internet Archive's role in collecting receipts that proved useful to journalistic investigations.
The Internet Archive (archive.org) is considered one of the largest archives of the internet, with around 625 billion web pages saved since its founding in 1996. Its Wayback Machine lets users browse 25 years of web history, and the organization partners with the Federal Depository Library Program and other organizations through its Archive-It service.
The Internet Archive is not the only web archive. Others include archive.today, perma.cc, the U.K. Web Archive (which focuses on sites from the United Kingdom and is a collaboration with the U.K. Legal Deposit Libraries), and Time Travel. Wikipedia also maintains a long list of international web archiving efforts.
How to Archive a Web Page
The most straightforward site to start with, however, is archive.org. There, you simply enter a link into the Wayback Machine and click "Browse History" to see whether it has already been archived. Below that, the "Save Page Now" option lets you archive the page yourself and create a new link.
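For readers who want to script these two steps, the Wayback Machine also exposes a public availability endpoint (`https://archive.org/wayback/available`) and a Save Page Now address (`https://web.archive.org/save/` followed by the page's URL). The minimal Python sketch below builds those request URLs and pulls the closest snapshot out of an availability response. It assumes the JSON shape the Internet Archive documents for that endpoint; the helper function names are hypothetical, and Save Page Now may be rate-limited or behave differently for anonymous requests.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine endpoints; the helper function names below are hypothetical.
AVAILABILITY_API = "https://archive.org/wayback/available"
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build a 'has this page been archived?' query, optionally near a YYYYMMDDhhmmss time."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def save_page_now_url(page_url: str) -> str:
    """Build the address that asks the Wayback Machine to capture a page now."""
    return SAVE_PAGE_NOW + page_url


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest archived copy's URL, or None if no capture is available."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url: str) -> Optional[str]:
    """Query the live availability endpoint (requires network access)."""
    with urlopen(availability_url(page_url), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `lookup("example.com")` should return a `web.archive.org/web/...` link when a capture exists and `None` when the page has never been archived, which is the same answer "Browse History" gives in the browser.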
If you browse the history of a web page, you will be directed to every past instance of it in the archive, organized as a calendar down to the month, day, and time it was saved. Click on a date (indicated by a blue bubble) to open that version of the page. The larger the bubble, the more times the page was archived on that day. Note that a green link indicates the page was redirected and may not work, so users should click on blue links.
The top of the results page also shows how many times a web page was archived and over what date range. The top bar shows the years in which the page was saved, while the calendar below it lets users pick the month, day, and time.
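The calendar view is built on the 14-digit timestamp (year, month, day, hour, minute, second) that appears in Wayback Machine snapshot addresses of the form `https://web.archive.org/web/<timestamp>/<original URL>`. As a rough sketch, the Python below splits such an address into its capture time and original URL, and counts captures per day, which is what the size of each calendar bubble reflects. It assumes a full 14-digit timestamp (the Wayback Machine also accepts shorter ones), and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

# Matches snapshot addresses such as https://web.archive.org/web/20140717154514/http://example.com/
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_snapshot_url(snapshot_url: str) -> Tuple[datetime, str]:
    """Split a snapshot address into (capture time, original URL)."""
    match = SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a 14-digit Wayback snapshot URL: {snapshot_url}")
    captured_at = datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
    return captured_at, match.group(2)


def captures_per_day(timestamps: Iterable[str]) -> Dict[date, int]:
    """Count captures per calendar day, mirroring the calendar's bubble sizes."""
    return dict(Counter(datetime.strptime(ts, TIMESTAMP_FORMAT).date() for ts in timestamps))
```

Reading the timestamp straight from the address is a quick way to confirm exactly when a cited capture was made, independent of what the page itself claims about its date.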
Archive.org also has a large collection of books that we have frequently relied on in our research.
On archive.today, you can also check whether a link has already been archived, or archive it yourself.
How Do We Know Archived Pages Are Not Manipulated?
People have long taken screenshots of web pages and tweets, but a simple image is easier to manipulate than an already archived web page. According to an article by computer science professor Michele C. Weigle, published by the Social Science Research Council (SSRC):
In addition, screenshots are static. There can be no interaction with the page—no scrolling, no hovering, no clicking of links or even revealing what web pages the links on the page referred to.
Web archives, on the other hand, record the entire contents of a web page, including its source HTML and embedded images, stylesheets, or JavaScript source. Upon playback, the user can interact with the archived page, including clicking links to explore what the web page was connected to. In addition, public web archives are created and stored by independent archival organizations, such as the Internet Archive. We trust that the contents of these public web archives have not been tampered with or maliciously manipulated.
However, archived pages are not perfect and can come with a range of glitches, according to the SSRC article:
Although web archives provide a valuable service, they are not perfect, and archiving a web page is very different from archiving a physical object or even a static file such as a PDF. Web pages have become increasingly more complex over the years, with many loading hundreds or even thousands of images, stylesheets, and JavaScript resources, which can include advertisements and trackers. These JavaScript resources are executed by web browsers, and many of their interactions cannot be captured by all web archives. The embedded and linked nature of HTML makes the direct replay of archived web pages difficult, so web archives must make some limited transformations to the original web page. This includes rewriting links and locations of embedded resources so that they are loaded from the archive instead of the live web. This prevents someone from viewing a web page captured in 2012, for instance, and seeing an advertisement from 2018 embedded in that 2012 web page.
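The link rewriting described in that passage can be illustrated with a deliberately simplified Python sketch. It resolves each quoted `src` or `href` attribute against the original page's address and points it at the archive, pinned to the capture's own timestamp, so the archived page asks the archive for copies from around that time instead of fetching from the live web. Real archives such as the Wayback Machine do far more, including handling resources loaded by JavaScript, which this regex-based sketch ignores; the prefix constant and function name are assumptions for illustration.

```python
import re
from urllib.parse import urljoin

# Assumed archive prefix; snapshot addresses take the form <prefix><timestamp>/<original URL>.
ARCHIVE_PREFIX = "https://web.archive.org/web/"

# Simplified: matches quoted src="..." / href='...' attributes only (not srcset, CSS, or scripts).
LINK_ATTR_RE = re.compile(r"""(\b(?:src|href)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_links(html: str, page_url: str, timestamp: str) -> str:
    """Point every http(s) link and embedded resource at the archive, pinned to one capture time."""

    def replace(match: re.Match) -> str:
        attr, quote, target = match.groups()
        absolute = urljoin(page_url, target)  # resolve relative links such as "/ad.png"
        if not absolute.startswith(("http://", "https://")):
            return match.group(0)  # leave mailto:, javascript:, etc. untouched
        return f"{attr}{quote}{ARCHIVE_PREFIX}{timestamp}/{absolute}{quote}"

    return LINK_ATTR_RE.sub(replace, html)
```

Because every rewritten address carries the same timestamp as the page, an image or ad embedded in a 2012 capture is requested from the archive's 2012-era records rather than from whatever the live site serves today.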
Despite these imperfections, we at Snopes have relied on online archives for numerous fact checks, including ones about the Twitter history of public figures like Raphael Warnock, old quotes from magazines, and much more.