Internet Archive

from Wikipedia, the free encyclopedia
Internet Archive
Motto: Universal access to all knowledge
Type: Web archive / digital library
Registration: Optional
Interface language: English
Owner: Internet Archive, San Francisco, CA
Founder: Brewster Kahle
Launched: 1996

The Internet Archive in San Francisco is a non-profit project founded by Brewster Kahle in 1996. It has had official library status since 2007. It began as a pure web archive, whose archived websites can be viewed with the so-called Wayback Machine. As early as 1999 it was expanded to include other collections, so that it is now a digital library holding a significant collection of texts and books, audio files, videos, images and software. The Internet Archive has set itself the task of preserving digital data over the long term in freely accessible form, and it places great emphasis on accessibility for blind and otherwise disabled users.

Beyond its role as an archive, the Internet Archive also sees itself as an advocate of an open and free Internet and of the preservation and dissemination of public-domain works.

Origin and history

Old logo (used until 2001)
Brewster Kahle, founder of Alexa Internet and the Internet Archive (2015)
Mirror server holding the San Francisco data at the Bibliotheca Alexandrina in Egypt

Brewster Kahle founded the Internet Archive in May 1996 as a non-profit organization under Section 501(c)(3) of the US Internal Revenue Code. From the start, it received large amounts of data from Alexa Internet. As part of its web archiving, it saves so-called mementos, i.e. snapshots of websites and Usenet posts.

From 1999 onwards, its goal was broadened to building a comprehensive, freely accessible library, beginning with the addition of the Prelinger Archives and followed by other collections. Today the Internet Archive comprises over ten million books and texts, almost three million videos and films, over three million audio files, 150,000 computer programs and more than one million image files. The Wayback Machine web archive now contains more than 431 billion web pages.

The data is stored on 20,000 hard drives in four data centers. A mirror of the San Francisco data is hosted at the Bibliotheca Alexandrina in Egypt. In August 2014, the collection reached a size of 18.5 petabytes.

The archive has been officially recognized as a library by the US state of California since the beginning of May 2007.

Following the US elections of November 8, 2016, the Internet Archive announced on its website that it plans to keep a continuously updated copy of its data in Canada.


Web archive

The Wayback Machine is an online service for retrieving saved websites in their different archived versions. The pages to be archived are selected via the Alexa Internet service, and all URLs recorded there are crawled and archived at regular intervals. A page that has not yet been archived can also be captured manually by looking it up and then confirming the capture (file contents such as JPG images are saved without a confirmation prompt). The total volume was around 150 billion pages in November 2009 and grew to over 273 billion pages by October 2016.
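Archived versions can also be located programmatically. The following Python sketch assumes two publicly documented conventions of the service that are not described in this article: replay addresses of the form https://web.archive.org/web/<timestamp>/<URL> with a 14-digit timestamp (YYYYMMDDhhmmss), and an availability API at https://archive.org/wayback/available whose JSON reply describes the nearest capture under archived_snapshots.closest. The function names are hypothetical, and the code only builds request addresses and parses a reply; it performs no network access itself.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed public endpoints of the Wayback Machine.
WAYBACK_BASE = "https://web.archive.org/web/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def snapshot_url(timestamp: str, url: str) -> str:
    """Build the replay address of a capture (timestamp as YYYYMMDDhhmmss)."""
    return f"{WAYBACK_BASE}{timestamp}/{url}"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an availability-API request; an optional timestamp asks for the closest capture to that time."""
    params = {"url": url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    # urlencode percent-escapes characters such as ':' and '/' in the target URL.
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the replay URL of the closest capture, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

In practice, the address returned by availability_query would be fetched with any HTTP client (for example urllib.request.urlopen) and the response body passed to closest_snapshot.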

In 2006, the Internet Archive launched Archive-It, a further web archiving service for individual archiving projects. Institutions and individuals can use it to create digital copies of their collections and decide for themselves how the data is released. Archive-It has over 400 partners from 16 countries, mainly universities, state archives, museums and art libraries, public libraries, other public institutions and NGOs. It offers participating partners full-text search over their content as well as the option of exporting structured, metadata-enriched datasets for researchers.

Video archive

The inclusion of the Prelinger Archives in 1999 was the Internet Archive's first expansion beyond web archiving. Today the video archive contains just over three million videos and films that are freely licensed or in the public domain. An archive of television programs is also being developed.

Text archive

In the Million Book Project, the Internet Archive digitizes books that have entered the public domain, either because their copyright under US law has expired or for other reasons, and makes them available for download. The digital copies are part of the Open Library. More than ten million books and texts have now been archived.

There are several scanning centers (twelve in 2009), for example in Richmond. Scanning is done on commission at ten US cents per page (as of 2009). Clients, mostly libraries, receive the digitized material, an OCR-generated text file, a persistent web address and the option of storing the digitized material on the organization's servers. There are also cooperation agreements with libraries that do their own digitization, covering individual services such as OCR and redundant hosting.

Software archive

The Library of Congress granted six exemptions to the Digital Millennium Copyright Act in December 2006. As a result, the Internet Archive may copy computer software and games that have become abandonware in order to preserve them when the original hardware, formats or technology have become obsolete. In 2013, the Internet Archive began offering classic games that can be played in the web browser, streamed via MESS emulation, for example the Atari 2600 video game E.T. the Extra-Terrestrial. Since December 23, 2014, thousands of classic DOS computer games have been available for teaching and research purposes, running in the browser via DOSBox emulation.

Internet Archive in San Francisco (1996–2009)
The Internet Archive's headquarters since November 2009, a former Christian Science church
Internet Archive at the Bibliotheca Alexandrina. Behind the glass panes are the racks holding the archive computers.
Video of Brewster Kahle demonstrating the Internet Archive's digitization technology, March 29, 2013

Audio archive

Since 2017, the audio archive has held over three million sound recordings. These range from radio broadcasts and radio features to audiobooks, poetry readings, live concert recordings and music uploaded by users. The archive can also be used to publish podcasts.

Image archive

More than 1.25 million image files are available in the image archive. They include images of works of art, among them a collection of over 100,000 images from the Metropolitan Museum of Art, as well as images of historical maps, astronomical images from NASA, record covers and freely available photographs by private individuals.

Book archive - digital scans of cited books

To make book citations on Wikipedia easier to verify, Wikipedia and the Internet Archive have been cooperating since 2019. Work has begun on adding digital scans of the cited books to the references in Wikipedia articles, with the cited passage shown in a two-page view. One example is reference number 104 (as of November 14, 2019) in the English-language article on Martin Luther King Jr.


Funding

The Internet Archive is financed by donations and by grants from various foundations, institutes and associations in fields such as education, research and science. In April 2019, the Internet Archive listed the following donors: Andrew W. Mellon Foundation, Council on Library and Information Resources, United Nations Democracy Fund, Federal Communications Commission Universal Service Program for Schools and Libraries (E-Rate), Institute of Museum and Library Services (IMLS), Knight Foundation, Laura and John Arnold Foundation, National Endowment for the Humanities (Office of Digital Humanities), National Science Foundation, The Peter and Carmen Lucia Buck Foundation, The Philadelphia Foundation and Rita Allen Foundation.


Coordinates: 37°46′56.3″ N, 122°28′17.6″ W