Google Books

from Wikipedia, the free encyclopedia

Google Books (German: Google Bücher) is the largest private collection of retro-digitized books. It is owned by the US company Google LLC. The collection can be viewed in excerpts. According to Google, the goal is to make the knowledge written down in books available through digitization, primarily for full-text search. In 2019, to mark the 15th anniversary of Google Books, the company announced that it held scans of over 40 million books in more than 400 languages.

Temporarily empty bookshelves at the University of Michigan : "Digitization in progress" for Google Books, 2008

Description

Google Books draws on two sources:

  • Google Print in the narrower sense, the largely uncontroversial cooperation project with publishers, and
  • Google Library, in which books from large academic libraries are scanned en masse without the prior consent of the rights holders, a practice that was and remains legally controversial.

History

In October 2004, Google Print was presented at the Frankfurt Book Fair (at a press conference with Google founders Sergey Brin and Larry Page). In December 2004, search results from scanned books began to appear in the result lists of the English-language search interface Google.com. Google set itself the goal of scanning 15 million books by 2015, which corresponds to around 4.5 billion pages. Since April 2005 there has been a separate search for the contents of the program. In October 2005, German and other language versions of the user interface were presented at the Frankfurt Book Fair. On November 4, 2005, the search page was officially presented with an advanced search (including queries by time period). On November 17, 2005, Google announced the renaming of the service on the company's own weblog. Since then, requests to print.google.com have been redirected to books.google.com.

In September 2008, Google announced that it would digitize newspapers in cooperation with North American newspaper publishers. The digitized versions were to be searchable and navigable in the web browser and to appear as in the print edition, with photographs, headlines and advertisements.

For a number of books there is now a cooperation with the Internet Archive. The Internet Archive offers these editions in various formats; for the PDF version it refers to Google (where works published after 1864 are not available to non-US users, see controversies).

In 2009 and 2012, data sets for the Ngram Viewer were created in various languages from the Google Books corpus.

Cooperation with the publishers

Google receives printed books or PDF files from publishers. Printed books are scanned and included in the index as e-texts using OCR. Users can see only comparatively few pages of each book. After a few pages, only registered users (registration is free of charge) can see a number of further pages. Some pages are blocked from the start. Once the daily quota has been exhausted, no further pages can be viewed. As a rule, the table of contents is freely accessible, and often the index as well.

Google tries to protect the content with a kind of copy protection ("digital rights management"). That this protection is not always applied in full, however, can easily be seen in various specialist books. Pages that have been displayed in the web browser can even be read from the browser cache using certain methods and merged into a PDF file with appropriate tools.

Cooperation with libraries

Google Books note in Michigan University Library, 2007.

Since around 2005, Google has been scanning the entire holdings of the University of Michigan library (over 7 million volumes) as well as large parts of the libraries of Harvard University and Stanford University, the New York Public Library and, in Europe, the Bodleian Library of the University of Oxford. The libraries of the University of Virginia, the University of Wisconsin-Madison, Princeton University, the University of California and the University of Texas at Austin also participate.

At the end of 2006, two other institutions joined the network of libraries that have books digitized by Google: the National Library of Catalonia ( Biblioteca de Catalunya ) in Barcelona and the library of the Universidad Complutense Madrid .

On March 6, 2007, the Bavarian State Library in Munich announced that it would be the first German library to cooperate with the project. Around one million copyright-free works from its historical holdings and special collections were to be digitized. The only exceptions were the manuscript and incunabula collections as well as rare and particularly valuable historical prints. In January 2014, the Regensburg State Library, which is subordinate to the Bavarian State Library, announced that it would digitize its copyright-free holdings together with Google. By the end of 2014, 70,000 books from the Regensburg library were to be online.

In July 2008, the Lyon City Library announced that it was the first French library to have its books digitized.

On June 15, 2010, the Austrian National Library (ÖNB) announced that Google would digitize its copyright-free book holdings. The cost of digitizing the roughly 400,000 books, around 30 million euros, is borne by Google. ÖNB General Director Johanna Rachinger described the project as one of the largest public-private partnerships in the Austrian cultural landscape. The 400,000 volumes from the 16th to the 19th century (with the exception of books for which there are conservation concerns) are to be captured in full text; around 120 million book pages will then be available online free of charge.

Fierce criticism from authors and publishers led Google to suspend the scanning of copyrighted books until November 2005. By then, rights holders were to indicate which books they did not want to have made accessible (opt-out solution). While Google invokes the fair use doctrine of US law and is supported by renowned lawyers, publishers' and authors' associations demand that no book be included in the program without consent (opt-in). In October 2005, authors and publishers in the United States filed lawsuits against Google.

Application in research

An article published in Science in December 2010 reported on the possibilities of using Google Books for the quantitative analysis of culture (culturomics). About 4% of all books ever printed were available to the researchers for their analyses. They converted the books into a massive database of the word sequences they contain (n-grams). The approach can be used for research in fields such as lexicography, the evolution of grammar, collective memory, technology adoption, fame, censorship or historical epidemiology. On the basis of the database, the research team estimated, for example, that the size of the English vocabulary has almost doubled within the last century. Another study compared Sigmund Freud's cultural influence with that of Charles Darwin. According to it, Freud lost influence, and Darwin overtook Freud in 2005.
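The n-gram idea can be illustrated with a minimal Python sketch. It is not the pipeline used by the researchers or by Google: the whitespace tokenization, the function names `ngrams` and `relative_frequency`, and the tiny corpus are all assumptions made for illustration. It counts word n-grams per publication year and reports what share of a year's n-grams a given phrase accounts for, which is roughly the kind of quantity such a database makes comparable over time.

```python
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """Map each year to the share of its n-grams that equal the phrase."""
    n = len(phrase.split())
    result = {}
    for year, texts in corpus_by_year.items():
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        result[year] = counts[phrase.lower()] / total if total else 0.0
    return result


# Toy data, invented purely for illustration (not real Google Books data).
toy_corpus = {
    1900: ["the origin of species", "the interpretation of dreams"],
    2000: ["the origin of species is still widely read"],
}

if __name__ == "__main__":
    for year, share in sorted(relative_frequency(toy_corpus, "origin of").items()):
        print(year, round(share, 3))
```

Normalizing by the total number of n-grams per year matters because the number of scanned books differs greatly between years; raw counts alone would mostly reflect corpus size.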

Criticism

Problems with the selection of digital copies

The historian Jean-Noël Jeanneney, former director of the French National Library, which runs a free European digitization project with Gallica, argues that Europe should set up an alternative to the Google digitization project. Above all, he criticizes Google for the hegemony of English and for the accumulation effect (he calls it the “eye-catching method”; the usual term is “ranking”, see PageRank), in which the fight for the reader's attention leads to a deliberate concentration on the entry at the top of the list. The stronger provider keeps growing stronger at the expense of the weaker, which makes Google particularly important for advertising. Jeanneney would like to counter this “capitalist” Google principle with a model in which the state has the say in matters of cultural memory. Nineteen national and university libraries in Europe have signed the French National Library's appeal to prevent an impending intellectual and cultural hegemony of the US.

The problem that Google Books, with its market dominance, obscures alternatives through its selection practice is also seen in Germany, especially in research on specialist areas such as local history or dialect research.

Problems with the search functions

Books are not assigned to systematic subject groups or keywords as they are in library catalogs, so it is not possible to select books on a specific subject. Google assumes that it is sufficient for thematic searches to index all the words in the books. A keyword entered in one language can therefore only return results in that language. This does not take into account that searches are often carried out across languages and that a word can be used in several subject areas with different meanings.
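The limitation described above can be made concrete with a small Python sketch of a pure word-based full-text index. This is an illustrative assumption, not a description of how Google Books is actually implemented; the function names `build_index` and `search` and the four sample "books" are invented for the example.

```python
from collections import defaultdict


def build_index(books):
    """Build an inverted index: each word maps to the set of titles containing it."""
    index = defaultdict(set)
    for title, text in books.items():
        for word in text.lower().split():
            index[word].add(title)
    return index


def search(index, query):
    """Return the titles that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical sample data, invented for illustration.
books = {
    "Local History": "a history of the town",
    "Ortsgeschichte": "eine geschichte der stadt",
    "Java Handbook": "java is a programming language",
    "Java Travel Guide": "java is an island in indonesia",
}

if __name__ == "__main__":
    index = build_index(books)
    print(search(index, "history"))  # only the English-language title matches
    print(search(index, "java"))     # both subject areas match
```

In this sketch the query "history" misses the German book on the same topic, and the query "java" returns books from two unrelated subject areas. A library catalog avoids both effects by attaching subject groups and keywords to each record independently of the words in the text.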

Text recognition problems

Page digitized by Google, with the fingers of the scanner operator visible.

In 2007, Der Spiegel criticized the often poor OCR quality and the poor metadata .

There are cases in which even the author's name was incorrectly recognized by the OCR, so that the work cannot be found under the author's name. The visual quality of the pages is also repeatedly criticized, in particular missing text passages and the visible fingers of the staff operating the scanner.

Copyright

Google Books has attracted particular attention because the project includes not only works that are free of copyright but also works that are protected by copyright. The "Google Book Settlement" is a settlement proposal that Google Inc. drew up in response to a class action lawsuit brought against it by US publishers and authors. The objection period for publishers and authors ("non-participation period") was extended from May 5, 2009 to September 4, 2009.

In September 2011 it became known:

“In a surprise move, authors' groups slammed their one-time university partners with a lawsuit demanding that the schools surrender digital collections and stop working with Google (NSDQ: GOOG). The lawsuit opens a new phase in the fight over digital libraries and comes the same week that Google's controversial books settlement is expected to die in court.”

In November 2013, in the copyright proceedings brought by the American Authors Guild against Google, the application for a jury trial was rejected, and the court held that Google Books was in principle covered by the “fair use” doctrine. This judgment was upheld on appeal in October 2015.

In the USA, the American Society of Journalists and Authors, for example, criticized the deal as an insider arrangement benefiting the parties involved. Members of the competing Open Content Alliance project also criticize Google's approach as disregarding copyright.

Criticism from Germany

The verdict also affects non-American publishers and authors, as Google can be reached worldwide via the Internet. Google could then make any work by German-speaking authors who have not lodged an objection in the USA (deadline: September 4, 2009) available for viewing in digitized form on its platform, without the authors being able to object. The collecting society VG Wort has drawn up its own proposed arrangement for the German book market. On the one hand, VG Wort criticizes aspects of this possible settlement and is challenging them in an American court. On the other hand, VG Wort is working with Google on the planned implementation of the agreement.

In the Heidelberg Appeal, writers, publishers and scientists in Germany call for copyright to be protected against erosion. The manifesto links two things: criticism of Google's book digitization and criticism of open access policy in general. This has divided the critics of the rapidly advancing Google digitization project. The Heidelberg Appeal sees a major problem in the settlement. In the Frankfurter Allgemeine Zeitung, the suspicion of a “coupon settlement” has been raised, in which self-appointed plaintiffs' lawyers negotiate an “agreement” with Google in order to secure a lavish fee for themselves and a market-dominating position for Google.

On September 1, 2009, the German federal government criticized the proposed agreement. It demanded that at least a separate class be created for German rights holders and that they be excluded from the blanket agreement. In addition, it argued that Google's copyright infringement and its “act first, ask later” behavior hinder projects such as the European online library Europeana, which clears authors' rights in advance.

Criticism from the EU Commission

On the occasion of an expert hearing held by the EU Commission on September 7, 2009, Google announced that it would respond to the concerns of publishers and authors and involve their representatives in the supervision of the Google Books project. Books that are protected by copyright and still commercially available in Europe were not to be scanned and made available online without express permission. At the same time, the EU Commission announced that it wanted to amend copyright law, since under the current legal situation only the USA would benefit from the advantages of digitization and online marketing.

Other projects

  • The Open Content Alliance, whose members include competitor Yahoo, the Internet Archive and the University of California, has cataloged and digitized books on a large scale as part of the Open Library, as free content and in strict observance of copyright.
  • The Internet mail order company Amazon, a competitor and possible partner of Google, offers scanned books in full text but focuses on books currently available in stores and aims to promote the sale of printed books. With its “Search Inside the Book” feature, Amazon makes the front page, blurb, table of contents, index and two pages before and after a search hit available.
  • The search engine A9.com, Amazon's own search engine, which links Microsoft's search technology with Amazon's “Search Inside the Book”, shows, on the basis of a still small English-language catalog, how searches can run across scanned books, images and websites.
  • MINERVA , a European project to coordinate the digitization of European cultural assets.
  • Wikisource , a free online project for the collection and edition of texts that are either free of copyright ( public domain ) or are under a free license .
  • Project Gutenberg, a volunteer-run digital library accessible via the Internet. Started in 1971, it is the oldest digital library in the world.
  • Gallica , the French National Library's digitization project.
  • European Library , a European initiative that aims to improve access to the digitized works of the member countries.
  • Large publishers such as Random House (Bertelsmann) are beginning to digitize their book collections and make them searchable by search engines. Random House put parts of its book inventory (5000 titles, with more to follow) online in February 2007. With Insight, the company enables its customers to search a set number of pages per title.
  • Another online library, Zeno.org, has been available since October 2007.
  • Libreka - the German answer of the Börsenverein des Deutschen Buchhandels to Google Book Search
