Wikidata

from Wikipedia, the free encyclopedia
Motto: The free knowledge database that anyone can edit
Description: Wiki project to centralize data and facts
Languages: multilingual
Operator: Wikimedia Foundation
Registration: optional
Online since: October 29, 2012
Website: www.wikidata.org

Wikidata is a freely editable knowledge database that aims, among other things, to support Wikipedia. The project was started by Wikimedia Germany and serves as a common source of certain types of data for Wikimedia projects, for example dates of birth or other generally applicable facts that can be used in all articles of the Wikimedia projects.

Since its launch in 2012, Wikidata has seen comparatively strong growth in content pages; as of August 2020 it held over 88.5 million data objects.

The data in Wikidata are released under the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. This means that the data can be used freely without naming an author. All other namespaces are under the Creative Commons Attribution/Share-Alike license (CC BY-SA 3.0).

Structure

Structure of a Wikidata statement

The main namespace of Wikidata is a collection of data objects, also called items, each identified by the prefix "Q" followed by a number. Each object in turn consists of any number of claims, each of which assigns a value to a property (identified by the prefix "P" followed by a number). Claims can be further specified with qualifiers, which are likewise properties with the prefix "P". Claims supported by sources are called statements. With the integration of the Wikipedia sister project Wiktionary, which is developing a multilingual dictionary, it has become possible to create lexemes directly as data objects in Wikidata.
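The structure described above can be sketched as a small data model. This is only an illustration under stated assumptions, not Wikidata's actual storage format: the class names Claim, Statement and Item and the source URL are hypothetical, and the identifiers Q7186 (Marie Curie), P19 (place of birth) and Q270 (Warsaw) should be checked on wikidata.org before being relied on.

```python
import re
from dataclasses import dataclass, field

# Identifier shapes as described in the text: a prefix letter plus a number.
ITEM_ID = re.compile(r"Q[1-9]\d*")
PROPERTY_ID = re.compile(r"P[1-9]\d*")


@dataclass
class Claim:
    """A property paired with a value, optionally refined by qualifiers."""
    property_id: str
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # The main property and every qualifier key must be "P" IDs.
        for pid in (self.property_id, *self.qualifiers):
            if not PROPERTY_ID.fullmatch(pid):
                raise ValueError(f"not a property ID: {pid!r}")


@dataclass
class Statement:
    """A claim together with the sources that support it."""
    claim: Claim
    sources: list[str]


@dataclass
class Item:
    """A data object ("Q" ID) holding any number of statements."""
    item_id: str
    label: str
    statements: list[Statement] = field(default_factory=list)

    def __post_init__(self):
        if not ITEM_ID.fullmatch(self.item_id):
            raise ValueError(f"not an item ID: {self.item_id!r}")


# Illustrative example: Marie Curie, place of birth, Warsaw.
curie = Item("Q7186", "Marie Curie")
curie.statements.append(
    Statement(
        claim=Claim(property_id="P19", value="Q270"),
        sources=["https://example.org/biography"],  # hypothetical source
    )
)
```

Validating the prefixes in the constructors keeps items and properties from being mixed up, which is the main point of the "Q"/"P" naming scheme.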

Like all projects of the Wikimedia Foundation, Wikidata runs on MediaWiki, which had to be extended with a set of components called Wikibase in order to handle structured data. Wikibase consists of the Wikibase repository, in which the data is stored and managed, and the associated Wikibase client, which can be used to access the repository.
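Data held in the repository can also be retrieved from outside the wiki. The following is a minimal sketch, not the Wikibase client extension itself: it assumes the Special:EntityData page on www.wikidata.org, which serves an entity as JSON, and the function names and User-Agent string are hypothetical. The assumption that the response is keyed by the requested ID under "entities" may not hold for redirected entities.

```python
import json
import re
from urllib.request import Request, urlopen

# Assumed URL pattern for machine-readable entity data.
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{}.json"


def entity_data_url(entity_id: str) -> str:
    """Return the JSON URL for an item ("Q...") or property ("P...")."""
    if not re.fullmatch(r"[QP][1-9]\d*", entity_id):
        raise ValueError(f"not an entity ID: {entity_id!r}")
    return ENTITY_DATA.format(entity_id)


def fetch_entity(entity_id: str) -> dict:
    """Download and parse one entity (performs a network request)."""
    request = Request(
        entity_data_url(entity_id),
        # Descriptive User-Agent; the value here is a placeholder.
        headers={"User-Agent": "wikidata-example/0.1 (illustrative sketch)"},
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response)["entities"][entity_id]
```

Calling fetch_entity("Q7186") would then return the stored record for that item, including its labels and claims, provided the network request succeeds.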

Wikidata makes it possible, for example, to maintain authority data such as the Library of Congress Authorities (LCAuth) in one central location for use by Wikimedia projects, in particular Wikipedia. In the same way, data on software versions can be edited once in Wikidata instead of individually in many projects, which considerably reduces the workload, especially for smaller projects.

History

Development of the data object numbers on Wikidata
Example of a list of US presidents with spouses shown as a graph.
Wikidata query service with SPARQL example at query.wikidata.org

Wikidata was the first new project of the Wikimedia Foundation since 2006. The Wikidata project was first proposed in September 2004 by Erik Möller. Shortly thereafter, Magnus Manske implemented a first prototype that demonstrated the basic feasibility of the project. Actual implementation began in April 2012 and was financed by donations totaling 1.3 million euros from Paul Allen's Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google Inc. Donation funding officially ran until spring 2013; Wikimedia Germany planned to find follow-up financing for the period thereafter.

Semantic MediaWiki, developed by Markus Krötzsch and Denny Vrandečić, both of whom work on Wikidata (Vrandečić was Wikidata project manager at Wikimedia Germany in 2012), had a major influence on the development of Wikidata.

The one millionth data object was entered on December 15, 2012.

The project plan, which has now been completed, provided for three phases:

  • Phase 1 began on October 30, 2012 with the official launch of Wikidata and its opening for editing. First, the language links that had been stored locally in the wikitext of the individual Wikipedias were merged centrally in Wikidata into data objects (items) and given names and descriptions. These tasks were carried out primarily by automated scripts (also known as bots). Initially, only language links could be added. They became available in the Hungarian-language Wikipedia on January 14, 2013, in the Hebrew and Italian Wikipedias on January 30, 2013, and in the English-language Wikipedia on February 13, 2013; since March 6, 2013 they have been available in all language versions.
  • Phase 2 went into operation on February 4, 2013 with a limited range of functions. In this phase, statements are added to data objects (for example, Marie Curie: place of birth, Warsaw). Some of this information was transferred automatically from Wikipedia infoboxes and categories. The aim is to store infobox information for all Wikipedia language versions centrally in Wikidata and to use it in Wikipedia where needed. On March 27, 2013, support for phase 2 was activated in the first eleven Wikipedia language versions. On April 23, the remaining Wikipedias followed, including the German-language Wikipedia.
  • Since phase 3, it has been possible to generate lists automatically from Wikidata data. These automatically generated lists supplement the manually created and maintained tables and lists within Wikipedia.

In 2013, IBM donated the prize money of the 2013 Feigenbaum Prize, awarded for the Watson project, to the Wikimedia Foundation, specifically for Wikidata, because Wikipedia had contributed substantially to the success of the project and Wikidata had set itself the goal of making access to knowledge easier for people and machines.

At the end of 2014, Google announced that it would close its own fact database Freebase in 2015 in favor of the Wikidata project. An import tool was provided to make it easier to transfer data to Wikidata. By mid-2019, however, only around 528,000 of around 10 million data records, roughly five percent, had been transferred to Wikidata.

The official query service, which allows SPARQL queries to be run against the data, has been available since October 2016.
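A request to the query service can be sketched as follows. This is an illustration under assumptions rather than an official client: it assumes the SPARQL endpoint at https://query.wikidata.org/sparql with the parameters query and format, the prefixes wd:, wdt:, wikibase: and bd: that the service predefines, and the identifiers Q7186 (Marie Curie) and P19 (place of birth). The function name build_query_url is hypothetical, and the code only builds the URL without sending it.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint of the Wikidata query service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for the place of birth of one item, with an English label.
QUERY = """
SELECT ?place ?placeLabel WHERE {
  wd:Q7186 wdt:P19 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_query_url(sparql: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": sparql.strip(), "format": "json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


url = build_query_url(QUERY)
```

The resulting URL can be opened with any HTTP client; the same query text can also be pasted into the web interface at query.wikidata.org.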

Adoption

A survey carried out by the library service provider OCLC in 2018 showed that Wikidata, named by 41% of respondents, ranked fifth among the linked data sources whose data respondents incorporated into their own offerings, ahead of WorldCat and ISNI but clearly behind the Library of Congress and VIAF. Compared to 2015, Wikidata had become more important, especially for libraries, museums and archives.

In May 2019 it was announced that the Library of Congress would integrate data from Wikidata into its authority data.

Criticism

Particularly in the early days of Wikidata, critics objected to the use of a license without copyleft. The quality of the Wikidata database has also been criticized repeatedly. A comparative study of the data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO concluded in November 2017 that none of these knowledge graphs is suitable for all conceivable purposes.

Miscellaneous

In July 2015, bachelor students from the Hasso Plattner Institute at the University of Potsdam presented a project aimed at "ensuring the completeness and correctness of the data from the rapidly growing database". The consistency checks were to draw on, among other sources, the data of the German National Library and the Internet Movie Database.
The logo was designed by Arun Ganesh and was selected in a competition. The stylized barcode also spells out the word "Wiki" in Morse code, rendered in the colors of the Wikimedia projects.

Web links

Commons: Wikidata - collection of images, videos and audio files
