Wiktionary

The logo of the German-language Wiktionary
Motto: a wiki-based free dictionary
Description: wiki project for the creation of a free dictionary and thesaurus in every language
Registration: optional
Languages: 174, including German
Owner: Wikimedia Foundation
Originators: individual registered and non-registered authors
Published: English version: December 12, 2002; German version: May 1, 2004

The Wiktionary [ˈvɪkʃəˌnɛʀi] (German: Wikiwörterbuch, "wiki dictionary") is a joint Wikimedia project to create a freely accessible, complete and multilingual dictionary, with a corresponding thesaurus, in every language. The name Wiktionary is a portmanteau formed by analogy with Wikipedia; it combines the words wiki (Hawaiian for "fast") and dictionary (English). Since the project was founded in 2002, the number of entries across all Wiktionaries has grown to over 29.6 million (as of September 8, 2018).

Concept and project

Cross-lingual entry page of the Wiktionary

Wiktionary is the lexicographic counterpart of the free online encyclopedia Wikipedia. Just as Wikipedia has editions in different languages, there are Wiktionaries in different languages. While Wikipedia, as an encyclopedia, provides factual information about the things a term refers to, Wiktionary serves as a language dictionary and a thesaurus; its purpose is to convey linguistic knowledge. As a language dictionary, it is intended to explain linguistic properties such as homonyms, meaning, grammar, etymology and translations; the choice of lemmas should cover the vocabulary of the languages. As a (linguistic) thesaurus, it compiles terms associated with the headword, such as synonyms and hypernyms. Like the Wikipedias, the Wiktionaries are constantly being expanded and improved; anyone can contribute at any time.

In contrast to conventional printed, mostly bilingual dictionaries, the Wiktionary concept is so open that each individual language version welcomes entries for words from "all" languages. In a theoretical final stage, each language edition would contain the vocabulary of "all" foreign languages, explained in the base language of that edition, as well as every entry from the vocabulary of its own language, including translations into all foreign languages.

According to a statement on the Wikimedia project Meta, the idea of creating the Wiktionary was first put forward in 2002 by Daniel Alston (username Fonzy). According to another account, Larry Sanger had the idea in April 2001.

The English version was launched as the first Wiktionary on December 12, 2002. The German version, the wiki dictionary, followed on May 1, 2004; its 100,000th entry was created on December 31, 2009.

In many Wiktionaries, separate entries are created for spellings that differ only in upper and lower case (example: Bank and bank in the German wiki dictionary). This differs from Wikipedia, which bundles such case variants in one entry. According to the Wikimedia project Meta, the option of creating separate entries for upper and lower case variants was only introduced in 2006. Most Wiktionaries use this option today, but some language versions do not.
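The difference between the two handling styles can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the "Wikipedia-style" rule is simplified to forcing the first letter to upper case, which is not taken from the sources cited here.

```python
def wikipedia_style_key(title: str) -> str:
    """Assumed simplification: the first letter is forced to upper case,
    so 'bank' and 'Bank' end up on the same page."""
    return title[:1].upper() + title[1:]


def wiktionary_style_key(title: str) -> str:
    """The title is kept exactly as typed, so 'bank' and 'Bank'
    remain two separate entries."""
    return title


def group_titles(titles, key_func):
    """Group raw titles by the page key that key_func assigns to them."""
    pages = {}
    for title in titles:
        pages.setdefault(key_func(title), []).append(title)
    return pages


titles = ["Bank", "bank"]
merged = group_titles(titles, wikipedia_style_key)     # one page for both spellings
separate = group_titles(titles, wiktionary_style_key)  # one page per spelling
```

With the case-preserving key, the noun and the lower-case spelling each get their own entry; with the first-letter rule they collapse into a single page.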

All entries in the Wiktionaries follow a fixed format template. The German wiki dictionary, unlike the English one, for example, does not categorize articles by the meaning of the terms explained (there are no categories such as "biology" or "medicine"), but mainly by part of speech and language. To give some overview of how well particular subject areas are covered, manually maintained lists are kept.

Audio files can be integrated into Wiktionaries to demonstrate pronunciation. Over 370,000 audio and pronunciation examples are linked in the German wiki dictionary, including over 345,000 German audio samples (as of June 11, 2020). In addition, through the selective use of images suited to the individual entry, the wiki dictionary is being expanded into a picture dictionary (Bildwörterbuch); as of March 2020, illustrations are included in over 24,000 entries.

Some Wiktionaries, including the German one, contain a rhyming dictionary; the German edition collects rhyming words for just over 16,000 word endings.

License

The contents of the Wiktionaries are licensed under the GNU Free Documentation License (GFDL) and, since June 2009, also under the Creative Commons license "CC-BY-SA 3.0 Unported". To switch to the dual license, the GFDL first had to be updated from version 1.2 to a new version 1.3, which the wording of version 1.2 permitted without the consent of the authors. For more information on the switch to dual licensing, see GNU Free Documentation License, section "Use in Wikipedia".

Overview

After the English-language Wiktionary was launched on December 12, 2002, the French and Polish versions followed on March 29, 2004. On May 1, 2004, a Wiktionary was started for every additional language in which a Wikipedia edition already existed. On that single day, 143 more Wiktionaries were created, including the German version.

On August 24, 2017, over 27.1 million entries were available across all 172 language versions. Most of them, around 5.3 million, were in the English-language version, followed by the Malagasy version with around 4.0 million and the French version with around 3.2 million entries. These largest Wiktionaries have swapped places in the ranking by number of entries several times; in early 2006, the French-language version was for the first time the one with the most entries. Since mid-2010, however, the English-language version has again had the most entries. The 42 largest Wiktionaries currently provide more than 100,000 entries each. The German-language Wiktionary ranks 8th with around 720,000 entries (as of September 8, 2018).

The following table lists the entry counts of the 15 largest Wiktionaries as of June 30, 2018:

No.  Code  Language        Entries
1    en    English         5,661,053
2    mg    Malagasy        4,731,986
3    fr    French          3,311,457
4    zh    Chinese         1,239,534
5    sh    Serbo-Croatian    914,739
6    ru    Russian           906,877
7    es    Spanish           855,953
8    de    German            700,422
9    nl    Dutch             666,440
10   sv    Swedish           636,782
11   ku    Kurdish           633,562
12   pl    Polish            620,367
13   lt    Lithuanian        616,275
14   el    Greek             461,517
15   it    Italian           431,871

Detailed statistics are available that also show the historical development. The number of entries in itself says little about the quality of a language version of the Wiktionary, since the total also includes entries for inflected forms and alternative spellings. Statistics that classify entries by quality criteria are currently available only to a limited extent. Existing statistics show that, across all language versions, over two thirds (67.5 percent) of all entries were created by (registered) bots. See also the chapter on growth spurts from bots.

Individual language versions of the Wiktionary

Growth curve of the German-language Wiktionary

German

The German-language Wiktionary, or wiki dictionary, was started on May 1, 2004. In the project's public presentation, the two names "Wiktionary" and "Wiki dictionary" coexist to this day. The text in the logo reads "Wiki dictionary - Wiktionary - [ˈvɪkʃəˌnɛʀi], n - The free dictionary - a wiki-based free dictionary", with the top and bottom passages, "Wiki dictionary" and "a wiki-based free dictionary", set in gray type on a light gray background.

Of the approximately 879,000 entries on February 25, 2020, broken down by the language of the words explained, around 655,000 were German, 54,000 English, 24,000 Czech, 18,000 Polish, 16,000 Italian, 13,000 Latin and 12,000 French; the remaining entries were spread across over 200 other languages. The total should be read with caution, because inflected forms also have separate pages. The 115,000th base-form entry was created on October 4, 2019, the 825,000th entry overall on September 18, 2019, and the 15,000th German-language rhyme on September 12, 2019.

English

The English-language Wiktionary was launched on December 12, 2002 as the first language version. Brion Vibber initially set it up at the provisional URL wiktionary.wikipedia.org; on May 1, 2004, it was moved to the URL that is still valid today. It reached 100,000 entries in November 2005 and 500,000 in August 2007. The 1 million mark was passed on October 18, 2008, followed by 2 million on September 7, 2010 and 5 million on November 25, 2016; the 6 millionth entry was created on April 9, 2019.

Of the 6.2 million entries available on February 1, 2020, broken down by the language of the words explained, 1,420,948 were Latin, 1,160,957 English, 899,031 Spanish, 705,382 Italian, 485,963 Russian, 457,985 French, 357,280 German and 347,085 Portuguese; the rest was spread across well over 1,500 other languages.

French

The French-language Wiktionary, or Wiktionnaire, was launched on March 29, 2004. It now has over 2.3 million entries, making it the second most comprehensive language version of the Wiktionary after the English version. Of the 2.30 million entries available on January 7, 2013, broken down by the language of the words explained, approximately 1.2 million were French, 157,000 Russian, 156,000 Bulgarian, 121,000 English and 80,000 Slovenian; the rest was spread over 900 other languages.

Growth curve of the Polish-language Wiktionary

Vietnamese

The Wiktionary in Vietnamese was founded in 2004. Of the now more than 230,000 entries, according to the breakdown by language of the words explained, around 110,000 were English, 44,000 French, 35,000 Russian and 31,000 Vietnamese; the rest was spread across 54 other languages.

Polish

The Polish-language Wiktionary was launched in March 2004. Of the approximately 236,000 entries available on August 20, 2011, broken down by the language of the words explained, approximately 35,000 were English, 30,000 Polish, 21,000 Chinese and 19,400 in the planned language Interlingua; the rest was spread across 264 other languages. The approximately 7,900 Yiddish entries make the Polish Wiktionary the largest modern Yiddish dictionary of the post-war period to be 'published' in Poland. Almost 6,000 of these entries were created in 2007 through a bot campaign (see the chapter on growth spurts from bots).

Growth spurts from bots

Growth curve of the 8 largest language versions of the Wiktionary up to March 2008. The leaps in growth through the use of bots are clearly recognizable.

Most of the entries in the largest language versions of the Wiktionary were created by bots. Their programmers found creative ways to generate large numbers of new entries or to machine-import thousands of entries from publicly available dictionaries.

Seven of the now more than 30 bots listed as such on the English-language Wiktionary have created around 163,000 new entries there. Websterbot imported 259 complex entries, each containing many definitions, from publicly available sources; most of these imports have since been split manually into thousands of entries. Another of these bots, ThirdPersBot, created third-person singular verb forms, which are usually not listed individually in printed dictionaries. At the time of these campaigns in 2006, the English-language Wiktionary had around 137,000 entries not counting the 163,000 bot entries and was thus significantly smaller than many printed dictionaries: the Oxford English Dictionary, for example, has around 615,000 word entries, and Merriam-Webster's Third New International Dictionary of the English Language, Unabridged has 475,000 entries; many phrases appear only in the body of other entries.

The English and French Wiktionaries have imported the approximately 20,000 entries of the Unihan database of CJK characters (Chinese, Japanese and Korean).

The rapid growth of the French-language Wiktionary in 2006 is mainly due to bots. Some took over many entries from old, license-free dictionaries such as the 8th edition of the Dictionnaire de l'Académie française from 1935, with around 35,000 word entries; others imported terms with French translations from other language editions of the Wiktionary. The French and Vietnamese Wiktionaries have imported large parts of the Free Vietnamese Dictionary Project (FVDP), which offers freely accessible bilingual dictionaries from and into Vietnamese. After this campaign, the Vietnamese-language Wiktionary consisted almost exclusively of these imported entries.

Between July 10 and November 27, 2004, the Polish Wiktionary used the bot Tsca.bot to import around 15,000 stub entries in the planned language Interlingua from the website interlingua.filo.pl, with the permission of the author. Between March 31 and April 2, 2007, the same bot created almost 6,000 Yiddish entries, most of which contained the IPA pronunciation and the YIVO transcription in addition to the Polish translation.

Starting in October 2006, the Russian-language Wiktionary took over around 80,000 stub entries ("boilerplates") for English, German and French words with the help of the bot LXbot. From June 2008 onward, the bot TrudoBot created a large number of stub entries for Russian words.

Significance

Across all language versions, wiktionary.org was listed at rank 555 in Alexa Internet's so-called "Alexa Traffic Rank" (as of August 24, 2017). Broken down by language version, about 45% of the recorded visits went to the English version, 15% to the Russian, 13% to the French and 6% to the German; the rest was spread across the many other language versions.

Assessment

German Wiktionary

Entries in the German Wiktionary tend to be prescriptive, and the explanations there are sometimes linguistically imprecise. The German Wiktionary does not compete with academic lexicography.

English Wiktionary

In 2005, the book The Internet and the Autonomous Learner: Free Teaching / Learning Offers on the Internet criticized the English-language Wiktionary on several points: laypeople cannot verify the correctness of the entries, the entries are incomplete, pronunciation information is mostly missing, audio examples exist only occasionally, and translations into other languages are often unavailable.

Ontology Learning and Knowledge Discovery Using the Web lists scientific studies on the English-language Wiktionary from 2006 to 2009. In comparison with WordNet, it highlights as a strength that the English-language Wiktionary also allows entries for compounds, acronyms, abbreviations, misspellings and simplified spellings. A 2009 study by Navarro et al. criticized incomplete, almost empty entries, the unequal weighting of languages and the small number of synonyms recorded.

The book Electronic Lexicography, published in 2012, found that the English Wiktionary does better than WordNet at ordering the senses of an entry by frequency of use. The labeling of words by features such as subject area, linguistic variety, and temporal and regional classification is rated better for the English-language version than for the German and Russian versions, because more of its entries carry at least one such label. The study emphasizes that most of these labels relate to subject areas, from which it concludes that subject experts are involved. Wiktionary is seen as a competitor to expert-built lexicons that opens up a wide variety of uses.

Literature

  • Kai-Uwe Carstensen, Christian Ebert, Susanne Jekat, Cornelia Ebert, Hagen Langer, Ralf Klabunde (eds.): Computational Linguistics and Language Technology. An introduction. 3rd edition. Spektrum Akademischer Verlag, Heidelberg 2010, ISBN 978-3-8274-2023-7 (especially pp. 548-550).
  • A. Elia: Can a collaborative Wiki Weblish Dictionary Project help academic writing of ICT language learners? In: Isabel González-Pueyo, Carmen Foz Gil, Mercedes Jaime Siso, María José Luzón Marco (eds.): Teaching Academic and Professional English Online. Peter Lang Publishing Group, 2009, ISBN 978-3-03911-582-2.

Web links

Wiktionary: Wiktionary - explanations of meanings, word origins, synonyms, translations
Commons: Wiktionary Statistics - collection of images, videos and audio files
Commons: Wiktionary-Logos - album with images, videos and audio files

Wiktionary in German - a free dictionary

Individual evidence

  1. wiki in Hawaiian Dictionaries
  2. Wiktionary, entry in the Wikimedia project Meta, last accessed on September 8, 2018.
  3. There are restrictions; for example, the English Wiktionary excludes numerous constructed languages, see Wiktionary: Criteria for inclusion.
  4. Talk: Wiktionary / Archives / 2002 at the Wikimedia project Meta; this in turn relates to the entry Wiktionary / Split_into_thesaurus_and_dictionary in the same project.
  5. Phoebe Ayers, Charles Matthews, Ben Yates: How Wikipedia Works: And How You Can Be a Part of It. No Starch Press, 2008, p. 430; cf. [Wikipedia-l] Wiktionary, April 17, 2001.
  6. Wiktionary: Milestones in the German wiki dictionary.
  7. Capitalization of Wiktionary pages in the Wikimedia project Meta , accessed on September 13, 2009.
  8. Category: All topics in the English wiki dictionary; Directory: Overview in the German wiki dictionary.
  9. Category: Audio file in the wiki dictionary.
  10. German entries in the audio file category, the number is in the top right
  11. Category: Illustration in the Wiki Dictionary, accessed on March 7, 2020.
  12. Category: Rhyme in the Wiki Dictionary, accessed March 7, 2020.
  13. Wiktionary, entry in the Wikimedia project Meta, last accessed on November 6, 2015.
  14. Wiki dictionary statistics, Saturday July 7, 2018. Accessed August 17, 2018.
  15. See Wiktionary: Statistics / Language overview in the German and Wiktionary: Statistics # Detail in the English language edition of the Wiktionary.
  16. See e.g. Wiktionary Category Overview at stats.wikimedia.org .
  17. Wiktionary Statistics. Bot article creations only (see also all bot editing activity). Retrieved August 25, 2017.
  18. Talk: Wiktionary / Archives / 2002 at meta.wikimedia.org, accessed on September 13, 2009.
  19. Wiktionary: Milestones at the English-language Wiktionary, accessed on February 25, 2010.
  20. Wiktionary: Statistics , version of February 1, 2020, accessed February 25, 2020.
  21. Wiktionnaire: Statistiques, accessed on August 20, 2011 (version as of July 28, 2011).
  22. Wiktionary: Thống kê in the Vietnamese-language Wiktionary, accessed on September 15, 2009.
  23. Wikisłownik: Statystyka in the Polish-language Wiktionary, accessed on August 20, 2011 ( version as of August 16, 2011 ).
  24. Portal: Jidysz / pl / mainpage in the Polish-language Wiktionary, accessed on September 17, 2009.
  25. See Special: Listusers at the English-language Wiktionary.
  26. Statement on Wiktionary in the English language version, accessed on September 2, 2009. Edit counters are given for 5 bots: TheDaveBot, TheCheatBot, Websterbot, PastBot, NanshuBot. Note, however, that not every edit represents a new entry.
  27. Free Vietnamese Dictionary Project (FVDP) at the University of Leipzig.
  28. For details see also Wiktionary: Nguồn gốc / FVDP at the Vietnamese Wiktionary.
  29. See Tsca.bot in the Polish-language Wiktionary.
  30. Entry of the first word (abandonamento) and entry of the last word (tic-tac) by Tsca.bot from the source interlingua.filo.pl (memento of the original from June 1, 2016 in the Internet Archive).
  31. Entry of the first word (שפּאַס) and entry of the last word (שראַם) by Tsca.bot.
  32. triskaidekaphobia , first entry of this species by LXbot .
  33. Processing counter for LXbot .
  34. See discussion section ru: User: LXbot at User talk: VPliousnine in the English-language Wiktionary.
  35. Участник: TrudoBot in the Russian-language Wiktionary.
  36. wiktionary.org at Alexa, accessed on August 24, 2017.
  37. Petra Storjohann: What is the difference between sensitiv and sensibel? In: Journal for Applied Linguistics 62 (1), 2015, p. 120.
  38. Petra Storjohann (as a research assistant at the Institute for German Language in Mannheim ): The future project “elexiko: Paronym dictionary” . In: IDS Sprachreport 1/2014, 2014, p. 22.
  39. Carolin Müller-Spitzer: Tasks and relevance of dictionary usage research in the mid-2010s , section 3.3. In: Dictionary research and lexicography. Walter de Gruyter, 2016
  40. The Internet and the Autonomous Learner: Free teaching / learning offers on the Internet . books.google.de
  41. Ontology Learning and Knowledge Discovery Using the Web, p. 87. books.google.de
  42. Sylviane Granger, Magali Paquot (eds.): Electronic Lexicography. 2012, p. 287 ff. books.google.de