Linguistic Linked Open Data

Linguistic Linked Open Data (LLOD) refers to the collection and interlinking of openly licensed linguistic resources, such as lexicons, ontologies, metadata collections and annotations, by means of linked open data technologies.

Linguistic Linked Open Data (LLOD) cloud, version from August 2017

In computational linguistics and language technology, in linguistics and in neighboring disciplines, Linguistic Linked Open Data denotes both a method and an interdisciplinary scientific community concerned with the creation, sharing and (re)use of language resources in accordance with the principles of Linked Open Data. The Linguistic Linked Open Data cloud has been developed by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation and has since become a focus of work for several W3C Community Groups, numerous research projects and various infrastructure projects.

Linguistic Linked Open Data refers to the publication of data for linguistics and language technology based on the following principles:

  • Data should be openly licensed in the sense of the Open Definition, for example under Creative Commons licenses.
  • The individual elements of a dataset should be uniquely identified by means of URIs.
  • URIs should be resolvable as web addresses so that users can access more information using web browsers.
  • Machine access to an LLOD resource should deliver results based on web standards such as the Resource Description Framework (RDF).
  • Data should have links to other resources to enable users to find more information, such as the meaning of the elements of the vocabulary used.
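The principles above can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical dataset under `http://example.org/lexicon/` and writes its statements in the N-Triples syntax of RDF. The entry name and the helper functions are invented for illustration; the `label` and `seeAlso` properties belong to the W3C RDF Schema vocabulary, and the DBpedia URI serves only as an example of a link to another resource.

```python
# Hypothetical base URI for the dataset; not an actual LLOD resource.
BASE = "http://example.org/lexicon/"
# Namespace of the W3C RDF Schema vocabulary.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def uri(value):
    """Format a URI as an N-Triples IRI term."""
    return f"<{value}>"


def literal(text, lang):
    """Format a language-tagged string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


# Each statement is a (subject, predicate, object) triple.
# The entry is identified by a URI and linked to an external resource.
triples = [
    (uri(BASE + "entry/cat"), uri(RDFS + "label"), literal("cat", "en")),
    (uri(BASE + "entry/cat"), uri(RDFS + "seeAlso"),
     uri("http://dbpedia.org/resource/Cat")),
]


def to_ntriples(statements):
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```

In practice an RDF library would normally handle serialization; the hand-written version here only makes the structure of the statements visible.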

The following main advantages of LLOD have been identified:

  • Representation: Linked knowledge graphs offer a flexible way of modeling linguistic data.
  • Interoperability: Data relating to common (RDF) vocabularies can be easily linked together.
  • Federation: Data from different locations can be easily integrated with RDF and LOD.
  • Ecosystem: There is a wide range of open source tools available for RDF and Linked Data.
  • Expressivity: Vocabularies for representing language resources usually exist already.
  • Semantics: Links to external vocabularies make explicit what is meant.
  • Dynamics: Up-to-date data available via the Internet can be obtained at any time.
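The federation and interoperability points can be sketched in a few lines of Python. The example assumes two small graphs from different sources, represented as sets of (subject, predicate, object) tuples; all URIs and property names are hypothetical. For graphs without blank nodes, merging amounts to a set union, and the shared URI of the lexicon entry is what connects the two sources.

```python
# All URIs below are hypothetical and serve only as illustration.
EX = "http://example.org/"

# Source 1: a corpus whose token points to a lexicon entry.
corpus_graph = {
    (EX + "corpus/token1", EX + "vocab/lemma", EX + "lexicon/run"),
}

# Source 2: a lexicon that describes the same entry URI.
lexicon_graph = {
    (EX + "lexicon/run", EX + "vocab/partOfSpeech", EX + "vocab/Verb"),
}

# Without blank nodes, merging two graphs is a plain set union.
merged = corpus_graph | lexicon_graph


def objects(graph, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def pos_of_token(graph, token):
    """Follow token -> lemma -> part of speech across the graph."""
    result = set()
    for lemma in objects(graph, token, EX + "vocab/lemma"):
        result |= objects(graph, lemma, EX + "vocab/partOfSpeech")
    return result


print(pos_of_token(merged, EX + "corpus/token1"))
```

The query succeeds only on the merged graph, since neither source alone contains both the lemma link and the part-of-speech statement.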

An important visualization of the data available as LLOD is the LLOD cloud diagram currently available at linguistic-lod.org.

Use

Linguistic Linked Open Data has been used to address various research problems:

  • In all branches of empirical linguistics, computational philology and language technology, linguistic annotations and linguistic markup are essential building blocks of academic work. LLOD can help to overcome interoperability problems, e.g. different vocabularies and annotation schemes that are used in different resources or by different annotation or analysis tools. Linking language resources with ontologies and knowledge graphs enables the reuse of common vocabularies and their interpretation against a concrete, shared reference.
  • RDF and LLOD are graph-based formalisms that are suitable for representing any linguistic data structure and for relating such data to one another, for example corpora in different formats with dictionaries.
  • Multilingualism, e.g. when linking lexical networks such as WordNet, and in heterogeneous resources such as Wikipedia.
  • Possible starting point for the standardization of data structures and metadata of language resources
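The interoperability use case in the first item can be sketched as follows. The example assumes two annotation schemes whose part-of-speech tags are each linked to concepts of a shared reference vocabulary. The tags are taken from the Penn Treebank and STTS tagsets, but the reference URIs under `http://example.org/reference#` are hypothetical placeholders and not the actual identifiers of any published ontology.

```python
# Hypothetical reference vocabulary; a real setup would link to a
# published ontology instead.
REF = "http://example.org/reference#"

# Each annotation scheme links its own tags to shared reference concepts.
PENN_LINKS = {"NN": REF + "Noun", "VB": REF + "Verb"}
STTS_LINKS = {"NN": REF + "Noun", "VVFIN": REF + "Verb"}


def to_reference(tagged_tokens, links):
    """Replace scheme-specific tags by the reference concepts they link to."""
    return [(word, links[tag]) for word, tag in tagged_tokens]


english = to_reference([("dog", "NN"), ("bark", "VB")], PENN_LINKS)
german = to_reference([("Hund", "NN"), ("bellt", "VVFIN")], STTS_LINKS)

# After linking, both annotations use the same concepts and can be compared.
same_categories = [c for _, c in english] == [c for _, c in german]
print(same_categories)
```

The tags `VB` and `VVFIN` differ as strings, yet both resolve to the same reference concept, which is the kind of comparison that shared vocabularies make possible.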

Selected resources

As of October 2018, the five most linked resources in the LLOD diagram (ordered by the number of resources linked to them) were:

  • The Ontologies of Linguistic Annotation (OLiA, linked to 74 resources), which provide reference terminology for linguistic annotation and grammatical metadata;
  • WordNet (linked to 51 resources), a lexical database for English and the starting point for the development of similar databases for other languages, in several editions (Princeton WordNet linked to 36 resources; W3C edition linked to 8 resources; VU Amsterdam edition linked to 7 resources);
  • DBpedia (linked to 50 resources), a multilingual knowledge graph for general world knowledge, based on Wikipedia;
  • lexinfo.net (linked to 36 resources), which provides reference terminology for dictionaries and lexical resources;
  • BabelNet (linked to 33 resources), a multilingual lexicalized semantic network based on the aggregation of various other language resources, above all WordNet and Wikipedia.
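As a small worked check of the figures above, the following sketch reproduces the ranking from the reported link counts and confirms that the three WordNet editions add up to the WordNet total. The numbers are those given in the list; the variable names are illustrative.

```python
# Link counts as reported in the list above (October 2018).
link_counts = {
    "OLiA": 74,
    "WordNet": 51,
    "DBpedia": 50,
    "lexinfo.net": 36,
    "BabelNet": 33,
}

# Per-edition counts reported for WordNet.
wordnet_editions = {
    "Princeton WordNet": 36,
    "W3C edition": 8,
    "VU Amsterdam edition": 7,
}

# Order resources by the number of resources linked to them.
ranking = sorted(link_counts, key=link_counts.get, reverse=True)

# The edition counts should sum to the WordNet total.
editions_total = sum(wordnet_editions.values())

print(ranking)
print(editions_total == link_counts["WordNet"])
```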

Development and Community Activities

The LLOD cloud diagram is developed and provided by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (Open Knowledge International since 2014), an open and interdisciplinary group of experts from various, primarily European, educational and research institutions.

The OWLG organizes various community events, coordinates LLOD development and contributes to the interdisciplinary communication between LLOD users and developers. Various W3C Business and Community Groups focus on more specific aspects of LLOD. The W3C Ontology-Lexica Community Group (OntoLex) in particular is very active and develops specifications for the publication of machine-readable dictionaries in the LLOD cloud or as RDF.

The development of the LLOD cloud has also been documented and promoted through various international workshops, datathons and publications. These include, among others:

  • Linked Data in Linguistics (LDL), an annual workshop held in connection with international conferences since 2012; biennial since 2017, alternating with the related conference series Language, Data and Knowledge (LDK)
  • Summer Datathon on Linguistic Linked Open Data (SD-LLOD), biennial summer school and hands-on workshops (Datathon), since 2015

The use and development of LLOD technologies and resources have been the subject of various major research projects, e.g.:

  • LOD2. Creating Knowledge out of Interlinked Data (11 EU countries + Korea, 2010-2014)
  • MONNET. Multilingual Ontologies for Networked Knowledge (5 EU countries, 2010-2013)
  • LIDER. Linked data as an enabler of cross-media and multilingual content analytics for enterprises across Europe (5 EU countries, 2013-2015)
  • QTLeap. Quality Translation by Deep Language Engineering Approaches (6 EU countries, 2013-2016)
  • LiODi. Linked Open Dictionaries (BMBF eHumanities Young Investigators Group, Goethe University Frankfurt, 2015–2020)
  • FREME. Open Framework of E-Services for Multilingual and Semantic Enrichment of Digital Content (6 EU countries, 2015-2017)
  • POSTDATA. Poetry Standardization and Linked Open Data (ERC Starting Grant, UNED, Spain, 2016-2021)
  • Linking Latin (ERC Consolidator Grant, Università Cattolica del Sacro Cuore, Italy, 2018–2023)
  • Prêt-à-LLOD (5 EU countries, 2019-2021)
  • NexusLinguarum. European network for Web-centered linguistic data science (COST Action, 35 COST countries, Belarus, Georgia, USA, 2019-2023)
