Cross-Language Evaluation Forum


The Cross-Language Evaluation Forum (CLEF) emerged from the TREC track on Cross-Language Information Retrieval (CLIR), which dealt mainly with cross-language retrieval for European languages.

CLEF is now an independent EU project and offers a platform for evaluating and improving information retrieval systems for European languages.

The system evaluation campaigns, organized annually by CLEF since 2000, are intended to promote collaboration between researchers and developers and thus to simplify and encourage future joint initiatives between groups with similar interests. The aim is to process user queries posed in any European language against documents written in any other language and to return a relevance-ranked result set that answers the query. Monolingual information retrieval is also part of the evaluation, but is aimed primarily at teams participating in the campaign for the first time. There are also collaborations with similar initiatives in the USA and Asia.

The overarching goal is to support and stimulate the development of European cross-language retrieval systems in order to secure their competitiveness on the world market.

CLEF structure and methods of evaluation

Databases used and languages of the documents

The data sets in CLEF consist mainly of newspaper articles and news agency reports and, for each language, come from the same year or period, so that the same events and topics occur in every language's data set. In addition, data from scientific sources are available for domain-specific cross-language information retrieval.

The individual documents are provided with SGML tags to identify the data elements they contain.

The core languages are German, English, French, Italian and Spanish. Occasionally there are also databases in other languages.

Topics

The topics are worked out by the various language groups and should adequately reflect the contents of the respective document collections.

Each language group drafts several topic proposals, of which 50 are ultimately selected and made available to the assessors. These 50 topics are then translated into all participating languages, and the translations are checked by specialist translators to ensure a degree of consistency.

Topic example with a translation

The selected topics are marked up with SGML tags and consist of a consecutive number (num), a title (title), a short description (desc) and a detailed narrative (narr).

Source language (here English)

<top>
<num> C088 </num>
<EN-title> Mad Cow in Europe </EN-title>
<EN-desc> Find documents that cite cases of Bovine Spongiform Encephalopathy (the mad cow disease) in Europe. </EN-desc>
<EN-narr> Relevant documents will report statistics and/or figures on cases of animals infected with Bovine Spongiform Encephalopathy (BSE), commonly known as the mad cow disease, in Europe. Documents that only discuss the possible transmission of the disease to humans are not considered relevant. </EN-narr>
</top>

Target language (here Italian)

<top>
<num> C088 </num>
<IT-title> Mucca pazza in Europa </IT-title>
<IT-desc> Trova i documenti che citano i casi di mucca pazza (Encefalopatia Spongiforme Bovina) in Europa. </IT-desc>
<IT-narr> Sono rilevanti i documenti che riportano statistiche e/o dati numerici sui casi di animali affetti da Encefalopatia Spongiforme Bovina (BSE), comunemente detta morbo della mucca pazza, in tutti i paesi europei. Non sono rilevanti i documenti sulla possibile trasmissione del morbo all'uomo. </IT-narr>
</top>
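
Topic files in this format can be processed with ordinary text tools. The following sketch shows one possible way to extract the fields from such a file; the file name and the regular expressions are illustrative assumptions and not part of any official CLEF specification.

import re

# Illustrative only: assumes a file of <top> blocks like the examples above,
# with language-prefixed tags such as EN-title or IT-title.
TOPIC_FIELD = re.compile(
    r"<(?:[A-Z]{2}-)?(num|title|desc|narr)>\s*(.*?)\s*</(?:[A-Z]{2}-)?\1>",
    re.DOTALL,
)

def parse_topics(path):
    """Return a list of dicts with num, title, desc and narr per <top> block."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, re.DOTALL):
        fields = {name.lower(): value for name, value in TOPIC_FIELD.findall(block)}
        topics.append(fields)
    return topics

if __name__ == "__main__":
    for topic in parse_topics("topics_EN.sgml"):   # hypothetical file name
        print(topic["num"], "-", topic["title"])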

Evaluation process by CLEF

The CLEF assessment procedure is based on the TREC pooling method. To this end, the participating systems must submit their integrated, ranked result lists for each topic. These lists contain the numbers of the documents that a system judged relevant to the respective topic, in descending order of assumed relevance. The top 60 documents of each list for a given topic are placed in a pool. All result lists for one of the 50 topics of the main task, or for one of the 25 topics each of the domain-specific task (GIRT) and the scientific task (AMARYLLIS), are merged and then put into random order, so that it can no longer be determined which document came from which system or at which rank it originally appeared. These pooled lists are then split by language: the document numbers belonging to the corpora of a particular language are grouped together. For each topic this yields an extensive, language-specific collection of documents whose relevance can then be assessed.
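
As a rough illustration, the pooling step can be sketched as follows. The pool depth of 60 follows the description above; the function and variable names, and the made-up document numbers, are purely illustrative and do not reproduce the actual CLEF tooling.

import random
from collections import defaultdict

POOL_DEPTH = 60  # number of top-ranked documents taken from each submitted list

def build_pools(runs, depth=POOL_DEPTH, seed=0):
    """Merge the top `depth` documents of every run into one pool per topic.

    `runs` is a list of ranked result lists, one per system, each a mapping
    topic_id -> [doc_id, doc_id, ...] ordered by assumed relevance.
    The pooled documents are shuffled so assessors cannot tell which system
    retrieved a document or at which rank it appeared.
    """
    pools = defaultdict(set)
    for run in runs:
        for topic_id, ranked_docs in run.items():
            pools[topic_id].update(ranked_docs[:depth])

    rng = random.Random(seed)
    randomized = {}
    for topic_id, doc_ids in pools.items():
        docs = sorted(doc_ids)      # deterministic starting order
        rng.shuffle(docs)           # then hide system and rank information
        randomized[topic_id] = docs
    return randomized

# Minimal usage example with made-up document numbers:
run_a = {"C088": ["LAT-001", "LAT-007", "SDA-042"]}
run_b = {"C088": ["SDA-042", "FRA-310"]}
print(build_pools([run_a, run_b]))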

Relevance assessment

In the relevance assessment, the pooled documents for a language are judged by assessors from the respective language group. The judgments are recorded with the aid of the assessment software ASSESS developed by NIST. The assessors' decisions as to whether a document is relevant or not relevant to a topic are added to the language-specific result lists for each topic. In making their judgments, the assessors use the language groups' topic discussions as guidelines and the topic narratives as decision aids.
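
The outcome of this step can be thought of as a set of binary judgments per topic and document, against which the submitted result lists are later scored. The sketch below uses a hypothetical judgment table and computes precision at a cutoff as one simple example measure; it is not the official CLEF evaluation procedure, and the document identifiers are invented.

def precision_at_k(ranked_docs, judgments, k=10):
    """Fraction of the top-k retrieved documents judged relevant.

    `judgments` maps doc_id -> True/False for one topic, as produced by the
    assessors; unjudged documents are treated as not relevant here, which is
    a common convention when pooling is used.
    """
    top = ranked_docs[:k]
    hits = sum(1 for doc_id in top if judgments.get(doc_id, False))
    return hits / k

# Hypothetical judgments for topic C088 and one system's ranked list:
qrels_c088 = {"LAT-001": True, "SDA-042": False, "FRA-310": True}
ranked = ["LAT-001", "SDA-042", "FRA-310", "LAT-999"]
print(precision_at_k(ranked, qrels_c088, k=4))   # -> 0.5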

Tasks

CLEF is aimed primarily at the further development of multilingual IR systems. At the same time, additional languages can be included along the way, and gaining experience in setting up such evaluations has been an important point in itself. For this reason, different tasks were formulated for the participants. The main task of CLEF is multilingual information retrieval (the multilingual task): documents are searched for in all core languages, with one of these languages serving as the source language. A single list is then created that contains results from all document collections, i.e. from all core languages. Other languages (e.g. Finnish, Russian, Swedish) may also serve as the source language, since translations of the topics are produced for these groups; the core languages remain the target languages.
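
How the per-language results are combined into this single multilingual list is left to each participating system. A simple score-based merge could look like the following sketch; it is one possible strategy, not a method prescribed by CLEF, and the collection names and scores are invented.

def merge_multilingual(results_per_collection, top_n=1000):
    """Combine ranked (doc_id, score) lists from several document collections.

    `results_per_collection` maps a collection/language name to a ranked list
    of (doc_id, score) pairs. Scores are min-max normalized per collection so
    that differently scaled retrieval scores become roughly comparable, then
    everything is sorted into one list. This naive normalization is only an
    illustration; participants used a variety of merging strategies.
    """
    merged = []
    for collection, ranked in results_per_collection.items():
        if not ranked:
            continue
        scores = [score for _, score in ranked]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        for doc_id, score in ranked:
            merged.append(((score - lo) / span, collection, doc_id))
    merged.sort(reverse=True)
    return [(collection, doc_id) for _, collection, doc_id in merged[:top_n]]

# Made-up scores from a German and an Italian collection:
print(merge_multilingual({
    "DE": [("FRA-310", 12.4), ("FRA-011", 9.8)],
    "IT": [("AGZ-051", 0.81), ("AGZ-007", 0.42)],
}))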

In the bilingual task, documents in a target language other than the source language are searched for, for example in English or Dutch. For this reason, the CLEF organizers also provide translations of the topics into Dutch, as well as further linguistic resources for Dutch (stop word list, stemmer, Dutch-English lexicon).
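
Resources of this kind (stop word list, bilingual lexicon, optionally a stemmer) are typically combined in a simple dictionary-based query translation step. The sketch below shows the idea with a tiny hand-made word list; the entries are purely illustrative and are not the resources distributed by CLEF.

# Tiny illustrative resources; the real CLEF resources are far larger,
# and a stemmer would normally be applied before lexicon lookup.
DUTCH_STOPWORDS = {"de", "het", "een", "in", "van"}
NL_EN_LEXICON = {
    "gekke": ["mad", "crazy"],
    "koeien": ["cows", "cattle"],
    "ziekte": ["disease", "illness"],
    "europa": ["europe"],
}

def translate_query(query, lexicon, stopwords):
    """Dictionary-based query translation: drop stop words, then replace each
    remaining term by all of its lexicon translations (untranslatable terms
    are kept unchanged, e.g. proper names)."""
    translated = []
    for term in query.lower().split():
        if term in stopwords:
            continue
        translated.extend(lexicon.get(term, [term]))
    return translated

print(translate_query("de gekke koeien ziekte in Europa",
                      NL_EN_LEXICON, DUTCH_STOPWORDS))
# -> ['mad', 'crazy', 'cows', 'cattle', 'disease', 'illness', 'europe']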

The monolingual task, on the other hand, involves searching one of the corresponding document collections in its own language, e.g. German, French, Dutch, Italian or Spanish. English is excluded here, since the TREC ad hoc retrieval task has already covered this area in the past and it therefore poses no new challenge with regard to linguistic problems and translation issues. The monolingual task is intended as an entry point for new CLEF participants, and in this way new languages can be introduced for the multilingual tasks.

The scientific or domain-specific task allows searching for (social) science documents in special document collections, namely GIRT (German Indexing and Retrieval Testdatabase) and AMARYLLIS. With this, CLEF responded to the recurring criticism that it only ever runs large evaluations on the basis of newspaper texts and that this does not yield transferable results. The documents in the GIRT and AMARYLLIS databases additionally contain intellectually assigned keywords from a (social) science thesaurus, which is also made available (in English translation as well, and for GIRT also in Russian). In addition, specific topics are provided in English and German or French (for GIRT also in Russian).
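
Because these documents carry manually assigned descriptors, systems can exploit the thesaurus and its translations, for instance for query expansion across languages. The miniature thesaurus fragment below is entirely made up and serves only to illustrate the idea; it does not reproduce the GIRT or AMARYLLIS vocabularies.

# Made-up miniature thesaurus fragment with English translations of descriptors.
THESAURUS = {
    "arbeitslosigkeit": {
        "broader": ["arbeitsmarkt"],
        "related": ["beschaeftigungspolitik"],
        "english": ["unemployment"],
    },
}

def expand_query(terms, thesaurus):
    """Add broader/related descriptors and their English translations to a
    query, one simple way to benefit from intellectually assigned keywords."""
    expanded = list(terms)
    for term in terms:
        entry = thesaurus.get(term.lower())
        if entry:
            expanded += entry["broader"] + entry["related"] + entry["english"]
    return expanded

print(expand_query(["Arbeitslosigkeit"], THESAURUS))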

The last task is the interactive task, which is explicitly defined as an experimental task. Its aim is to investigate the evaluation of interactive CLIR and to develop benchmarks against which further research can be measured. Retrieval effectiveness is evaluated here in combination with the user interface, in particular the possibilities of formulating and reformulating the query and of quickly judging the result documents. In this case, human test subjects process the queries; that is, the queries are not created automatically by the system or by experts.

The participants use different retrieval systems to search for a topic. The query is posed to the retrieval systems in one language, and they return documents in all target languages. The retrieval systems use system-specific methods for searching and for translating or transforming queries into the other languages. At the end of the retrieval process, they must deliver an integrated, ranked result list of the documents believed to be relevant to the topic. Besides solving translation problems, integrating the results from the different databases is a further challenge.
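
Ranked result lists of this kind are commonly exchanged in the TREC-style run format (topic, a constant field, document number, rank, score, run tag). Whether CLEF required exactly this layout in every campaign is not stated here, so the sketch below should be read as an assumption, with hypothetical file and run names.

def write_run(results, run_tag, path):
    """Write ranked results in a TREC-style run format:
    topic  Q0  document_number  rank  score  run_tag
    `results` maps topic_id -> list of (doc_id, score), best first.
    (Assumed layout; the track guidelines define the exact requirements.)
    """
    with open(path, "w", encoding="utf-8") as out:
        for topic_id, ranked in results.items():
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                out.write(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")

write_run({"C088": [("LAT-001", 3.21), ("SDA-042", 2.75)]},
          run_tag="myGroupRun1", path="run.txt")   # hypothetical names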

Question answering discipline (QA track)

Since 2003 there has been a question answering discipline (QA track, short: QA@CLEF) that evaluates question answering systems for non-English European languages. In 2007 the document collections were changed substantially by adding a Wikipedia snapshot to the traditional news corpora for each participating language. In 2009 a completely new corpus of EU documents (JRC-Acquis) was adopted. In various years further tasks were offered at QA@CLEF, e.g. question answering on spoken language or question answering for geographically oriented questions on Wikipedia content (GikiP, GikiCLEF).

CLEF workshops

Every year since 2000, CLEF has also organized a workshop at which the current CLEF results are presented and discussed. The venue has generally been co-located with the ECDL conference. The workshop locations are listed below:

Year  Place
2000 Lisbon
2001 Darmstadt
2002 Rome
2003 Trondheim
2004 Bath
2005 Vienna
2006 Alicante
2007 Budapest
2008 Aarhus
2009 Corfu
2014 Sheffield
2015 Toulouse
2016 Evora
