European Advanced Multilingual Information System

from Wikipedia, the free encyclopedia

The European Advanced Multilingual Information System, acronym EURAMIS, was developed as a translation support system for the Directorate-General for Translation of the European Commission. Software development began in 1995, and the system is still being adapted and expanded.

Introduction

When development of EURAMIS began, the Directorate-General for Translation of the European Commission (DGT) mainly had to translate into the 11 official languages of the time. Following the enlargement of the European Union, a total of 24 official languages have to be supported, along with important languages of trading partners (Russian, Arabic, Chinese and Turkish).

EURAMIS aims to provide translators with as much data as possible that is relevant (or useful) for a translation in a kind of one-stop shop, so that this data can be reused as far as possible. The main purpose is to increase the quality and coherence of the translations, but also to optimize the translation process. Given the amount of available data and the number of language pairs required, commercial systems can achieve this only to a limited extent.

EURAMIS is used within the DGT as a tool for translating the vast majority of documents. Almost all translation services of the other European institutions use EURAMIS, primarily in connection with the legislative process (European Parliament, Council of the European Union, Court of Justice of the European Communities, European Court of Auditors, European Economic and Social Committee, Committee of the Regions); in addition, EURAMIS is used by the Translation Centre for the Bodies of the European Union.

System architecture

EURAMIS was one of the first translation support systems to be implemented consistently as a client-server architecture. The architecture was originally two-tier and is now three-tier, i.e. it has its own data layer.

Because processing linguistic mass data requires considerable resources, most services and functions are offered only as batch processing: the data to be processed is passed to the system, which sends back the requested results after a while. Where it makes sense and is feasible, online services are also offered.

EURAMIS was multilingual from the start: it is not based solely on language pairs, but uses a cross-language data structure to support translation between all official languages of the European Union. Since the various official languages have a number of specific special characters, it was an obvious choice to use Unicode throughout for character representation. The system was a pioneer in this area as well.

Data layer

The central translation memory is the core of EURAMIS. It contains almost 1.2 billion sentences in all official languages. Since all sentences of each document are always saved, there are about half as many types as tokens; that is, the number of different sentences is about half the total number of sentences.

Managing the multilingual data in a relational database enables functionality that is not implemented in other translation memories. If, for example, a document has been translated into several languages (e.g. from English into French and German), the data is also taken into account, without duplication, when searching in the opposite translation direction, and even when searching between the original target languages (here between German and French). This is particularly advantageous because the source language of a document can change from version to version.
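The cross-language lookup described above can be illustrated with a minimal sketch. It assumes a simple in-memory structure in place of the relational database; the class `MultilingualMemory`, its methods and the sample sentences are hypothetical and are not taken from EURAMIS.

```python
from collections import defaultdict


class MultilingualMemory:
    """Toy store: one unit holds every language version of a sentence once."""

    def __init__(self):
        self.units = []                  # each unit: {language code: sentence}
        self.index = defaultdict(set)    # (language, sentence) -> unit ids

    def add_unit(self, versions):
        """Store all language versions together, without per-pair copies."""
        unit_id = len(self.units)
        self.units.append(dict(versions))
        for lang, sentence in versions.items():
            self.index[(lang, sentence)].add(unit_id)
        return unit_id

    def lookup(self, source_lang, sentence, target_lang):
        """Search in any direction, including between two former target languages."""
        results = []
        for unit_id in sorted(self.index.get((source_lang, sentence), ())):
            target = self.units[unit_id].get(target_lang)
            if target is not None:
                results.append(target)
        return results


memory = MultilingualMemory()
# Hypothetical document translated from English into French and German.
memory.add_unit({
    "en": "The committee adopted the report.",
    "fr": "Le comité a adopté le rapport.",
    "de": "Der Ausschuss nahm den Bericht an.",
})

# German -> French works although neither was the original source language.
print(memory.lookup("de", "Der Ausschuss nahm den Bericht an.", "fr"))
```

Because every language version is attached to the same unit, no language pair has to be stored separately, which is the property the paragraph above attributes to the relational design.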

Furthermore, the entirety of the sentences (in one language) that share the same meta-information forms, for example, a virtual document, without any physical subdivision being necessary. The translation memory can be addressed as a whole, but also through its virtual substructures. This is used, for example, to give priority to entries from binding documents.
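A minimal sketch of such virtual substructures, assuming each entry carries a dictionary of meta-information: the names `Entry`, `virtual_document` and `prioritised`, the meta keys `doc` and `binding`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    source: str
    target: str
    meta: dict = field(default_factory=dict)


def virtual_document(entries, **criteria):
    """Select the entries whose meta-information matches all criteria.

    The result is only a view over the same objects; nothing is copied
    or physically partitioned.
    """
    return [e for e in entries
            if all(e.meta.get(k) == v for k, v in criteria.items())]


def prioritised(entries, key="binding"):
    """Stable sort that places entries flagged as binding first."""
    return sorted(entries, key=lambda e: not e.meta.get(key, False))


entries = [
    Entry("Article 1", "Art. 1", {"doc": "A-1", "binding": False}),
    Entry("Article 1", "Artikel 1", {"doc": "B-7", "binding": True}),
    Entry("Annex", "Anhang", {"doc": "B-7", "binding": True}),
]

print([e.target for e in virtual_document(entries, doc="B-7")])
print(prioritised(entries)[0].target)
```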

Logic layer

Application server

The most important applications within EURAMIS are:

  • searching for and storing data in the central translation memory: a separate indexing system with its own search algorithm was developed to find similar, but not necessarily identical, pieces of text (so-called fuzzy retrieval); this is sufficiently efficient for the given amount of data, but too slow for interactive use
  • programs for data management of the translation memory: downloading documents, document-related maintenance of meta-information, automatic coherence checks and, if necessary, adaptation of meta-information when new entries are saved, correction of linguistic information
  • the creation of alignments at sentence level on the basis of the original and an existing translation: the algorithm used is based on heuristic methods, e.g. sentence length or parallelism of numbers, and therefore does not always deliver error-free results
  • an integration of various multilingual text databases (e.g. EUR-Lex), which enables users to request alignments of documents from these text databases
  • processes for identifying documents that may be relevant for a translation, among other things the automatic evaluation of references in all official languages and statistical procedures for recognising (partial) predecessor documents
  • the connection to the machine translation system of the European Commission
  • an integrator that on the one hand controls the sequence of the individual modules (e.g. conversion of proprietary formats, sentence segmentation, search in the translation memory, conversion to the output format, transfer of the result), and on the other hand is used to design, create and monitor complex services (e.g. search in the translation memory, recognition of reference documents, extraction of these documents from the translation memory or download from the corresponding text database, and alignment of the documents)
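The idea of fuzzy retrieval mentioned in the list above can be sketched as follows. This is not the EURAMIS algorithm, which relies on its own indexing system; the sketch simply scans a small list and scores similarity with Python's standard `difflib.SequenceMatcher`. The function name `fuzzy_matches`, the threshold of 0.7 and the sample sentences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_matches(query, memory, threshold=0.7):
    """Return (score, source, target) tuples for similar stored sentences.

    `memory` is a list of (source, target) pairs. A linear scan like this
    would not scale to a large memory; it only illustrates the principle
    of retrieving similar, not necessarily identical, segments.
    """
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)  # best match first
    return scored


memory_pairs = [
    ("The Commission shall adopt the measures.",
     "Die Kommission erlässt die Maßnahmen."),
    ("Member States shall inform the Commission.",
     "Die Mitgliedstaaten unterrichten die Kommission."),
]

for score, source, target in fuzzy_matches(
        "The Commission shall adopt the measure.", memory_pairs):
    print(f"{score:.2f}  {source}  ->  {target}")
```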

When data has to be passed between the individual modules or applications, this is done via an SGML-based so-called pivot file, to which the called components add their results. At the end of the process, the required information is filtered out and converted into the desired format.
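The pivot-file principle can be sketched roughly as follows. A plain Python dictionary stands in for the SGML-based pivot file, and the module functions `segment`, `search_memory`, `render` and `run_pipeline` are hypothetical; only the pattern (each module adds its results, the final step filters) follows the description above.

```python
import re


def segment(pivot):
    """Module 1: add naive sentence segments to the pivot."""
    parts = re.split(r"(?<=[.!?])\s+", pivot["text"].strip())
    pivot["segments"] = [s for s in parts if s]
    return pivot


def search_memory(pivot, memory):
    """Module 2: add exact matches from a {source: target} memory."""
    pivot["matches"] = {s: memory[s] for s in pivot["segments"] if s in memory}
    return pivot


def render(pivot):
    """Final step: filter out only what the requested output needs."""
    return [(s, pivot["matches"].get(s)) for s in pivot["segments"]]


def run_pipeline(text, memory):
    """Integrator stand-in: run the modules in order over a shared pivot."""
    pivot = {"text": text}
    for step in (segment, lambda p: search_memory(p, memory)):
        pivot = step(pivot)
    return render(pivot)


result = run_pipeline("Good morning. See annex.",
                      {"See annex.": "Siehe Anhang."})
print(result)
```

Keeping all intermediate results in one shared structure means a module never needs to know which module ran before it, only which fields it expects to find.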

Web server

Access from the client side is via a web browser. Some online applications (e.g. concordance) are located on the web server and interact directly with the database.

Windows server

Some applications are carried out on a central Windows server (e.g. conversion of proprietary formats to RTF or preparation of documents for alignment).

Presentation layer

The web interface provides access to around 40 menus with which users can formulate their queries or jobs. Most of these menus are also offered in the form of web services. With this interface, the user can either formulate interactive queries or submit documents for batch processing.

Translators use the results provided by EURAMIS either with desktop software as the front end (currently the commercial product SDL Trados Studio and the open-source software OmegaT; data exchange via TMX files), or solely as an HTML page in which the match rate is indicated by colour and meta-information (origin of the sentence found, e.g. document number, client) is shown in the form of comments. The first approach is generally preferred, since the interactive use of a translation memory also makes it possible to exploit repetitions or similarities within the same document.
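As an illustration of data exchange via TMX, the following sketch writes a small TMX document with Python's standard `xml.etree.ElementTree`. The element and attribute names follow the TMX 1.4 format; the function `build_tmx`, the header values and the sample sentences are placeholders chosen for this example and say nothing about the files EURAMIS actually produces.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, source_lang="en"):
    """Build a TMX string from a list of {language code: sentence} dicts."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "example-sketch",     # placeholder value
        "creationtoolversion": "0.1",         # placeholder value
        "segtype": "sentence",
        "o-tmf": "example",                   # placeholder value
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")        # one translation unit
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


tmx_text = build_tmx([
    {"en": "See annex.", "de": "Siehe Anhang."},
])
print(tmx_text)
```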

A Windows-based editor allows any errors in alignments to be corrected. The display is in tabular form: the user can look through the aligned document and only has to intervene where errors need to be corrected. The alignment editor has a number of auxiliary functions, e.g. search and replace, spelling error detection and editing of meta-information.

Automation within the workflow

Many of these steps are automated within the DGT: all translation jobs that arrive electronically are processed with standard parameters, and the results are made available to the translators within a workflow system. Translations produced with the help of the front end are automatically stored in the translation memory; otherwise an alignment is usually carried out by an assistant.

A similar automated integration into the work process is currently taking place at the other participating institutions with the help of web services.

Use

While the use of EURAMIS has stabilized at a high level within the European Commission, mainly because automation is already very far advanced, use by the other institutions involved is still increasing. Currently, several million pages are searched for in the translation memory or aligned each year. In addition, more than 80,000 interactive concordance queries are made every working day.
