Translation memory

from Wikipedia, the free encyclopedia

A translation memory (also called a translation archive; abbreviated TM) is a database of structured translations and forms the main component of applications for computer-aided translation (CAT).

Database structure

There are two basic types of database structure:

  • On the one hand, there are databases in which the stored segments are coherent texts, divided into source language and target language. These systems have the advantage that sentences are not stored in isolation; each sentence is kept in its context. In addition, database queries can be restricted to particular subject areas, which speeds up the display of matches.
  • On the other hand, there are databases in which the segments are sentences or paragraphs saved in isolation, i.e. without the context of the source texts. In these systems, response times depend less on the size of the stored units than on efficient indexing of the database.
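For the second type of structure, fast lookup comes from the index rather than from the stored units themselves. The following Python sketch illustrates this with a simple word-based inverted index. The class `SegmentStore` and its methods are hypothetical; production TM systems rely on their own database engines and indexing schemes.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is ignored for indexing purposes.
    return re.findall(r"\w+", text.lower())


class SegmentStore:
    """Hypothetical TM store: isolated source/target pairs plus an inverted index."""

    def __init__(self) -> None:
        self.segments: list[tuple[str, str]] = []
        # word -> ids of the segments whose source text contains that word
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, source: str, target: str) -> int:
        seg_id = len(self.segments)
        self.segments.append((source, target))
        for word in tokenize(source):
            self.index[word].add(seg_id)
        return seg_id

    def candidates(self, query: str) -> list[int]:
        # Count shared words per segment using only the index (no full scan).
        hits: dict[int, int] = defaultdict(int)
        for word in set(tokenize(query)):
            for seg_id in self.index.get(word, ()):
                hits[seg_id] += 1
        # Most shared words first; ties broken by segment id for determinism.
        return sorted(hits, key=lambda s: (-hits[s], s))
```

Because `candidates` only visits the index entries for the words in the query, its cost depends on how many segments share those words, not on a scan of the whole memory.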

Practical work

Example of a translation process supported by a translation memory in the free software OmegaT.

In practice, working with a translation memory begins with calling up a source text directly from the word processor or importing it into a stand-alone TM program. The program then searches the memory for formulations that reach a given minimum match and offers them as translations. The translator can accept, reject or adjust these suggestions. If no suitable segments are found, the translator enters a new translation, which can then be saved together with the source segment. Once saved, it will be suggested whenever similar segments occur. If the segments carry additional information, it becomes easier to choose between several suggestions later. Such information includes:

  • User who created or last changed the saved translation
  • Date on which the segment was created or modified
  • Frequency of the wording
  • Context of the wording
  • Other classifying information

This additional information is either assigned automatically by the program or has to be maintained manually by the translator.
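The lookup-and-save workflow and the metadata listed above can be modeled roughly as follows. This is a minimal sketch, assuming exact matching on the source text and an invented ranking rule (same context first, then most recently modified, then most frequent); actual programs weigh such information in their own ways. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TMEntry:
    source: str
    target: str
    user: str           # who created or last changed the translation
    modified: date      # date of creation or last modification
    frequency: int = 1  # how often this wording has been used
    context: str = ""   # e.g. document or subject area


class TranslationMemory:
    def __init__(self) -> None:
        self.entries: list[TMEntry] = []

    def save(self, entry: TMEntry) -> None:
        # A new or adjusted translation is stored together with its source segment.
        self.entries.append(entry)

    def suggestions(self, source: str, context: str = "") -> list[TMEntry]:
        matches = [e for e in self.entries if e.source == source]
        # Assumed ranking: matching context first, then newest, then most frequent.
        return sorted(
            matches,
            key=lambda e: (e.context != context, -e.modified.toordinal(), -e.frequency),
        )
```

An empty result corresponds to the case in which the translator enters a new translation and calls `save`.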

When determining how closely the text being searched resembles a source segment already in the memory, the software evaluates not only the sequence of letters but, depending on its settings, also punctuation marks, spaces, paragraph marks and possibly even formatting.
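How such settings influence match recognition can be illustrated with a normalization step applied before comparison. In this sketch the switches `ignore_punctuation` and `ignore_whitespace` are hypothetical stand-ins for tool-specific options, and formatting is not modeled at all.

```python
import re
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchSettings:
    # Hypothetical switches; real TM tools expose their own options.
    ignore_punctuation: bool = False
    ignore_whitespace: bool = False


def normalize(text: str, settings: MatchSettings) -> str:
    if settings.ignore_punctuation:
        # Drop ASCII punctuation characters entirely.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if settings.ignore_whitespace:
        # Collapse runs of spaces, tabs and paragraph marks into one space.
        text = re.sub(r"\s+", " ", text).strip()
    return text


def is_exact_match(query: str, stored: str, settings: MatchSettings) -> bool:
    return normalize(query, settings) == normalize(stored, settings)
```

With the strict default settings, `"Press  the button."` and `"Press the button"` count as different segments; with both switches enabled they match.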

Technical features

TM systems usually have functions that enable the recognition of a usable translation regardless of variable elements such as numbers, dates, units of measurement or proper names.
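One way to make matches independent of numbers is to mask them before comparison and then substitute the new values into the stored translation. The sketch below handles numbers only and assumes they appear in the same order in the target as in the source; the function names are hypothetical, and real systems also recognize dates, units of measurement and proper names.

```python
import re

# Integers and simple decimals such as 12, 3.5 or 1,000.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(text: str) -> tuple[str, list[str]]:
    """Replace each number with a placeholder and return the numbers found."""
    return NUMBER.sub("{#}", text), NUMBER.findall(text)


def adapt_translation(new_source: str, stored_source: str, stored_target: str):
    masked_new, new_numbers = mask_numbers(new_source)
    masked_old, _ = mask_numbers(stored_source)
    if masked_new != masked_old:
        return None  # the segments differ in more than their numbers
    _, target_numbers = mask_numbers(stored_target)
    if len(target_numbers) != len(new_numbers):
        return None  # cannot map numbers safely
    # Assumption: numbers occur in the same order in source and target.
    replacements = iter(new_numbers)
    return NUMBER.sub(lambda _match: next(replacements), stored_target)
```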

Similar source segments are found using search algorithms of varying complexity (fuzzy search), which usually also report the degree of similarity as a percentage.
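As an illustration of a fuzzy search with a percentage score, the sketch below uses character-level Levenshtein distance, scaled by the length of the longer segment. This scoring formula and the 75 % default threshold are assumptions chosen for the example; commercial TM systems use their own similarity measures.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic programming, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> int:
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


def fuzzy_matches(query: str, memory: list[tuple[str, str]], threshold: int = 75):
    # Score every stored source segment and keep those above the minimum match.
    scored = [(similarity_percent(query, src), src, tgt) for src, tgt in memory]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)
```

For example, `"Press the green button."` against a stored `"Press the red button."` differs by three character edits over 23 characters and scores 87 %.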

To make texts from word processing and DTP programs available to TM systems, filter and extraction programs extract the source text from the respective files. The result is a marked-up (“tagged”) file in which the text to be translated sits between special control codes (tags). The system protects or hides these layout tags so that they cannot be accidentally overwritten or changed. When translating software (localization), program code can be protected against unintentional changes in the same way. After translation, the filter program uses the control codes to reinsert the texts in the correct places in the DTP file and to apply formatting (e.g. bold, italic) to the corresponding places in the translation.
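Tag protection can be sketched by replacing each inline tag with a numbered placeholder before translation and restoring it afterwards. The sketch assumes HTML-like tags and a `{n}` placeholder syntax, both hypothetical; real filters work with the native codes of each file format.

```python
import re

# HTML-like opening or closing tags such as <b>, </b> or <a href="...">.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def protect_tags(text: str) -> tuple[str, list[str]]:
    """Replace each tag with {1}, {2}, ... and return the removed tags in order."""
    tags: list[str] = []

    def to_placeholder(match: re.Match) -> str:
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG.sub(to_placeholder, text), tags


def restore_tags(text: str, tags: list[str]) -> str:
    """Put the original tags back; fail if the translator lost or copied one."""
    for number, tag in enumerate(tags, start=1):
        placeholder = "{%d}" % number
        if text.count(placeholder) != 1:
            raise ValueError(f"placeholder {placeholder} missing or duplicated")
        text = text.replace(placeholder, tag)
    return text
```

The check in `restore_tags` mirrors the purpose of protecting tags: a translation that drops or duplicates a control code is rejected instead of silently corrupting the layout.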

Most TM systems have special editors to make working with these "tagged" files easier.

Different TM systems can exchange translation memories via the TMX format (Translation Memory eXchange) and projects via the XML Localization Interchange File Format (XLIFF). Both are open standards supported by most professional vendors. However, because the content of a memory depends heavily on how it was segmented, and because the TMX definition leaves considerable room for interpretation, the exchange is usually not lossless.
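A minimal TMX document can be written and read with Python's standard `xml.etree.ElementTree` module. The sketch follows the basic TMX 1.4 layout (`header`, `body`, `tu`, `tuv` with `xml:lang`, `seg`); the `creationtool` value is a placeholder, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: list[tuple[str, str]], srclang: str = "en", tgtlang: str = "de") -> str:
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(xml_text: str, srclang: str = "en", tgtlang: str = "de") -> list[tuple[str, str]]:
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if srclang in segs and tgtlang in segs:
            pairs.append((segs[srclang], segs[tgtlang]))
    return pairs
```

Only plain segment text is carried over here; inline markup and per-segment metadata are not, which is one small illustration of why exchanges between systems are often lossy.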
