Duplicate (database)

from Wikipedia, the free encyclopedia

A duplicate or doublet (CH) is a data record in a database that is redundant, i.e. present more than once, but whose redundancy cannot be detected by checking for identical content because the copies are spelled differently.

A duplicate is not redundancy in the information technology sense, i.e. it is not redundancy that is introduced deliberately as part of the system's architecture.

Duplicates arise particularly in address databases: when the same person or company is recorded several times from differing input data, when several address databases are merged, or when the recorded persons or companies change their names.

Because duplicates cause unnecessary costs (especially in bulk mailings) and can damage the sender's image, suitable software is used to identify them and clean them up automatically or semi-automatically (deduplication). Phonetic, pattern-based or associative algorithms of varying strictness are used for this purpose.
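The following Python sketch illustrates how a phonetic and a pattern-based comparison might be combined to flag candidate duplicates. It is a minimal illustration, not the method of any particular deduplication product: the function names `soundex` and `find_candidate_pairs`, the sample names, and the threshold of 0.85 are assumptions made for this example. Soundex stands in for the phonetic algorithm and the similarity ratio from Python's standard `difflib` module stands in for the pattern-based one.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Soundex digit for each consonant; vowels, H, W and Y carry no digit.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """Return a four-character Soundex key (phonetic comparison)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same digit
        code = _CODES.get(c, "")
        if code and code != previous:
            key += code
        previous = code
    return (key + "000")[:4]


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio (pattern-based comparison)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidate_pairs(names, threshold=0.85):
    """Flag pairs that sound alike or are spelled almost alike.

    The threshold is an arbitrary value chosen for this sketch.
    """
    pairs = []
    for a, b in combinations(names, 2):
        if soundex(a) == soundex(b) or similarity(a, b) >= threshold:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    sample = ["Meyer", "Meier", "Schmidt", "Schmitt", "Becker"]
    for pair in find_candidate_pairs(sample):
        print(pair)
```

Pairs flagged this way are only candidates. In a semi-automatic clean-up they would be shown to a person for confirmation before any records are merged.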

Duplicates can also occur in material and product data. After two companies merge, many components usually exist in both companies' records but are recorded under different spellings.
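As a rough illustration of the product-data case, the Python sketch below reduces each part description to a normalized key and groups entries that share one. The helper names `match_key` and `group_duplicates`, the catalog labels "A" and "B", and the sample descriptions are all hypothetical. Real material data usually needs domain-specific rules (units, abbreviations, synonyms) that this sketch does not attempt.

```python
import re
from collections import defaultdict


def match_key(description: str) -> str:
    """Reduce a part description to a spelling-insensitive key."""
    text = description.lower()
    # Join dimensions written with spaces: "m8 x 40" becomes "m8x40".
    text = re.sub(r"(?<=\d)\s*x\s*(?=\d)", "x", text)
    # Keep alphanumeric tokens only and ignore their order.
    tokens = re.findall(r"[a-z0-9]+", text)
    return " ".join(sorted(tokens))


def group_duplicates(parts):
    """Group (source, description) pairs that share a match key.

    Only groups with more than one entry are returned.
    """
    groups = defaultdict(list)
    for source, description in parts:
        groups[match_key(description)].append((source, description))
    return {key: items for key, items in groups.items() if len(items) > 1}


if __name__ == "__main__":
    # Hypothetical entries from the catalogs of two merged companies.
    parts = [
        ("A", "Hex bolt M8 x 40"),
        ("B", "BOLT, HEX, M8X40"),
        ("A", "Washer 8.4 mm"),
    ]
    for key, items in group_duplicates(parts).items():
        print(key, items)
```

Sorting the tokens makes the key independent of word order, at the cost of occasionally grouping descriptions that use the same words with a different meaning.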