File format

from Wikipedia, the free encyclopedia

A file format defines the syntax (permitted values, formal structure or "grammar") and semantics (meaning and interpretation) of the data within a file. It thus represents a bidirectional mapping between information and a one-dimensional binary storage medium.
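
To make the idea of a bidirectional mapping concrete, here is a minimal Python sketch of an invented mini-format (the layout and the names HEADER, encode and decode are assumptions for illustration, not a real standard). The encoder turns a width, a height and a title into bytes, and the decoder recovers exactly the same values; without knowing the layout, the bytes alone could not be interpreted.

```python
import struct

# Hypothetical mini-format (an assumption for illustration, not a real standard):
# bytes 0-1: width, bytes 2-3: height (unsigned 16-bit, little-endian),
# remaining bytes: a UTF-8 encoded title.
HEADER = struct.Struct("<HH")


def encode(width: int, height: int, title: str) -> bytes:
    """Map structured information onto a flat sequence of bytes."""
    return HEADER.pack(width, height) + title.encode("utf-8")


def decode(data: bytes) -> tuple:
    """Map the byte sequence back onto the same information."""
    width, height = HEADER.unpack_from(data, 0)
    title = data[HEADER.size:].decode("utf-8")
    return width, height, title


blob = encode(640, 480, "Holiday")
print(blob)  # b'\x80\x02\xe0\x01Holiday'
assert decode(blob) == (640, 480, "Holiday")
```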

Knowing the file format is essential for interpreting the information stored in a file. Modern operating systems use the file format to associate files with applications that can interpret them.

Origin and meaning of the file format

File formats are usually defined by software manufacturers or by a standardizing body. Formats that have only been specified by a manufacturer are also referred to as proprietary file formats. Standard formats can also develop from proprietary file formats if they are documented and used by others. Standard formats make it possible for software from different manufacturers to work with the same file formats.

For several years, archival organizations have been working on file format directories (file format registries) that enable the automated identification of formats and provide information on their use.

The format of data that is used only within a specific application (this also applies to custom-built software) is also referred to as the "native file format".

Specifications

A specification should precisely describe how data is encoded and arranged within a file format. Specifications are published for many file formats; others are treated as trade secrets, and some file formats are not documented at all outside the programs that interpret them.

Recognition of file formats

Recognizing the format of a file is necessary in order to interpret the information it contains. The file format can be determined automatically in three different ways:

  • Interpretation of the file content
  • Interpretation of the file name
  • Interpretation of metadata

Often the format is not detected but simply assumed; it is then the user's responsibility to open only "suitable" files with a given program.

Interpretation of the file content

To interpret the file content, the file or parts of it are read and examined for known patterns. Magic numbers are often used for this: the file format is recognized by the fact that the file begins with the magic number associated with that format.
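
A minimal Python sketch of magic-number detection follows. The signatures in the table are real, well-known ones, but the table is deliberately tiny and the function names detect_by_content and detect_file are our own; tools such as the Unix file(1) command rely on far larger pattern databases.

```python
from typing import Optional

# A few well-known signatures; real tools use much larger databases.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
]


def detect_by_content(head: bytes) -> Optional[str]:
    """Return the format whose magic number the data starts with, if any."""
    for magic, fmt in MAGIC_NUMBERS:
        if head.startswith(magic):
            return fmt
    return None


def detect_file(path: str) -> Optional[str]:
    # Reading the first few bytes is enough for the signatures above.
    with open(path, "rb") as f:
        return detect_by_content(f.read(16))
```

One limitation shows up quickly: formats built on ZIP containers (for example .docx or .jar files) all begin with the same "PK" signature, so the magic number alone identifies only the container, not the specific format inside it.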

Interpretation of the file name

A common way to distinguish file formats is to interpret the file name; usually only the file name extension is used for this. This method is used, for example, by the macOS, CP/M, DOS and Windows operating systems, and also by developer tools such as make (here independently of the operating system). The last period in the file name is treated as a separator, and the extension that follows it serves as the identifier of the file format. Because file name extensions were limited to three characters in older operating systems, most file formats are still identified by an extension of one to three characters (such as .c or .exe).
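
The rule "split at the last period" can be sketched in Python with os.path.splitext, which behaves exactly that way. The extension table and the function names are assumptions for illustration; real systems keep such associations in their own configuration (on Windows, for example, in the registry), and the lower-casing assumes case-insensitive matching as on DOS and Windows.

```python
import os.path

# Illustrative extension table (an assumption, not any system's real list).
EXTENSION_MAP = {
    ".c": "C source code",
    ".exe": "Windows executable",
    ".jpg": "JPEG image",
    ".txt": "plain text",
}


def extension_of(filename: str) -> str:
    # os.path.splitext splits at the last dot; a leading dot as in
    # ".profile" is not treated as the start of an extension.
    # Lower-casing assumes case-insensitive matching, as on DOS/Windows.
    return os.path.splitext(filename)[1].lower()


def detect_by_name(filename: str) -> str:
    return EXTENSION_MAP.get(extension_of(filename), "unknown")
```

For example, detect_by_name("main.c") yields "C source code", and "archive.tar.gz" is identified only by its last extension, ".gz".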

Because untrained users who change a file name extension cause problems (the file is then assigned to no application or to the wrong one), Microsoft, for example, decided to hide file name extensions by default in newer versions of Windows. This has led to new problems: viruses, for example, are given a “double file extension”, so that an executable file kournikova.jpg.exe is displayed as the supposed image file kournikova.jpg.
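
The double-extension trick can be reproduced with a simplified model in Python. Here displayed_name always hides the last extension, whereas Windows hides extensions only for file types it knows; the function names and the short list of executable extensions are assumptions for illustration, not a complete security check.

```python
import os.path

# Illustrative, far from exhaustive list of extensions Windows runs as programs.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bat", ".scr"}


def displayed_name(filename: str) -> str:
    """Simplified model of a file manager that hides the last extension."""
    return os.path.splitext(filename)[0]


def looks_disguised(filename: str) -> bool:
    """Flag an executable extension hidden behind a second, innocuous one."""
    stem, last = os.path.splitext(filename)
    inner = os.path.splitext(stem)[1]
    return last.lower() in EXECUTABLE_EXTENSIONS and inner != ""
```

With extensions hidden, displayed_name("kournikova.jpg.exe") shows "kournikova.jpg", while looks_disguised flags the file; a legitimate double extension such as "archive.tar.gz" is not flagged because ".gz" is not executable.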

Interpretation of metadata

The only reliable method of determining the file format is to store or transmit, together with the file, metadata that defines the format exactly. On the Internet, such metadata is transmitted in the form of MIME types. Some operating systems store this kind of metadata in the file system.
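
On the Internet, the MIME type usually travels in a Content-Type header of an HTTP response or e-mail. The following sketch reads such a header with the Python standard library's email.message.Message class; the wrapper name parse_content_type is our own. The receiver relies on the declared type rather than guessing from content or name.

```python
from email.message import Message


def parse_content_type(header_value: str) -> tuple:
    """Extract the MIME type and charset from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases the type and falls back to text/plain
    # for a malformed value, as RFC 2045 requires; the charset is None if absent.
    return msg.get_content_type(), msg.get_content_charset()
```

For example, the header value "text/html; charset=UTF-8" yields ("text/html", "utf-8").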

Possible classifications

File formats can be classified according to many criteria. Common criteria are, for example:

  • textual versus binary
Files in a textual format can be read, viewed and changed with a simple general-purpose editor; binary files can only be understood with applications designed for them. In the past, binary file formats were often preferred to text formats because they took up significantly less storage space. Nowadays, by contrast, textual file formats are becoming increasingly widespread, in particular those based on the XML metaformat.
  • data versus executable program
  • by content type: text, image, sound, video formats
  • open versus proprietary
  • common versus rare

etc.
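
The textual/binary distinction can be approximated with a common heuristic, sketched below in Python: data counts as text if it contains no NUL bytes and decodes cleanly in an assumed encoding (UTF-8 by default). This is an assumption-laden shortcut, not a definitive test; UTF-16 text, for instance, contains NUL bytes and would be misclassified. The second function illustrates the storage argument by comparing a number written as decimal text with the same number stored as a 32-bit binary integer.

```python
import struct


def looks_textual(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic: treat data as text if it has no NUL bytes and decodes cleanly."""
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def encoded_sizes(n: int) -> tuple:
    """Bytes needed for n as decimal text versus as a 32-bit binary integer."""
    return len(str(n).encode("ascii")), len(struct.pack("<I", n))
```

For the number 1234567890, encoded_sizes returns (10, 4): ten bytes as text, four bytes in binary.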

Proprietary formats

Proprietary (copyright-protected) file formats sometimes create a dependency on the corresponding software manufacturer (and the platforms it supports), especially if

  • the internal structure is additionally protected by software patents;
  • the format is the company's intellectual property and is not disclosed to the public for economic reasons (customer loyalty).

In such cases, no third-party or open-source programs can be developed for the format.

Risks include the insolvency of the manufacturer, the discontinuation of further development of the product (at least for the chosen platform), and increases in license fees (see e.g. the GIF patent fees) or prices.

Sometimes proprietary formats may also be used by third-party companies in return for license payments, and thereby achieve a level of adoption that ensures sufficient independence from a single vendor (e.g. the binary GIF graphics format, whose patents, however, expired in October 2006).

Proprietary binary formats are therefore only of limited suitability for archiving data, unless the format is in widespread use. When the software is updated, older documents also have to be converted to the new version of the format if they are to remain readable. The same applies when free formats are developed further, but there the old version of the format remains accessible, at least in principle.

Versions

Just as application programs evolve, file formats are usually developed further, so that new versions are created. For many file formats, simple support for upward compatibility is already taken into account during development. (Downward compatibility, on the other hand, is a problem that largely concerns the application program.)
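
One widespread way to plan for such evolution is to write a version number into the file header so that a reader can decide whether it understands a file. The Python sketch below uses an invented format "DEMO" (signature, field layout and version policy are all assumptions for illustration, not taken from any particular standard): files with the same major version are accepted even if their minor version is newer, on the assumption that minor versions only append optional data, while a newer major version is rejected.

```python
import struct

# Hypothetical format "DEMO" (an assumption for illustration): a 4-byte
# signature followed by one byte each for the major and minor version.
MAGIC = b"DEMO"
HEADER = struct.Struct("<4sBB")
SUPPORTED_MAJOR = 1


def write_header(major: int, minor: int) -> bytes:
    return HEADER.pack(MAGIC, major, minor)


def read_header(data: bytes) -> tuple:
    if len(data) < HEADER.size:
        raise ValueError("file too short")
    magic, major, minor = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a DEMO file")
    if major > SUPPORTED_MAJOR:
        # A new major version signals incompatible changes: refuse to guess.
        raise ValueError(f"unsupported major version {major}")
    # Same major version: assume newer minor versions only append optional
    # data that this reader can safely skip.
    return major, minor
```

Separating major and minor versions lets a format grow in small, compatible steps while still giving older readers a clear signal when a change would break them.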

Literature

  • Günter Born: Reference manual file formats. Graphics, text, databases, spreadsheets. 3rd edition. Addison-Wesley, Bonn et al. 1995, ISBN 3-89319-815-6.

Web links

  • Wiktionary: file format - explanations of meanings, word origins, synonyms, translations
  • Commons: File Formats - collection of images, videos and audio files
  • Wotsit.org - The Programmer's File and Data Resource
  • FileTypes.de - List of file formats and file extensions

References

  1. IT Wissen.Info, keyword "Native"