A file (German: Datei) is a collection of usually related data that is stored on a disk or other storage medium. This data can therefore exist beyond the runtime of a program and is referred to as persistent.
The German term "Datei" is much more narrowly defined than its English translation "file", which often also denotes a (paper) file, a (paper) card index or a card box. Where necessary, the English term has to be narrowed to "data file" or "computer file". The Duden allows "Datei" to mean a (paper) collection of information as well, but this use is probably rare.
In electronic data processing, the content of every file is initially a one-dimensional sequence of bits, which are usually interpreted in blocks of bytes. Only the user of a file, an application program or the operating system itself interprets this sequence of bits or bytes as, for example, a text, an executable program, an image or a sound recording. A file therefore has a file format.
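The point that the bytes themselves carry no meaning can be illustrated with a minimal Python sketch. The four byte values below are arbitrary and chosen only for illustration; the same sequence is read once as text, once as a single integer and once as four separate numbers.

```python
import struct

# Four bytes as they might be stored in a file (values chosen for illustration).
raw = bytes([0x41, 0x42, 0x43, 0x44])

as_text = raw.decode("ascii")                # interpreted as ASCII text
as_uint32_le = struct.unpack("<I", raw)[0]   # interpreted as one little-endian 32-bit integer
as_uint8 = list(raw)                         # interpreted as four separate 8-bit values

print(repr(as_text), hex(as_uint32_le), as_uint8)
```

Which of these readings is the "right" one is decided by the file format, not by the bytes.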
Before files and file systems were developed, the requirement that (result) data must not be lost when a program ends was usually met by storing the data in spatially or logically separate locations, and the storage location was usually managed manually. For example, result data was output as a stack of punched cards, and each stack was placed in its own box, which was then labeled accordingly (spatial separation). With magnetic storage media (e.g. the first floppy disks and hard disks), the user was told before the program started from which memory block on the medium the input data could be read, from which memory block onward the program should store its results, and the maximum number of blocks it was allowed to write (logical separation). This information then had to be passed to the program, and a list of what was stored where, and where space was still available, had to be maintained and kept up to date.
Importance and use
Files allow data to be easily exchanged with other programs, processes or other users. Alternative methods of data storage and exchange are databases and increasingly also cloud-based storage, which usually also manage the data as files.
In application programs, files are often read automatically at startup (e.g. default settings, configuration), and/or the user explicitly selects a file to be "loaded". For example, a text can be stored on a data carrier under a name (the “file name”) in a file management system (“file system”) and edited by a word processing program after the user has loaded it. When the user triggers the save command, the data (here the text) in the file on the storage medium is updated and the old version is overwritten. Sometimes programs offer additional options for handling files:
- "Save as" is used to save under a new name, on a different data carrier or in a different file format;
- regular automatic saving of intermediate versions, which helps to avoid data loss;
- a warning when the program is exited without the data having been saved first;
- regular automatic saving of any changes in the cloud;
- simultaneous editing of the file with other users.
Sometimes metadata in the file itself can prevent data loss.
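The load, edit and save cycle described above, including "Save as", can be sketched in a few lines of Python. The helper names `load`, `save` and `save_as` and the file names are hypothetical and serve only as an illustration; a real word processor does considerably more.

```python
import tempfile
from pathlib import Path


def save(path: Path, text: str) -> None:
    """Write `text` to `path`; an existing older version is overwritten."""
    path.write_text(text, encoding="utf-8")


def load(path: Path) -> str:
    """Read the whole file back as text."""
    return path.read_text(encoding="utf-8")


def save_as(text: str, new_path: Path) -> None:
    """Store the same content under a new name ("Save as")."""
    save(new_path, text)


with tempfile.TemporaryDirectory() as workdir:
    document = Path(workdir) / "letter.txt"
    save(document, "Dear reader,\n")                  # first version on the medium
    text = load(document)                             # the user "loads" the file
    text += "this line was added in the editor.\n"    # editing happens in memory
    save(document, text)                              # saving overwrites the old version
    copy = Path(workdir) / "letter-copy.txt"
    save_as(text, copy)
    result = (load(document), load(copy))

print(result[0])
```

Until `save` is called, the edit exists only in memory, which is why the options listed above (warnings on exit, automatic intermediate saving) matter.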
A file has an internal structure as well as external attributes with regard to its storage. The internal structure - the data format - is mostly controlled solely by the program that saves and processes this file. The external attributes are primarily a name that is also used to manage the storage, as well as general attributes for files of any type; these are mostly controlled by the file system as part of the operating system. Files make data easy to copy and transport. This enables data exchange that is independent of the actual programs for processing the data.
In most operating systems, files are managed by file systems. A file system manages the storage medium by recording in lists which areas of the medium are occupied by which files and which areas are free; it often also keeps logs of planned and/or completed changes.
Although one of the tasks of the file system is to abstract from the specific storage medium ("treat all the same"), many file systems are adapted to the usual technical properties of the storage media (e.g. a block size of 512 bytes for hard drives).
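The bookkeeping described above (which blocks belong to which file, which blocks are free, and a log of changes) can be modelled with a toy class. This is purely illustrative: the class `ToyFileSystem`, its methods and its first-fit allocation are assumptions made for the sketch and do not correspond to any real file system.

```python
from __future__ import annotations

BLOCK_SIZE = 512  # bytes per block, taken from the hard-drive example in the text


class ToyFileSystem:
    """Illustrative in-memory bookkeeping only; no file content is stored."""

    def __init__(self, total_blocks: int) -> None:
        self.free_blocks = list(range(total_blocks))  # block numbers not in use
        self.allocation: dict[str, list[int]] = {}    # file name -> occupied blocks
        self.journal: list[str] = []                  # log of completed changes

    def create(self, name: str, size_in_bytes: int) -> list[int]:
        if name in self.allocation:
            raise FileExistsError(name)
        needed = -(-size_in_bytes // BLOCK_SIZE)      # round up to whole blocks
        if needed > len(self.free_blocks):
            raise OSError("not enough free blocks")
        blocks = [self.free_blocks.pop(0) for _ in range(needed)]
        self.allocation[name] = blocks
        self.journal.append(f"create {name} {blocks}")
        return blocks

    def delete(self, name: str) -> None:
        blocks = self.allocation.pop(name)
        self.free_blocks.extend(blocks)               # the blocks become available again
        self.free_blocks.sort()
        self.journal.append(f"delete {name}")


fs = ToyFileSystem(total_blocks=4)
print(fs.create("a.txt", 1000))   # 1000 bytes need two 512-byte blocks
print(fs.free_blocks)
```

Real file systems use far more compact structures (bitmaps, extents, trees), but the division into an occupied list, a free list and a change log is the same idea.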
For most file systems, 1 byte is the smallest administrative unit. In other words, the length of a file's content, seen as a bit stream, must be a whole number of bytes (a length of 0 bytes, i.e. 0 bits, is generally also allowed).
In addition to directories with file names and storage locations, the file system almost always manages other file attributes. These often include the file type (directory, normal file, special file), the file size (number of bytes in the file), read and write permissions, time stamps (“date”: creation, last access and last change) and other information if necessary. In many file systems, a file can be marked as a hidden file by means of an attribute.
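Most of these attributes can be queried from a program. The following sketch uses Python's standard `os.stat` call; the helper name `describe` and the selection of fields are choices made for this illustration, and which time stamps are available or meaningful depends on the operating system and file system.

```python
import os
import stat
import tempfile
import time


def describe(path: str) -> dict:
    """Collect a few attributes that the file system keeps about `path`."""
    info = os.stat(path)
    if stat.S_ISDIR(info.st_mode):
        kind = "directory"
    elif stat.S_ISREG(info.st_mode):
        kind = "normal file"
    else:
        kind = "special file"
    return {
        "type": kind,
        "size_bytes": info.st_size,                   # number of bytes in the file
        "permissions": stat.filemode(info.st_mode),   # e.g. read and write rights
        "last_access": time.ctime(info.st_atime),
        "last_modified": time.ctime(info.st_mtime),
    }


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    file_info = describe(path)
    dir_info = describe(workdir)

print(file_info)
```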
The characters that can be used in file names depend on the file system, the operating system and, if applicable, language options. In Unix-compatible file systems, a file name may contain neither a slash (“/”) nor a null character, and its length is typically limited to 255 bytes (255 characters in a single-byte encoding). The characters can be encoded in different ways. Newer operating systems also support Unicode.
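The Unix rules just mentioned can be written down as a small check. This is a sketch under stated assumptions: the function name is hypothetical, the length limit is counted in encoded bytes (which matches many but not all file systems), UTF-8 is assumed as the encoding, and the rejection of the empty name and of "." and ".." is an addition not taken from the text above.

```python
def is_valid_unix_filename(name: str, max_bytes: int = 255, encoding: str = "utf-8") -> bool:
    """Check a single file name (not a path) against the common Unix rules."""
    if name in ("", ".", ".."):
        return False                      # assumption: empty and reserved names are rejected
    if "/" in name or "\0" in name:
        return False                      # slash and null character are not allowed
    return len(name.encode(encoding)) <= max_bytes


for candidate in ["report.txt", "a/b", "x" * 256]:
    print(repr(candidate[:20]), is_valid_unix_filename(candidate))
```

Because the limit is applied to the encoded form, a name with fewer than 255 characters can still be too long if it contains multi-byte characters.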
Types of files
By content, files are divided, among other things, into:
- executable files
- non-executable files
Modern file systems also support so-called "sparse files": only those sections of a (large) file that actually contain data are stored; the "free areas" in between are not stored and are treated as if they were filled with zero bytes.
Some file systems also offer to compress or encrypt files transparently ("transparent" meaning that the reading or editing program can use the file normally, as if it were not compressed or encrypted; it "sees through" this process undisturbed).
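A common way to obtain such a file is to seek past the end and write only a little data, as in the following sketch. Whether the result is really stored sparsely depends on the operating system and file system, so the code only reports what the system says; the helper names are hypothetical, and `st_blocks` is not available on every platform.

```python
import os
import tempfile


def make_sparse_file(path: str, logical_size: int) -> None:
    """Create a file of `logical_size` bytes by writing only its last byte."""
    with open(path, "wb") as f:
        f.seek(logical_size - 1)   # skip over the region that is never written
        f.write(b"\0")


def allocated_bytes(path: str):
    """Return the space actually allocated, or None where the OS does not report it."""
    blocks = getattr(os.stat(path), "st_blocks", None)   # Unix-only attribute
    return None if blocks is None else blocks * 512      # st_blocks counts 512-byte units


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "sparse.bin")
    make_sparse_file(path, 1024 * 1024)
    logical_size = os.path.getsize(path)
    on_disk = allocated_bytes(path)
    with open(path, "rb") as f:
        first_bytes = f.read(16)   # the unwritten area reads back as zero bytes

print(logical_size, on_disk, first_bytes)
```

On a file system with sparse-file support, the allocated size is usually much smaller than the logical size of 1 MiB; elsewhere the two are roughly equal.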
On some operating systems (especially those of the Unix family), special files (pseudo files) are also handled like files, for example:
- device files, for example /dev/printer, /dev/mouse
- process information, for example /proc/68/environ
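Because special files are reached through the same interface as ordinary files, a program can ask the file system which kind of object a path refers to. The following Python sketch uses the standard `stat` module; the function name `file_kind` and its labels are chosen for illustration, and the device path in the usage example exists only on Unix-like systems.

```python
import os
import stat


def file_kind(path: str) -> str:
    """Classify `path` by the file type recorded in the file system."""
    mode = os.lstat(path).st_mode          # lstat: do not follow symbolic links
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


# Usage: /dev/null is used here because it exists on most Unix-like systems.
for candidate in [os.getcwd(), "/dev/null"]:
    if os.path.exists(candidate):
        print(candidate, "->", file_kind(candidate))
```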
Options to identify the file format include
- an identifier assigned by the file system (for example an executability flag)
- an identifier within the data (for example <?xml version="1.0" at the beginning; see also MIME type, magic number)
- an identifier in the file name or as a file name extension (for example .jpg, .txt)
- storage in certain directories (e.g. /usr/share/doc)
- a resource fork and other meta information (for example with classic Mac OS)
Such a label is in some cases mandatory and in others serves only to orient the user. Often there is no marking of any kind; for such situations there are special programs that try to determine the type of a file. In the Unix environment, for example, the file command is very common.
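Two of the identification options, an identifier within the data and the file name extension, can be combined in a small sketch. This is only an illustration and not how the file command works internally; the function name, the tables and the returned labels are assumptions, and only a handful of well-known signatures are included.

```python
import os

# A few well-known signatures ("magic numbers") at the start of the data.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"<?xml", "XML document"),
]

# Fallback: file name extensions, which are only a hint and may be wrong.
EXTENSIONS = {".jpg": "JPEG image", ".txt": "plain text"}


def guess_format(name: str, head: bytes) -> str:
    """Guess a format from the first bytes of the data, then from the extension."""
    for magic, label in MAGIC_NUMBERS:
        if head.startswith(magic):
            return label
    extension = os.path.splitext(name)[1].lower()
    return EXTENSIONS.get(extension, "unknown")


print(guess_format("data.bin", b'<?xml version="1.0"?>'))
print(guess_format("notes.txt", b"hello"))
```

Checking the content first means that a mislabeled file, such as a PNG image saved with a .txt extension, is still recognized by its signature.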