Filename

from Wikipedia, the free encyclopedia
[Figure: file names in a console window under Windows]

A file name identifies a file in a file system on a storage medium or during a data transfer. A file is usually also located by a directory name, so that the two together form a complete path name. As a rule, only this full path name is unique. Two files in the same directory (folder) cannot have identical names; under some operating systems, e.g. Unix, a change in upper or lower case produces a different file name on a suitable file system, so that, for example, Dateiname and dateiname are two different, and therefore not identical, names.

The optional file name extension is part of the file name.

Properties

A file name can consist of several parts, depending on the operating system. The individual parts are separated by certain characters, which as a rule cannot themselves be part of the file name; the list of file name extensions provides an overview.

Some operating systems make the handling of files dependent on the file name extension. Other operating systems recognize the file type from its content, for example by means of a so-called magic number, which allows a given file type to be determined fairly reliably, or from data stored in the alternate data stream associated with the file, such as the file type and the program that created it. But even on systems that do not derive the file type from a file name extension, file names are given one, among other reasons because it simplifies data exchange.
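As a minimal sketch of content-based type detection, the following Python function compares the first bytes of a file against a handful of well-known signatures. The table `MAGIC_NUMBERS` and the function name `guess_type_from_content` are illustrative assumptions, not part of any particular operating system; real tools know many more signatures.

```python
# Illustrative subset of well-known file signatures ("magic numbers").
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}


def guess_type_from_content(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring any extension."""
    for signature, description in MAGIC_NUMBERS.items():
        if data.startswith(signature):
            return description
    return "unknown"
```

Called with the first few bytes of a file, the function returns the same answer whether the file is named report.pdf or report.txt, which is the point of the approach.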

The maximum length of a file name is limited by both the operating system and the file system of the storage medium. For example, a file name on a CD-ROM can have at most 64 characters when the Joliet file system is used. An indirect limit can also result from a maximum length of path names in the operating system.

One difference between Windows and Unix (as well as Linux) is that Windows does not differentiate between upper and lower case letters in file names (case-insensitive), while Unix does (case-sensitive). Under Unix, for example, Dateiname.txt and dateiName.txt are two different files, while this is not possible under Windows.

File systems

| File system | Typical application | Max. number of characters in a file name | Character set | Upper / lower case (1) |
|---|---|---|---|---|
| FAT (DOS, without VFAT) | Hard drives, memory cards (photo) | 8 + 3 | OEM (mostly code page 437) | no distinction (only capital letters are stored) |
| Amiga Fast File System | Hard drives | 31 | ISO 8859 | stored, but no distinction |
| ISO 9660 level 2 | CD, DVD | 31 | ASCII | no distinction (only capital letters are stored) |
| Joliet | CD, DVD | 64 | Unicode | stored, but no distinction under Windows |
| ISO 9660:1999 | CD, DVD | 179–221, depending on other attributes | ASCII / unspecified | operating system dependent |
| FAT with VFAT (Windows) | Hard drives, USB sticks | 255 | Unicode | stored, but no distinction under Windows (whereas Unix, for example, does distinguish) |
| ext3 | Hard drives | 255 (2) | Unicode (3) | distinction |
| HFS+ | Hard drives | 255 | Unicode (UTF-16) | no distinction in the standard HFS+ variant; optional strict distinction (as HFSX) |
| UDF | CD, DVD | 255 | Unicode | stored; distinction depends on the operating system |
| NTFS | Hard drives | 256 (4) | Unicode (UTF-16) | the file system supports the distinction, but the implementation can choose; no distinction by default under Windows |
| ReFS | Hard drives | 32,000 | Unicode | the file system supports the distinction, but the implementation can choose |
| APFS | SSDs (5) | | Unicode | the file system supports the distinction; a distinction is made by default under iOS, file names are normalized by default under macOS |
(1) In English, the clear distinction between upper and lower case is called "case-sensitivity" and, if this distinction is not made, "case-insensitivity". A file system implementation in an operating system is either case-sensitive (makes the distinction) or case-insensitive (does not differentiate between uppercase and lowercase letters).
(2) If UTF-8 encoding and non-ASCII characters are used, 255 bytes are available, but fewer than 255 characters.
(3) The encoding is not standardized; UTF-8 is usually used as the default.
(4) When long Unicode paths are used, only 255 characters are possible.
(5) APFS is the successor to HFS+ and has been optimized for flash storage and SSDs, but it also works on hard drives and other data storage devices.
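Footnote (2) can be made concrete with a short Python sketch. It assumes a limit of 255 bytes and UTF-8 encoding, as the footnote describes for ext3; the function name `fits_in_255_bytes` is hypothetical.

```python
def fits_in_255_bytes(name: str, encoding: str = "utf-8") -> bool:
    """Check a name against a 255-byte limit rather than a 255-character limit."""
    return len(name.encode(encoding)) <= 255


ascii_name = "a" * 255        # 255 characters, 255 bytes in UTF-8
umlaut_name = "\u00fc" * 128  # 128 characters, but 256 bytes in UTF-8

print(fits_in_255_bytes(ascii_name))   # the ASCII name fits
print(fits_in_255_bytes(umlaut_name))  # the shorter umlaut name does not
```

Because each "ü" occupies two bytes in UTF-8, a name made only of that letter can have at most 127 characters under such a limit.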

Implementation

File systems have certain internal structures that mostly correspond to those of the reference system for which the file system was developed. A file system usually has no rights management if the operating system has none either. An example of this is the FAT file system of PC DOS or MS-DOS: since DOS itself is not a multi-user system, it does not store any access rights for files and directories. The same applies to the creation and access times of files, which, depending on the system, are stored either as local time or as universal time, and to the convention for upper and lower case.

Upper and lower case

In English, the distinction between uppercase and lowercase letters is called "case-sensitivity". An operating system that makes this distinction treats Dateiname (with a capitalized first letter) as a different file name from dateiname (all lower case), which in turn denotes a different file from DATEINAME (all upper case). Unix operating systems are traditionally "case-sensitive", that is, they differentiate between uppercase and lowercase letters.

Many other operating systems, such as CP/M, PC DOS and MS-DOS as well as compatible DOS operating systems, but also classic Mac OS, macOS (originally “Mac OS X”) and AmigaOS, do not differentiate between upper and lower case letters, which affects the file system used. Under MS-DOS, the file names Filename and filename therefore refer to one and the same file. The FAT file system also always stores the name in capital letters, i.e. as FILENAME. With VFAT, by contrast, the name is stored in the spelling used when the file was created, but no distinction is made when it is accessed. The Amiga “Fast File System” works in the same way. The Hierarchical File System (HFS) from Apple can be "case-sensitive", but is not, for reasons of compatibility with existing applications. Making HFS+ case-sensitive during its development would have meant a break with existing Macintosh applications, which is why macOS continues to use the "case-insensitive" variant of HFS+ and of its successor APFS, which does not distinguish between uppercase and lowercase letters. On iOS, however, the HFS+ or APFS file system is "case-sensitive" by default.
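The practical consequence is that names which coexist on a case-sensitive system can collide when copied to a case-insensitive one. The following Python sketch groups such names; it uses `str.casefold()` as a stand-in for the comparison, which is an assumption, since real file systems apply their own case-mapping tables. The function name `find_case_collisions` is hypothetical.

```python
def find_case_collisions(names):
    """Group names that would denote the same file if case were ignored."""
    groups = {}
    for name in names:
        # casefold() approximates a case-insensitive comparison key.
        groups.setdefault(name.casefold(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


listing = ["Dateiname", "dateiname", "DATEINAME", "brief.txt"]
print(find_case_collisions(listing))
```

For the listing above, the three spellings of Dateiname end up in one group, while brief.txt has no collision.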

Interoperability

The FAT file system (in the variants FAT12, FAT16 and FAT32) is implemented on almost all operating systems. But because the file system makes certain assumptions about the underlying system, and stores file names only in uppercase letters, these names are converted to lowercase letters by default by the file system driver on, for example, Unix-like operating systems. With the VFAT extension, this conversion is carried out only if the file name follows the 8.3 convention and is stored in capital letters in the FAT. Other extensions for FAT, such as UMSDOS, perform such a conversion on their own if the file was not saved under Unix and kept in UMSDOS format. However, since Unix/Linux differentiates between uppercase and lowercase letters, a file is recognized only in the converted spelling or in the spelling stored in the VFAT.

An example: under Windows (with VFAT, i.e. from Windows 95 onward), a file is saved under the name Dateiname.Ext. Because Windows is "case-insensitive", the file is also recognized under Windows in a different spelling, for example when it is deleted at the command prompt with del dateiname.ext. Under Linux, however, this same file is recognized only in its exact spelling, Dateiname.Ext. If, for example, you want to display the file with less and make a mistake, e.g. in the file name extension (less Dateiname.ext), Linux reports that the file does not exist.

So the outcome depends not only on the file system, but also on the operating system and on how it handles the information stored in the file system (file name, rights, date). Under certain circumstances, for example, a file system driver may take the characteristics of the file system into account while the operating system environment undermines this. One example is the use of a Unix shell with wildcards. Under Linux, "case-insensitivity" is built into the file system driver for AFFS, but not into the shell. If, for example, a file is stored on an Amiga “Fast File System” under the name DateiName (partly in uppercase letters), it can be deleted in the Unix shell with rm dateiname (all in lowercase letters), because the file system driver carries out the conversion. However, if you enter rm dat*, DateiName will not be deleted: the shell itself searches for matching file names and, being "case-sensitive", finds no match because it strictly distinguishes between upper and lower case letters.
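The difference between the two lookups can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not of the actual AFFS driver or of any shell: `driver_lookup` and `shell_glob` are hypothetical names, and `fnmatch.fnmatchcase` from the standard library stands in for case-sensitive wildcard expansion.

```python
import fnmatch

stored_names = ["DateiName"]  # directory listing as stored on the volume


def driver_lookup(requested, names):
    """Model of a case-insensitive driver: match a literal name ignoring case."""
    for name in names:
        if name.lower() == requested.lower():
            return name
    return None


def shell_glob(pattern, names):
    """Model of shell expansion: case-sensitive pattern match on the listing."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


print(driver_lookup("dateiname", stored_names))  # literal name reaches the driver
print(shell_glob("dat*", stored_names))          # wildcard is expanded beforehand
```

The literal name is resolved by the driver model, whereas the wildcard produces an empty list before the driver is ever consulted; only a pattern such as Dat* would match.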

The access rights and owners of files can either be adopted or completely ignored on different operating systems. The FUSE file system driver NTFS-3G, for example, supports access restrictions if the relevant Windows user SID was previously assigned to a Unix user (user mapping).

Operating systems

Unix

Unix and Unix-like operating systems such as Solaris or Linux treat file names as a whole. A file can have several names and be located in several directories (“hard links” or “bind mounts”). All characters except the slash / and the null character are allowed. Early versions had file names 1 to 14 characters long. The BSD variants introduced names up to 255 characters long.

A relative file path can consist of several segments and begins directly with a segment. Each segment is subject to the rules for file names, so it can be up to 14 or 255 characters long, depending on the system. The segments of a file path are separated by the character /. The last segment identifies the actual file. The preceding segments are either directory names or symbolic links to directories. A relative file path is interpreted starting from the current working directory, which each process can set individually. An absolute file path, on the other hand, begins with / and is independent of the current working directory. It starts from the root directory. All files of a system are accessible via the root directory.

A distinction is made between upper and lower case letters for access.

Examples:

/home/user/Dokumente/brief.txt
/usr/bin/texteditor
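The segment structure of such paths can be explored with Python's standard `pathlib` module, which implements POSIX path rules without touching the file system. The helper `resolve` and the working directory /home/user are assumptions made for this sketch.

```python
from pathlib import PurePosixPath


def resolve(working_directory: str, path: str) -> PurePosixPath:
    """Interpret `path` relative to `working_directory` unless it is absolute."""
    # Joining with an absolute path discards the working directory.
    return PurePosixPath(working_directory) / path


absolute = PurePosixPath("/home/user/Dokumente/brief.txt")
print(absolute.parts)          # root directory followed by the segments
print(absolute.name)           # last segment: the actual file
print(absolute.is_absolute())  # begins with /

print(resolve("/home/user", "Dokumente/brief.txt"))
print(resolve("/home/user", "/usr/bin/texteditor"))
```

The relative path is appended to the working directory, while the absolute path is returned unchanged, mirroring the rule that absolute paths are independent of the current working directory.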

The file name . (dot) denotes the current working directory. The name .. refers to the parent directory.

Spaces, line breaks and the so-called wildcards * and ? may also be part of a path name. However, such characters sometimes cause problems later on, because badly written scripts, for example, cannot handle them. There may also be problems with file names containing characters that do not exist in the character set a program is currently using (for example Japanese characters on an American system). Characters that cannot be displayed are then often shown as question marks or small boxes, which makes it very difficult to access the data. Such files can often be edited only after they have been renamed at a low file system abstraction level (for example by specifying the so-called inode instead of the file name, with ls -i and find . -inum […] -exec mv {} […] \;).

A Unix system does not use special extensions such as .EXE or .CMD. However, it has become common practice, as in other operating systems, to append a period and a corresponding extension to files of a certain type in order to improve clarity. For example, the extension .c is used for C source programs. Executable files, i.e. programs and scripts, usually have no extension. In any case, file types can be determined with the file utility, regardless of any extension.

Files or directories whose names begin with a dot are usually treated as "hidden" files and are only displayed if the user specifies this explicitly (for example with ls -a).

The same applies to directory paths.

CP/M, DOS, Windows up to version 3.11

Under CP/M and the various PC-compatible DOS versions, including Windows up to version 3.11 (Windows 3.x), file names consist of an actual “name” of at most eight characters and, optionally, a period and an “extension” of at most three characters, which also indicates the type of file (see 8.3). Extensions are often assigned by programs or reserved for programs, for example the extension .TXT for text files. The operating systems themselves also use special extensions, such as .BAT for script files, .SYS for driver files, or .EXE and .COM for executable files.

A file name including the extension can consist of the following characters:

  1. Letters, A – Z (lower case letters are automatically converted to upper case)
  2. Digits, 0-9
  3. The special characters ` ' { } ( ) % & - @ # $ ~ ! _ ^

The following characters are not allowed in file names and extensions, since they fulfill syntactic functions in the systems mentioned:

  1. ASCII control characters
  2. Spaces
  3. The special characters ? * < > . , \ + : = / " ; [ ] |

In addition, some words are reserved and must not be used as file names, as they are used as device names:

AUX, CON, NUL, PRN
COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9

This means that under classic DOS the following file names, for example, which may be permitted under other operating systems, cannot be used: aux.c, q"uote"s.txt, NUL.txt.
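The rules of this section can be combined into a small Python check. It is a sketch based only on the character lists and reserved names given above; the function name `is_valid_dos_name` is hypothetical, and treating a name with a trailing period and empty extension as acceptable is an assumption.

```python
import string

# Allowed characters as listed above: letters, digits and some special characters.
ALLOWED_CHARS = set(string.ascii_uppercase + string.digits + "`'{}()%&-@#$~!_^")

# Reserved device names as listed above.
RESERVED_NAMES = (
    {"AUX", "CON", "NUL", "PRN"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_valid_dos_name(filename: str) -> bool:
    """Check a single file name (no path) against the 8.3 rules above."""
    upper = filename.upper()  # lower-case letters are converted to upper case
    base, _, extension = upper.partition(".")
    if not 1 <= len(base) <= 8 or len(extension) > 3:
        return False
    if base in RESERVED_NAMES:
        return False
    # A second period would land in `extension` and be rejected here.
    return all(char in ALLOWED_CHARS for char in base + extension)


for candidate in ["brief.txt", "aux.c", 'q"uote"s.txt', "NUL.txt"]:
    print(candidate, is_valid_dos_name(candidate))
```

Of the four candidates, only brief.txt passes: two are rejected because their base name is a device name, and one because it contains quotation marks.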

Directory names are handled like normal file names under the operating systems mentioned. They usually do not have an extension, but one can be provided. In contrast to the names of other files, it usually has no function there. Each file and each directory is located on a drive, which is identified by a letter and a colon. A full name consists of the drive, optionally one or more directory names, and the actual file name. These components are separated from each other by the directory separator \.

A:\MSDOS.SYS
C:\DOKUMENT\BRIEF.TXT

Since only eight characters are available, names are often heavily abbreviated. As in Unix, the names . and .. are reserved for the current directory and the parent directory.

Access is not case-sensitive.

Windows from version 95

Under Windows, in both the Windows 9x and the Windows NT lines, a file name consists of the name, a period and an extension that defines the file type. A file name can also contain several periods; the last period then separates the name from the extension.
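The "last period" rule corresponds to splitting the name from the right. A minimal Python sketch, with the hypothetical helper `split_extension`, might look as follows; it deliberately ignores special cases such as names that begin with a period.

```python
def split_extension(filename: str):
    """Split a file name at its last period into (name, extension)."""
    name, dot, extension = filename.rpartition(".")
    if not dot:
        # No period at all: the whole string is the name.
        return filename, ""
    return name, extension


print(split_extension("archive.tar.gz"))
print(split_extension("README"))
```

For archive.tar.gz the extension is gz and the name is archive.tar, because only the last period acts as the separator.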

Length of file name and path

Normally, the path length is limited to 260 characters under Windows, i.e. three characters for the drive specification, 256 characters for the path within the drive and an invisible string terminator. Longer paths of up to 32,767 characters, as supported by NTFS, are possible with the extended-length path syntax, i.e. \\?\ must be prefixed (for UNC, Uniform Naming Convention, paths: \\?\UNC\).

To maintain compatibility with old MS-DOS programs, a file name can also be specified in 8.3 notation, provided this has not been deactivated in Windows. The file name is then represented uniquely with up to eight characters for the name, a period and up to three characters for the extension; these short names are generated anew in each directory. If files have lost their long file names, i.e. they have only this short name, conflicts may arise with existing files whose long file names have been shortened to the same short name, even if the files previously coexisted in another directory without any problems. (→ 8.3)

Problematic and illegal characters or names

As in DOS and Windows up to version 3.11, the following characters are not permitted in file names and extensions:

    < > ? " : | \ / *

The following file names reserved as device names are also not permitted:

CON, PRN, AUX, NUL
COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9

This means that under the newer versions of Windows, too, the following file names, for example, which may be permitted under other operating systems, cannot be used: aux.c, q"uote"s.txt, NUL.txt.

Also problematic are file names that contain the & character, which is in fact allowed but is used by the DOS environment under Windows as a separator for chaining commands on a single line, so that everything following an & character is interpreted as a further DOS command line. As a consequence, the Windows command prompt reports that it could not find or execute a command whose name is the rest of the entered file name after the & character, and the file in question itself cannot be opened or edited either.

File names that end with a space are also problematic. They cannot be created under Windows; if they are created under other operating systems, they cannot be accessed under Windows, because Windows simply cuts off trailing spaces. Malware authors have already taken advantage of this, as anti-virus programs can access such files only by taking special measures.

Otherwise, all characters defined in the Unicode standard can be used, although in practice older applications often have difficulties with characters that are not contained in the Windows-1252 character set.
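A portability check for names created on other systems could report the problems described in this section. The following Python sketch is based only on the lists above; the function name `windows_name_problems` and the wording of its messages are hypothetical, and it makes no claim to cover every rule Windows applies.

```python
ILLEGAL_CHARS = set('<>?":|\\/*')

RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_name_problems(filename: str) -> list:
    """Return a list of reasons why a file name is problematic under Windows."""
    problems = []
    illegal = sorted(set(filename) & ILLEGAL_CHARS)
    if illegal:
        problems.append("illegal characters: " + " ".join(illegal))
    # The part before the first period is compared with the device names.
    stem = filename.split(".", 1)[0].upper()
    if stem in RESERVED_NAMES:
        problems.append("reserved device name: " + stem)
    if filename.endswith(" "):
        problems.append("trailing space")
    if "&" in filename:
        problems.append("contains &, needs quoting at the command prompt")
    return problems


for candidate in ["aux.c", 'q"uote"s.txt', "NUL.txt", "report.txt"]:
    print(repr(candidate), windows_name_problems(candidate))
```

Returning reasons rather than a plain yes or no makes it easier to decide whether a name must be changed or merely handled with care, as in the case of the & character.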

VMS

Under VMS (Virtual Memory System), a file name consists of a name, a period, an extension, a semicolon and a version number. The version number is automatically increased by one each time a file of the same name (with extension) is created. This makes it possible to keep several versions of the same file at the same time (the number is adjustable, with a maximum of 32,767). The following information applies to ODS-2 (On-Disk Structure):

File names can be up to 39 characters long, whereby only certain characters (letters, digits, underscores, dollar signs) are allowed. No distinction is made between upper and lower case. The extension can also be 39 bytes long, is separated by a period and is not part of the file name. Except for directories, where the extension is always .DIR, it has no meaning for the possible use of the file (but there are standards that are usually observed for some file types).

The total path length (i.e. disk, directory tree, file name, extension and version) must not exceed 255 bytes.
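The name, extension and version structure can be illustrated with a small Python parser. It follows only the rules stated above (letters, digits, underscores and dollar signs; at most 39 characters each); requiring a non-empty name and a version between 1 and 32,767 are assumptions of this sketch, and `parse_ods2_name` is a hypothetical helper, not a VMS interface.

```python
import re

# name . extension ; version, using only the characters listed above
ODS2_PATTERN = re.compile(
    r"^([A-Za-z0-9_$]{1,39})\.([A-Za-z0-9_$]{0,39});([0-9]+)$"
)


def parse_ods2_name(spec: str):
    """Split 'NAME.EXT;VERSION' into its parts, or return None if malformed."""
    match = ODS2_PATTERN.match(spec)
    if match is None:
        return None
    name, extension, version_text = match.groups()
    version = int(version_text)
    if not 1 <= version <= 32767:  # assumed range for version numbers
        return None
    # No distinction is made between upper and lower case, so normalize.
    return name.upper(), extension.upper(), version


print(parse_ods2_name("login.com;3"))
print(parse_ods2_name("bad name.txt;1"))
```

The first call yields the three components in upper case; the second returns None because a space is not among the permitted characters.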

Internet

World Wide Web

The transfer of files on the World Wide Web is governed by the HTTP standard. If a file name contains characters other than ASCII letters and digits, these are encoded in the URL in percent representation: a percent sign followed by a two-character hexadecimal code, for example haust%FCr.html instead of haustür.html. To determine the code value, it is necessary to know the character encoding (e.g. UTF-8 or ISO 8859-1) of the file name.
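The dependence on the character encoding can be reproduced with `quote` and `unquote` from Python's standard `urllib.parse` module. The sketch below encodes the example name once as ISO 8859-1 and once as UTF-8; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

name = "haust\u00fcr.html"  # "haustür.html"

as_latin1 = quote(name, encoding="iso-8859-1")
as_utf8 = quote(name, encoding="utf-8")

print(as_latin1)  # haust%FCr.html
print(as_utf8)    # haust%C3%BCr.html

# Decoding only works with the encoding that was used for encoding.
print(unquote(as_latin1, encoding="iso-8859-1") == name)
print(unquote(as_latin1, encoding="utf-8") == name)
```

The same letter ü becomes one escaped byte in ISO 8859-1 and two in UTF-8, and decoding the ISO 8859-1 form as UTF-8 does not restore the original name.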

File download

The FTP standard only requires ASCII characters to be supported. Often, however, a file download is also carried out using HTTP.

E-mail

The transfer of file attachments (and thus also the file names permitted there) is regulated in the SMTP and MIME standards.
