sort (Unix)

from Wikipedia, the free encyclopedia

sort (/usr/bin/sort) is a program that sorts data streams or files, merges them, or checks whether they are already sorted. Sort keys can be alphabetical or numerical and can be composed of configurable parts of the input lines in a configurable order.

The feature set and behavior of sort on UNIX systems are specified by the POSIX standard, while GNU sort deviates from this standard in some respects. The Single UNIX Specification lists the sort utility as "mandatory" (a required component) and specifies its expected behavior.

Mode of operation

sort works line by line: the items being sorted are records (corresponding to lines), which are separated by newline characters. Each record in turn consists of fields, which are separated by field separators. The default field separator is the blank, but any other character can be selected with the command-line option -t <char>.
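As a minimal sketch of field splitting with -t, the following example sorts hypothetical name:score records by their second field. The sample data and the variable names are assumptions made for illustration; the key definition -k 2,2n restricts the comparison to field 2 and makes it numeric.

```shell
# Hypothetical sample records: a name and a score, separated by ':'
data='alice:100
bob:9
carol:10'

# -t ':' makes the colon the field separator;
# -k 2,2n compares only field 2, numerically
by_score=$(printf '%s\n' "$data" | LC_ALL=C sort -t ':' -k 2,2n)
printf '%s\n' "$by_score"
```

LC_ALL=C is set here only to make the result independent of the locale of the calling environment.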

Sort keys are defined by specifying a field (or part of one, e.g. the third to fifth characters of a certain field) and the associated sorting method (alphabetical or numerical). Complex sort keys can be built from several such individual keys in sequence. For example, a date field in the format "DD-MM-YYYY" can be sorted by using characters 7 to 10 (the year) as the primary key, characters 4 to 5 (the month) as the secondary key and characters 1 to 2 (the day) as the tertiary key, each compared numerically (the option -n at the beginning makes all keys numeric):

sort -n -k 1.7,1.10 -k 1.4,1.5 -k 1.1,1.2 /path/to/input
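The date example can be tried on a few hypothetical dates. This is a sketch under the assumption that each line consists of nothing but a date in the format DD-MM-YYYY, so that the year occupies characters 7 to 10 of the first field; the sample values are invented.

```shell
# Hypothetical dates in DD-MM-YYYY format, deliberately out of order
dates='15-03-2012
01-12-2013
02-01-2011
20-01-2012
05-03-2012'

# Primary key: year (characters 7-10), secondary: month (4-5), tertiary: day (1-2)
chronological=$(printf '%s\n' "$dates" |
    LC_ALL=C sort -n -k 1.7,1.10 -k 1.4,1.5 -k 1.1,1.2)
printf '%s\n' "$chronological"
```

A plain sort without keys would order these lines by their leading day digits instead, which is why the explicit character positions are needed.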

Unless an end position is given explicitly, a key extends from its start position to the end of the line (in the extreme case, when no key is defined at all, sort compares entire records). If this is not desired, the end of the key must be stated explicitly:

sort -k 2 /path/to/input     # sorts by field 2 through the end of the line
sort -k 2,2 /path/to/input   # sorts by field 2 only
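The difference between an open-ended and a closed key becomes visible when two records agree in field 2 but differ afterwards. The following sketch uses invented three-field records; it relies on the fact that lines whose keys compare equal are finally ordered by comparing the whole line.

```shell
# Hypothetical records; the second field is 'a' or 'b'
records='x b 1
y a 2
z a 1'

# Key runs from field 2 to the end of the line: "a 1" sorts before "a 2"
open_key=$(printf '%s\n' "$records" | LC_ALL=C sort -k 2)

# Key is field 2 only: the two 'a' records tie and are then
# ordered by the whole line, so "y ..." sorts before "z ..."
closed_key=$(printf '%s\n' "$records" | LC_ALL=C sort -k 2,2)

printf 'open:\n%s\nclosed:\n%s\n' "$open_key" "$closed_key"
```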

Alphabetical sorting is strongly influenced by the internationalization settings, in particular the variables LANG, LC_ALL, LC_COLLATE, etc.; numerical sorting likewise depends on the value of LC_NUMERIC.
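The influence of the locale can be illustrated with the C locale, which compares characters by their byte values. The word list below is invented; in many other locales (for example an English UTF-8 locale, if one is installed) the same input would typically come out with "apple" first, but that depends on the system and is not shown here.

```shell
# Hypothetical word list mixing upper and lower case
words='apple
Banana
cherry'

# In the C locale uppercase letters have smaller byte values than
# lowercase letters, so "Banana" sorts before "apple"
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

Setting LC_ALL=C explicitly is a common way to make the sort order of a script reproducible across systems.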

Like most of the UNIX tools defined in the POSIX standard, sort complies with the utility syntax guidelines, with the exception of guideline 9. In addition, both - and + are accepted as option delimiters.

Input and output behavior, return values

sort writes its output to stdout and, unless otherwise specified, its error messages to stderr. Both streams can be redirected by the usual means (pipeline, redirection). In addition, the option -o <file> names a file that receives the output instead of stdout.
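One common use of -o is sorting a file in place, since the output file may be the same as an input file. The sketch below assumes that the mktemp utility is available for creating a scratch file (it is widespread but not part of POSIX); the file contents are invented.

```shell
# Create a scratch file with hypothetical, unsorted content
tmp=$(mktemp)
printf '%s\n' pear apple fig > "$tmp"

# -o names the output file; here it is the input file itself
LC_ALL=C sort -o "$tmp" "$tmp"

in_place=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$in_place"
```

A shell redirection such as sort file > file would not work for this purpose, because the shell truncates the file before sort reads it.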

sort takes as input either a data stream on stdin or one or more files given as arguments. If several files are specified, they are merged into a single output during sorting. The special file name - stands for stdin, so a data stream can also be merged with other files.
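Combining a data stream with a file can be sketched as follows. The numbers and the scratch file are invented for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical file with a few unsorted numbers
tmp=$(mktemp)
printf '%s\n' 6 2 4 > "$tmp"

# '-' stands for stdin: the piped numbers and the file content
# are sorted together into one output
merged=$(printf '%s\n' 5 1 3 | LC_ALL=C sort -n - "$tmp")

rm -f "$tmp"
printf '%s\n' "$merged"
```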

In addition to the usual return values 0 (success) and >1 (an error occurred), the value 1 can be returned when a file is only checked for sortedness (option -c). It means that the specified file is not sorted according to the specified criterion.
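The return values of a check can be observed with the option -c, which only tests whether the input is sorted and produces no sorted output. The input lines in this sketch are invented; the diagnostic message that sort prints for unsorted input is discarded.

```shell
# Already sorted input: the check succeeds with status 0
status_sorted=0
printf '%s\n' a b c | LC_ALL=C sort -c || status_sorted=$?

# Unsorted input: the check fails with status 1
status_unsorted=0
printf '%s\n' b a c | LC_ALL=C sort -c 2>/dev/null || status_unsorted=$?

printf 'sorted: %s, unsorted: %s\n' "$status_sorted" "$status_unsorted"
```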

Notes on use

Obsolete methods of key definition

The original sort did not know the now common and standardized form of key definition using one or more -k <keydef> expressions. Instead, the start of each key was specified with +N[.M] and its end with -N[.M], where N is the (zero-based) number of the field and M the (also zero-based) number of the character within the field. The following example gives the same key definition in old and new notation. It sorts the user database /etc/passwd numerically (-n) by the third field (the user ID), with ":" serving as the field separator (-t':'):

sort -t':' -n +2 -3  /etc/passwd
sort -t':' -n -k 3,3 /etc/passwd

This notation is still found frequently in existing scripts, but its use is no longer recommended. Although most current implementations still understand it, it is no longer part of the POSIX standard, so portable scripts should not rely on it.

Influences on the sorting order

In addition to the basic distinction between alphanumeric and numeric sorting and the internationalization variables already mentioned, a number of further settings influence the sort order. Each can be applied globally to the entire sort as an option, or to a single key only as a modifier appended to the key definition. Option and modifier use the same letter.

sort -n -k 3,3 -k 4,4 /path/to/input   # -n applies globally to both keys
sort -k 3,3n -k 4,4 /path/to/input     # -n applies only to the first key

The following modifiers are available:
b
ignore leading blanks; leading blanks are ignored, which also applies to keys that do not start at the first character of the field. The key definition -k 2.2b,2 makes the sort key begin at the second non-blank character of the second field and end at the last character of the second field.
d
dictionary; dictionary-like sorting. Only alphanumeric characters and blanks are taken into account, with the value of LC_CTYPE defining what counts as alphanumeric.
f
fold lowercase to uppercase; disables case sensitivity (the distinction between uppercase and lowercase letters) by comparing characters that have an uppercase equivalent as if they had been replaced by it. LC_CTYPE defines which pairs of characters correspond.
i
ignore unprintables; similar to d, but all non-printable characters are ignored instead. Here too, LC_CTYPE defines what counts as non-printable.
n
numerical; sorting is carried out numerically instead of alphanumerically.
r
reverse; reverses the sort order, so that the output is in descending instead of ascending order.
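The difference between a global option and a per-key modifier can be sketched with a small invented inventory list. In the first command only the second key carries the modifiers n and r; in the second command the global -n is inherited by both keys, so the names in field 1 all count as 0 and only the numbers decide.

```shell
# Hypothetical records: an item name and a quantity
items='pear 3
apple 10
apple 2'

# Field 1 alphabetically, then field 2 numerically in descending order
per_key=$(printf '%s\n' "$items" | LC_ALL=C sort -k 1,1 -k 2,2nr)

# Global -n applies to both keys: the non-numeric names compare as
# equal, so the order is determined by field 2 in ascending order
global_n=$(printf '%s\n' "$items" | LC_ALL=C sort -n -k 1,1 -k 2,2)

printf 'per key:\n%s\nglobal -n:\n%s\n' "$per_key" "$global_n"
```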

Leading spaces

The different treatment of leading blanks, depending on whether or not -t is specified on the command line, is a source of confusion, especially when the blank is specified as the field separator, which appears merely to restate the default. The default separator, however, is not the blank itself but the transition from a non-blank character to a blank.

If -t is not specified, the blanks preceding a field are counted as part of that field, so leading blanks belong to the first field. If -t is specified, by contrast, blanks are treated like any other character and, in the case of -t' ', each of them acts as a field separator; the field separator is then not regarded as part of a field. The POSIX standard gives the following example in its explanatory notes (blanks are represented as <b>):

# without -t: first field is "<b><b>foo", second and third fields are empty
sort <<EOF
<b><b>foo
EOF
# with -t'<b>': first and second fields are empty, third field is "foo"
sort -t'<b>' <<EOF
<b><b>foo
EOF
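The effect can be made visible with real blanks by sorting on the third field. This is a sketch with two invented lines: without -t the line with two leading blanks has no third field, so its key is empty and it sorts first; with -t' ' its third field is "foo", which sorts after "c".

```shell
# Hypothetical input: one line with two leading blanks, one with three words
lines='  foo
a b c'

# Default separation: "  foo" is a single field, its third field is empty
default_sep=$(printf '%s\n' "$lines" | LC_ALL=C sort -k 3,3)

# Every blank is a separator: the fields of "  foo" are "", "" and "foo"
blank_sep=$(printf '%s\n' "$lines" | LC_ALL=C sort -t ' ' -k 3,3)

printf 'default:\n%s\nwith -t:\n%s\n' "$default_sep" "$blank_sep"
```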

References

  1. sort specification of The Open Group. Retrieved May 2, 2013.
  2. UNIX® Commands & Utilities Interface Table. Retrieved May 3, 2013.
  3. Here and in the following, POSIX sort is described unless explicitly stated otherwise.
  4. The Open Group Base Specifications Issue 7, 2018 edition, chap. 12, Utility Conventions. Retrieved May 15, 2019.