sed (Unix)
sed stands for Stream EDitor and is a Unix tool for editing text data streams. The data stream can also be read from a file. In contrast to a text editor, sed does not change the original file.
Unlike an interactive text editor such as vi, sed is controlled by a script.
The sed command set is based on that of the line-oriented text editor ed. According to the POSIX specification, pattern matching uses a particular kind of regular expression, the so-called (POSIX) basic regular expressions (BRE). The GNU implementation uses GNU BREs, which differ slightly from POSIX BREs.
Even though the scope of the sed language seems quite limited and specialized, it is a Turing-complete language. This can be proven by programming a Turing machine in sed or by writing an interpreter for another Turing-complete language in sed.
As a result, games such as Sokoban or Arkanoid and other sophisticated programs such as debuggers could be, and have been, written in sed.
Working method
sed can work both within a pipeline and on files. Output always goes to <stdout>, error messages to <stderr>. A typical call therefore looks like this:
sed 'instruction1; instruction2; …; instructionN' inputfile > outputfile
<stream> | sed 'instruction1; instruction2; …; instructionN' | <stream>
sed reads an input file (or an input stream on <stdin>) line by line. Each input line first ends up in the so-called pattern space. The instructions of the specified program are executed one after the other on this pattern space. Each instruction can change the pattern space; the following instructions then operate on the result of the previous one. If an instruction deletes the pattern space (as the d command does), processing stops at this point and the instruction list starts again from the beginning with the next input line. Otherwise, the result of the last instruction is written to <stdout> and the instruction list is likewise started again with the next input line.
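The cycle described above can be sketched as follows. The sample input lines and the variable name cycle_out are invented for illustration. The sketch shows that the second instruction sees the result of the first, and that a line removed with d never reaches the output:

```shell
# Each input line passes through all instructions in order.
# 1. s/cat/dog/   rewrites the pattern space.
# 2. s/dog/bird/  sees the result of step 1, so "cat" ends up as "bird".
# 3. /^#/d        deletes the pattern space; nothing is printed for that line.
cycle_out=$(printf 'cat\n# note\nfish\n' | sed 's/cat/dog/; s/dog/bird/; /^#/d')
printf '%s\n' "$cycle_out"
```

The comment line disappears entirely, while the other two lines are printed once the last instruction has run.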
Programming
sed instructions can be roughly divided into three groups: text manipulations, branches, and others. (Most sed manuals, as well as the POSIX specification, divide instructions into 2-address, 1-address, and addressless instructions, as described below, but this grouping is not suitable for introductory purposes.)
Text manipulation
This is by far the most frequently used function, and the instruction set is particularly extensive here. An instruction generally has the following structure (2-address command):
<address1>,<address2> command [options]
address1 and address2 can both be omitted. If both addresses are specified, command is executed for every line from the one that matches address1 up to the one that matches address2. If neither address1 nor address2 is specified, command is executed for every line; if only address2 is omitted, command is executed only for lines that match address1. An address is either a line number or a regular expression. Regular expressions are enclosed in two slashes (/). Two examples:
sed '/Beginn/,/Ende/ s/alt/NEU/' inputfile
Input | Output
x alt | x alt
Beginn | Beginn
y alt | y NEU
Ende | Ende
z alt | z alt
"alt" is replaced by "NEU", but only from the line that contains "Beginn" to the line that contains "Ende" (2-address variant). In contrast, in the second example the same replacement is carried out in all lines that begin with "y" or "z" (1-address variant):
sed '/^[yz]/ s/alt/NEU/' inputfile
Input | Output
x alt | x alt
Beginn | Beginn
y alt | y NEU
Ende | Ende
z alt | z NEU
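Addresses can also be line numbers. The following sketch reuses the sample input from the two examples and restricts the replacement first to lines 2 through 3 and then to the last line, using the POSIX address $. The variable names are chosen only for illustration:

```shell
# Sample input from the examples, one entry per line.
input=$(printf 'x alt\nBeginn\ny alt\nEnde\nz alt\n')

# Line-number range: only lines 2 to 3 are candidates ("y alt" is line 3).
by_number=$(printf '%s\n' "$input" | sed '2,3 s/alt/NEU/')

# "$" addresses the last input line.
last_only=$(printf '%s\n' "$input" | sed '$ s/alt/NEU/')

printf '%s\n---\n%s\n' "$by_number" "$last_only"
```

In the first call only "y alt" changes, because line 2 ("Beginn") contains nothing to replace. In the second call only "z alt" changes.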
Compound commands
Instead of a single command, the command part of an instruction can also be a list of instructions enclosed in { … }. The rules described above apply to these instructions; they can themselves consist of further compound commands. An example:
sed '/^[yz]/ {
  s/^\([yz]\)/(\1)/
  s/alt/NEU/
}' inputfile
Input | Output
x alt | x alt
Beginn | Beginn
y alt | (y) NEU
Ende | Ende
z alt | (z) NEU
Branches
sed knows two types of branches: unconditional branches (jump instructions) and conditional ones, which are executed depending on whether a preceding substitution was performed. A typical application is the following: a source text was indented with leading tab characters, and each of these leading tabs should be replaced by 8 spaces. Tabs elsewhere in a line may occur in the text but should not be changed. The problem is that multiplicative relationships (replace N tabs with N * 8 spaces) cannot be expressed as a regular expression. A global replacement, on the other hand, would also affect the tab characters within the text. Therefore, a loop is built with jump instructions (in the following, tabs and spaces are written as "<t>" and "<b>" for readability):
sed ':start
/^<b>*<t>/ {
  s/^\(<b>*\)<t>/\1<b><b><b><b><b><b><b><b>/
  b start
}' inputfile
In each line, the first tab character, provided it is preceded only by zero or more spaces, is replaced by 8 spaces, after which the jump instruction b <label> ensures that program execution returns to the label at the start of the program. Once the last leading tab character has been replaced, the expression /^<b>*<t>/ no longer matches and the block is not executed, so that the end of the program is reached and the next line is read.
The similarity to assembly languages becomes clear here: a control structure comparable to the repeat-until of high-level languages is built from a condition and a label.
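The same loop can also be written with the conditional branch t, which jumps only if a substitution has succeeded since the last input line was read or the last t was taken. The following is a minimal runnable sketch rather than the article's script: the variables tab, eight, line, and expanded and the sample line are assumptions made for illustration, and the tab is inserted through a shell variable because a literal tab is hard to see in source text:

```shell
# A literal tab and a run of eight spaces, kept in variables for readability.
tab=$(printf '\t')
eight=$(printf '%8s' '')

# Two leading tabs plus one tab inside the text (which must survive).
line=$(printf '\t\tif (x)\t/* comment */')

# :start    marks the loop entry.
# s/…/…/    replaces the first tab that follows only leading spaces.
# t start   jumps back only if that substitution succeeded.
expanded=$(printf '%s\n' "$line" |
  sed -e ':start' -e "s/^\( *\)${tab}/\1${eight}/" -e 't start')

printf '%s\n' "$expanded"
```

Passing the label and the branch as separate -e fragments avoids relying on how an implementation terminates label names within a one-line script.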
Other instructions
Hold space manipulation
A powerful (albeit relatively little-known) feature of sed is the so-called hold space. This is a freely usable memory area that works similarly to the accumulator known from some assembly languages. The data in the hold space cannot be manipulated directly, but the contents of the pattern space can be moved or copied to the hold space, or exchanged with its contents. The pattern space can also be appended to the hold space, or vice versa.
The following example illustrates the hold space: the text of a "chapter heading" is saved and appended to each line of the respective "chapter", while the line with the chapter heading itself is suppressed:
sed '/^=/ {
  s/^=//
  s/^/ (/
  s/$/)/
  h
  d
}
G; s/\n//' inputfile
Input | Output
=Kapitel1 |
Zeile 1 | Zeile 1 (Kapitel1)
Zeile 2 | Zeile 2 (Kapitel1)
Zeile 3 | Zeile 3 (Kapitel1)
=Kapitel2 |
Zeile A | Zeile A (Kapitel2)
Zeile B | Zeile B (Kapitel2)
Zeile C | Zeile C (Kapitel2)
Whenever a line begins with "=", the statement block is executed, which removes this character and wraps the rest of the line in a leading space and parentheses. This text is then copied into the hold space ("h") and deleted from the pattern space ("d"), which ends the program for this line and causes the next line to be read. Since the condition of the block does not apply to "normal" lines, only the final instructions are carried out for them: "G" appends the content of the hold space to the pattern space, and the subsequent substitution removes the newline that "G" inserts between the two.
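A second, widely used hold-space idiom prints the input lines in reverse order. It is not part of the example above and is shown only as a sketch of how h and G interact; the sample input and the variable name reversed are invented:

```shell
# Print the input lines in reverse order using only the hold space.
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the combined pattern space back into the hold space
#   $p   on the last line, print everything collected so far
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

Because the whole input accumulates in the hold space, this approach is only reasonable for inputs that fit comfortably in memory.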
Multi-line statements
Not all text manipulations can be carried out within individual lines. Sometimes information from other lines has to be taken into account, and sometimes replacements spanning several lines are needed. For this purpose, the sed language provides the instructions N, P, and D, with which several lines of the input text can be loaded into the pattern space at once ("N") and parts of it can be output ("P") or deleted ("D"). A typical application is the following one-liner (actually two one-liners), which adds line numbers to a text:
sed '=' inputfile | sed 'N; s/\n/<t>/'
The first sed call prints the line number of each line of the input text, followed by the line itself. The second sed call combines these two lines into one by first reading the following line ("N") and then replacing the automatically inserted line separator ("\n") with a tab character.
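The line-numbering example uses only N. How N, P, and D work together can be sketched with a common idiom that collapses runs of identical adjacent lines, similar to uniq. The sample input and the variable name deduped are assumptions made for illustration:

```shell
# Collapse runs of identical adjacent lines.
#   $!N                unless on the last line, append the next input line
#   /^\(.*\)\n\1$/!P   if the two lines differ, print the first one
#   D                  delete the first line and restart with the remainder
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```

D restarts the script without reading new input as long as the pattern space still contains a newline, which is what keeps the second line available for the next comparison.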
Applications, options, notes
Capacity limits
sed is not subject to any (real) restrictions on file size. Apart from the available disk space, which is a practical limit, most implementations implement the line counter as int or long int. With the 64-bit processors commonly used today, the risk of overflow can therefore be neglected.
Like most UNIX text manipulation tools, sed is limited in line length (more precisely, in the number of bytes up to the following newline character). The minimum size is defined by the POSIX standard; the actual size can vary from system to system and can be looked up in the system header file /usr/include/limits.h as the value of the constant LINE_MAX. The length is specified in bytes, not in characters (which is why a conversion is necessary when processing UTF-encoded files that represent individual characters with several bytes).
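As an alternative to reading the header file, the limit can usually be queried with the POSIX utility getconf. The sketch below also illustrates the difference between bytes and characters; it assumes that getconf is installed, and the variable names are chosen only for illustration:

```shell
# Ask the system for its LINE_MAX value (in bytes).
line_max=$(getconf LINE_MAX)
printf 'LINE_MAX is %s bytes\n' "$line_max"

# Byte length of a word containing a multi-byte UTF-8 character:
# \303\274 is the UTF-8 encoding of "ü", so the word has 4 characters but 5 bytes.
byte_len=$(printf '\303\274ber' | wc -c | tr -d ' ')
printf 'byte length: %s\n' "$byte_len"
```

The value reported for LINE_MAX differs between systems; POSIX only guarantees a minimum.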
Greediness
Regular expressions are classified as greedy or non-greedy with respect to the extent of their match. sed regular expressions are always greedy, which means that a regular expression always matches the longest possible span (the matched text is shown in brackets):
/a.*B/ ; "'a', followed by zero or more arbitrary characters, followed by 'B'"
[axyBBBskdjfhaaBBpweruB]jdfh ; longest possible match (greedy)
[axyB]BBskdjfhaaBBpweruBjdfh ; shortest possible match (non-greedy)
The reason is that sed is optimized for speed, and non-greedy regular expressions would require extensive backtracking. If non-greedy behavior is required, it is usually achieved with negated character classes. In the example above, for instance:
/a[^B]*B/ ; "'a', followed by zero or more non-'B' characters, followed by 'B'"
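Both expressions can be tried on the sample string with a substitution that wraps the matched text in brackets (& stands for the matched text in the replacement). This is only a sketch; the variable names are invented:

```shell
sample='axyBBBskdjfhaaBBpweruBjdfh'

# Greedy: .* runs to the LAST "B" on the line.
greedy=$(printf '%s\n' "$sample" | sed 's/a.*B/[&]/')

# Emulated non-greedy: [^B]* cannot cross a "B", so the match ends at the first one.
short=$(printf '%s\n' "$sample" | sed 's/a[^B]*B/[&]/')

printf '%s\n%s\n' "$greedy" "$short"
```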
Practical Limits in Shell Programming
It should not go unmentioned that the most common practical use of sed (and also of awk, tr, and similar filter programs), the ad hoc manipulation of the output of other commands, is strictly speaking a misuse. A typical case looks something like this:
ls -l /path/to/myfile | sed 's/^\([^ ][^ ]*\) .*/\1/' # prints file type and file mode
Since every call of an external program requires the expensive system calls fork() and exec(), shell-internal methods such as so-called variable expansion are usually superior to calling external programs, even if they take considerably more code to write. The rule of thumb is: if the output of the filter process is a file or a data stream, use the filter program; otherwise, variable expansion is preferable.
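The difference can be sketched with a small task, stripping the directory part from a path. The path and the variable names are hypothetical; the second variant uses POSIX parameter expansion and starts no additional process:

```shell
# Hypothetical path, used only for illustration.
path=/path/to/myfile.txt

# External process: fork() and exec() just to strip the directory part.
with_sed=$(printf '%s\n' "$path" | sed 's|.*/||')

# Shell-internal: parameter expansion removes the longest prefix matching "*/".
with_expansion=${path##*/}

printf '%s\n%s\n' "$with_sed" "$with_expansion"
```

Both variants yield the same file name; the saving only becomes noticeable when such an operation runs many times, for example inside a loop.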
In-place editing
Because of the way sed carries out text manipulation, it cannot operate directly on the input file. A separate output file is required, which is then moved over the input file if necessary.
sed '…<instructions>…' /path/to/inputfile > /path/to/output
mv /path/to/output /path/to/inputfile
This is also what the POSIX standard provides for. In addition to the POSIX standard, the GNU version of sed offers the command-line option -i. This allows a file to be changed in place, seemingly without a detour, but in fact a temporary file is created in the background here as well. This file is not deleted in the event of an error, and the metadata (owner, group, inode number, ...) of the original file is changed in any case.
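The manual detour via a second file can be made somewhat more robust by replacing the original only if sed succeeded. The following sketch assumes that the mktemp utility is available (it is widespread but not part of POSIX); all file and variable names are invented, and the sketch works on a throwaway directory:

```shell
# Work on a throwaway file so the sketch is safe to run anywhere.
workdir=$(mktemp -d)
target="$workdir/example.txt"
printf 'alt\nalt\n' > "$target"

# Write the result to a temporary file in the same directory and replace
# the original only if sed exited successfully.
tmp=$(mktemp "$workdir/sed.XXXXXX")
if sed 's/alt/NEU/' "$target" > "$tmp"; then
  mv "$tmp" "$target"
else
  rm -f "$tmp"
fi

edited=$(cat "$target")
printf '%s\n' "$edited"
rm -rf "$workdir"
```

As with -i, the replaced file is a new file, so its inode number changes and its ownership and permissions may change as well.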
RegExp notation
It has become common practice to delimit regular expressions with slashes, as in the examples above. sed does not require this, however. Any character (other than a backslash or newline) that follows the substitute command is accepted as the delimiter and is then expected in the rest of the instruction. These two statements are therefore equivalent:
s/^\([^ ][^ ]*\) \([^ ][^ ]*\)/\2 \1/ ; swaps the first and second word of a line
s_^\([^ ][^ ]*\) \([^ ][^ ]*\)_\2 \1_ ; "_" instead of "/"
This is useful if the slash is needed as part of the regular expression, because it saves the tedious escaping of each slash (marking it as a literal character). You simply switch to another, unused character.
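A typical case is a pattern that contains path names. The following sketch, with an invented path and invented variable names, performs the same replacement once with escaped slashes and once with "_" as the delimiter:

```shell
# With "/" as the delimiter every slash in the pattern must be escaped;
# with "_" the pattern can be written as it is.
escaped=$(printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/')
readable=$(printf '/usr/local/bin\n' | sed 's_/usr/local_/opt_')
printf '%s\n%s\n' "$escaped" "$readable"
```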
Some typical practices
Deletion of text parts
Text is deleted by replacing it with nothing. Explicit deletion of part of the pattern space is provided only for the span from its beginning to the first line separator (D). The expression
/expression/d
however, does NOT delete the matched text, but every line that contains expression! Here, expression acts as the address (see above, 1-address variant of the command d).
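The difference between the two forms can be sketched on a two-line sample input (the input and the variable names are invented):

```shell
# Substituting with nothing removes only the matched text.
trimmed=$(printf 'foo bar\nbaz\n' | sed 's/foo //')

# The d command with an address removes every line that matches.
dropped=$(printf 'foo bar\nbaz\n' | sed '/foo/d')

printf '%s\n---\n%s\n' "$trimmed" "$dropped"
```

The first call keeps both lines and shortens the first one to "bar"; the second call drops the first line entirely.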
Addressing at least one character
POSIX BREs, in contrast to GNU BREs, do not provide the quantifier \+ for one or more occurrences of the preceding expression. To write portable sed scripts that run not only with GNU sed, the expression should therefore be written twice, with the * quantifier (zero or more) applied to the second copy.
/xa\+y/ ; GNU variant for "'x' followed by one or more (but not zero) 'a', followed by 'y'"
/xaa*y/ ; the same in POSIX: "'x' followed by 'a' followed by zero or more 'a's, followed by 'y'"
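The portable form can be checked against a few sample lines (invented for illustration); only lines with at least one "a" between "x" and "y" are printed:

```shell
# Portable "one or more a": write the atom once, then the same atom with *.
matched=$(printf 'xy\nxay\nxaaay\n' | sed -n '/xaa*y/p')
printf '%s\n' "$matched"
```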
Replacement of several or all occurrences within a line
If no further options are specified, only the first occurrence of a search text is subject to the replacement rule:
sed 's/alt/NEU/' inputfile
Input | Output
alt | NEU
alt alt | NEU alt
alt alt alt | NEU alt alt
alt alt alt alt | NEU alt alt alt
alt alt alt alt alt | NEU alt alt alt alt
This behavior can, however, be changed by specifying options for the command: if a number N is specified, only the Nth occurrence is changed; a g (for global) changes all occurrences:
sed 's/alt/NEU/g' inputfile
Input | Output
alt | NEU
alt alt | NEU NEU
alt alt alt | NEU NEU NEU
alt alt alt alt | NEU NEU NEU NEU
alt alt alt alt alt | NEU NEU NEU NEU NEU
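The numeric option can be sketched in the same way. In the following example, with an invented variable name, only the second occurrence on each line is replaced; lines with fewer than two occurrences stay unchanged:

```shell
# The option 2 selects the second occurrence on each line.
second=$(printf 'alt\nalt alt alt\n' | sed 's/alt/NEU/2')
printf '%s\n' "$second"
```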
Filter specific lines
By default, sed always outputs the content of the pattern space after the last instruction. To suppress this behavior for individual lines, certain lines can be deleted using a rule (explicit filtering); alternatively, the command-line option -n switches off this behavior altogether (implicit filtering). In that case only what is output with the explicit print command (p) appears. p can be used either as a separate instruction or as an option for other instructions. Using the text from above, the following example outputs only the "chapter headings":
sed -n 's/^=\(.*\)$/Kapitelüberschrift: \1/p' inputfile
Input | Output
=Kapitel1 | Kapitelüberschrift: Kapitel1
Zeile 1 |
Zeile 2 |
Zeile 3 |
=Kapitel2 | Kapitelüberschrift: Kapitel2
Zeile A |
Zeile B |
Zeile C |
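Both kinds of filtering can be sketched side by side, here with p as a separate instruction rather than as an option of s. The shortened sample text and the variable names are assumptions made for illustration:

```shell
chapters=$(printf '=Kapitel1\nZeile 1\n=Kapitel2\nZeile A\n')

# Implicit filtering: -n suppresses automatic output, p prints matching lines.
headings=$(printf '%s\n' "$chapters" | sed -n '/^=/p')

# Explicit filtering: automatic output stays on, d removes the unwanted lines.
body=$(printf '%s\n' "$chapters" | sed '/^=/d')

printf '%s\n---\n%s\n' "$headings" "$body"
```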
Debugging
For troubleshooting, it can be useful to display intermediate results in order to follow how the pattern space develops. The option p mentioned above can be used for this. Lines can be output several times in a row. In the example program above, for instance:
sed '/^=/ {
  s/^=//p
  s/^/ (/p
  s/$/)/p
  h
  d
}
p
G' inputfile
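Run on a two-line sample, such a trace looks as follows. This is only a sketch: the input is a shortened version of the chapter text, the variable name trace is invented, and the script is written with one instruction per line:

```shell
# Trace the chapter-heading script on two sample lines.
# Every "p" prints the pattern space as it is at that point in the script.
trace=$(printf '=Kapitel1\nZeile 1\n' | sed '/^=/ {
  s/^=//p
  s/^/ (/p
  s/$/)/p
  h
  d
}
p
G')
printf '%s\n' "$trace"
```

The heading line produces three intermediate states before it is deleted. The text line is printed once by p and then again, together with the appended hold space, by the automatic output at the end of the script.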