awk

from Wikipedia, the free encyclopedia

awk is a programming language for processing and evaluating any text data, including CSV files . The associated interpreter should be viewed more as a compiler because the program text is first completely compiled and then executed. awk was primarily designed as a report generator and was one of the first tools to appear in version 3 of Unix . Awk can be seen as a further development or an addition to the stream editor sed , they share certain syntactic elements such as regular expressions . In contrast to sed, however, C-like structures (if .. then .. else, different loops, C formats ...) are available in awk, which allow a much easier program structure. In the minimum application is awk in shell - scripts used to assemble as a filter, for example, filename. With more detailed programs it is possible to edit, reshape or evaluate text files. In addition to the usual string functions, there are also basic mathematical functions available. The name "awk" is composed of the first letters of the surnames of its three authors Alfred V. A ho , Peter J. W einberger and Brian W. K ernighan .

A version of awk can be found in almost every Unix-like system today and is often preinstalled. A comparable program is also available for almost all other operating systems.

The language works almost exclusively with the data type string ( english string ). Associative arrays (i.e. arrays indexed with strings, also called hashes ) and regular expressions are also fundamental parts of the language.

The performance, compactness, but also the limitations of the awk and sed scripts inspired Larry Wall to develop the Perl language .

Building a program

The typical execution of an awk program consists of performing operations - such as replacements - on an input text. To do this, the text is read in line by line and split into fields using a selected separator - usually a series of spaces and / or tabs . The awk instructions are then applied to the respective line.

awk statements have the following structure:

  Bedingung { Anweisungsblock }

It is determined for the read line whether it fulfills the condition (often a regular expression ). If the condition is met, the code is executed within the statement block enclosed in curly braces. Deviating from this, a statement can only come from one action

  { Anweisungsblock }

or just from a condition

  Bedingung

consist. If the condition is missing, the action is carried out for each line. If the action is missing, the standard action is to write the entire line, provided the condition is met.

Variables and functions

The user can define variables within statement blocks by referencing them; an explicit declaration is not necessary. The scope of the variable is global. Function arguments are an exception, the validity of which is restricted to the function that defines them.

Functions can be defined at any point, the declaration does not have to be made before the first use. If they are scalars , function arguments are passed as value parameters , otherwise as reference parameters . The arguments when calling a function do not have to correspond to the function definition, excess arguments are treated as local variables, omitted arguments are given the special value uninitialized - numeric zero and the value of the empty string as a character string.

Functions and variables of all kinds use the same namespace, so that the same naming leads to undefined behavior.

In addition to user-defined variables and functions and standard variables and standard functions are available, for example, the variables $0for the entire line $1, $2... for each ith field of the line, and FS(of English. Field separator ) for the field separator and the functions gsub () , split () and match ().

Commands

The syntax of awk is similar to that of the C programming language . Elementary commands are assignments to variables, comparisons between variables as well as loops or conditional command executions (if-else). There are also calls to both permanently implemented and self-programmed functions.

Output of data on the standard output is possible with the " print" command. To print out the second field of an input line, for example, the command

  print $2

used.

conditions

Conditions in awk programs are either of the form

 Ausdruck Vergleichsoperator Ausdruck

or of the shape

 Ausdruck Matchoperator /reguläres Suchmuster/

Regular search patterns are formed like the grep command, and match operators are ~ for "pattern found" and ! ~ For "pattern not found". "/ Regular search pattern /" can be used as an abbreviation for the condition "$ 0 ~ / regular search pattern /" (ie the entire line meets the search pattern) .

The words BEGIN and END apply as special conditions , in which the associated statement blocks are executed before reading in the first line or after reading in the last line.

In addition, conditions can be combined with logical links to form new conditions, e.g. B.

  $1 ~ /^E/ && $2 > 20 { print $3 }

This awk command causes the third field of every line that begins with E and whose second field is a number greater than 20 to be output.

Versions, dialects

The first awk version from 1977 was revised in 1985 by the original authors, which was referred to as nawk ("new awk"). It offers the possibility of defining your own functions as well as a large number of operators and predefined functions. The call is mostly made via "awk", since a distinction between the two versions has become obsolete.

The GNU project of the Free Software Foundation provides a further expanded free variant under the name gawk .

Another free implementation is mawk by Mike Brennan. mawk is smaller and faster than gawk, but this comes at the price of some restrictions.

Also BusyBox contains a small awk implementation.

literature

Web links

Wikibooks: Awk  - learning and teaching materials