join (Unix)

from Wikipedia, the free encyclopedia

join (/bin/join) is a program that merges two record-oriented input data streams and writes the combined result, similar to the variants of the SQL JOIN operation. It expects two already sorted data streams as input (files, output of subprocesses, results of process substitutions, or other input on stdin) and writes its output to stdout.

The behavior of join is specified in detail for UNIX systems by the POSIX standard; the general rules for the behavior of command-line utilities also apply. The command is mandatory on UNIX systems and other POSIX-compliant systems, and many non-standard operating systems also provide an identical or similar command.
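
Because both inputs must already be sorted on the join field, unsorted data is usually passed through sort first. The following is a minimal sketch with made-up sample data; the temporary directory and the file names left and right are illustrative assumptions. It also shows that one of the two inputs can be read from stdin by giving - as the file operand, and it sets LC_ALL=C so that sort and join use the same collation order.

```shell
# Hypothetical sample data; directory and file names are placeholders.
export LC_ALL=C            # sort and join must agree on the collation order
tmp=$(mktemp -d)
printf '%s\n' 'b 2' 'a 1' 'c 3' > "$tmp/left"    # deliberately unsorted
printf '%s\n' 'c z' 'a x'       > "$tmp/right"   # deliberately unsorted

# Sort the first input on its first field and store it in a file.
sort -k 1,1 "$tmp/left" > "$tmp/left.sorted"

# Sort the second input on the fly; join reads it from stdin via "-".
result=$(sort -k 1,1 "$tmp/right" | join "$tmp/left.sorted" -)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the keys a and c occur in both inputs, so the line with key b is dropped.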

Application

join is used to link information from several input data sets and to output the result of this link. The input is expected to have a record structure: a table-like layout in which each line is one record and the columns (fields) within a line are separated by a field separator character. Simple CSV files are one example, but other files with a similar structure work as well.

If a record in one file corresponds to several records in the other file, then the information in this record is repeated as often as necessary, as in an SQL join:

A:
   f1 a
   f1 b
   f1 c
B:
   f1 X
Result:
   f1 a X
   f1 b X
   f1 c X

Example

A combined list is to be created from a list of telephone numbers and a list of fax numbers. The file tel contains the telephone numbers and the file fax contains the fax numbers (the wide gaps between the fields are tab characters). They have the following content:

tel:
>Name	Tel-Nummer
Anna	123456-123
Karl	123456-456
Sandra	123457-789
fax:
>Name	Fax-Nummer
Anna	345678-997
Leo	345679-998
Sandra	345678-999

The simplest invocation joins on the first field (here, the names) and outputs only the records whose key appears in both files (inner join):

# join tel fax
>Name Tel-Nummer Fax-Nummer
Anna 123456-123 345678-997
Sandra 123457-789 345678-999

The output looks much tidier when the separator (-t, where <tab> stands for a literal tab character) and an output format (-o) are specified. In the -o list, 0 denotes the join field and n.m denotes field m of file n. The field separator applies to both input and output:

# join -t'<tab>' -o 0,1.2,2.2 tel fax
>Name	Tel-Nummer	Fax-Nummer
Anna	123456-123	345678-997
Sandra	123457-789	345678-999
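
Since <tab> in the call above is only a placeholder, a script needs some way to pass a real tab character to -t. One portable option, sketched here, is to store the tab in a variable using printf. The variable name tab and the temporary directory are assumptions for illustration; the file contents are those of the example.

```shell
export LC_ALL=C
tab=$(printf '\t')         # a literal tab character
tmp=$(mktemp -d)

# Recreate the two tab-separated sample files from the example.
printf '%s\t%s\n' '>Name' 'Tel-Nummer' 'Anna' '123456-123' \
    'Karl' '123456-456' 'Sandra' '123457-789' > "$tmp/tel"
printf '%s\t%s\n' '>Name' 'Fax-Nummer' 'Anna' '345678-997' \
    'Leo' '345679-998' 'Sandra' '345678-999' > "$tmp/fax"

# Join field, then field 2 of file 1, then field 2 of file 2.
result=$(join -t "$tab" -o 0,1.2,2.2 "$tmp/tel" "$tmp/fax")
printf '%s\n' "$result"

rm -rf "$tmp"
```

In the C locale the character > sorts before the uppercase letters, so the header line is the first record in both files and the inputs count as sorted.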

In addition, the default behavior (inner join) can be changed so that keys that do not appear in both files are also output (-a), and a default text can be specified for the missing information (-e):

# join -t'<tab>' -a 1 -a 2 -e '(keine)' -o 0,1.2,2.2 tel fax
>Name	Tel-Nummer	Fax-Nummer
Anna	123456-123	345678-997
Karl	123456-456	(keine)
Leo	(keine)	345679-998
Sandra	123457-789	345678-999
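
As a variation on the call above, giving only -a 1 keeps the unmatched lines of the first file but not those of the second, which corresponds to a left outer join. The sketch below recreates the sample files in a temporary directory; the directory and the variable names are illustrative assumptions, and '(keine)' is the placeholder text taken from the example.

```shell
export LC_ALL=C
tab=$(printf '\t')
tmp=$(mktemp -d)

printf '%s\t%s\n' '>Name' 'Tel-Nummer' 'Anna' '123456-123' \
    'Karl' '123456-456' 'Sandra' '123457-789' > "$tmp/tel"
printf '%s\t%s\n' '>Name' 'Fax-Nummer' 'Anna' '345678-997' \
    'Leo' '345679-998' 'Sandra' '345678-999' > "$tmp/fax"

# -a 1: also print lines from tel that have no partner in fax.
# -e:   text to use for fields that are missing in such lines.
result=$(join -t "$tab" -a 1 -e '(keine)' -o 0,1.2,2.2 "$tmp/tel" "$tmp/fax")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Karl appears with the placeholder in the fax column, while Leo, who is listed only in fax, is left out.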

Finally, the behavior can be inverted so that only those records that have no match in the other file appear in the output (-v). The result is a list of people who lack either a fax number or a telephone number:

# join -t'<tab>' -v 1 -v 2 -o 0 tel fax
Karl
Leo
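
The combined list above does not show which of the two numbers is missing. A possible refinement, sketched here under the same assumptions as before (temporary directory, illustrative variable names), is to run join once with -v 1 and once with -v 2:

```shell
export LC_ALL=C
tab=$(printf '\t')
tmp=$(mktemp -d)

printf '%s\t%s\n' '>Name' 'Tel-Nummer' 'Anna' '123456-123' \
    'Karl' '123456-456' 'Sandra' '123457-789' > "$tmp/tel"
printf '%s\t%s\n' '>Name' 'Fax-Nummer' 'Anna' '345678-997' \
    'Leo' '345679-998' 'Sandra' '345678-999' > "$tmp/fax"

# -v 1: only lines from tel without a partner in fax (no fax number).
no_fax=$(join -t "$tab" -v 1 -o 0 "$tmp/tel" "$tmp/fax")
# -v 2: only lines from fax without a partner in tel (no telephone number).
no_tel=$(join -t "$tab" -v 2 -o 0 "$tmp/tel" "$tmp/fax")

printf 'without fax: %s\n' "$no_fax"
printf 'without telephone: %s\n' "$no_tel"

rm -rf "$tmp"
```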

References

  1. The Open Group Base Specifications Issue 7, IEEE Std 1003.1, 2013 Edition.
  2. Utility Conventions, 12.1 Utility Argument Syntax. In: The Open Group Base Specifications Issue 7, 2013.
  3. Utility Conventions, 12.2 Utility Syntax Guidelines. In: The Open Group Base Specifications Issue 7, 2013.