join (Unix)
join (/bin/join) is a program that merges two record-oriented input data streams and writes the combined records to its output, similar to the variants of the SQL JOIN operation. It expects two inputs that are already sorted on the join field (files, output of subprocesses, results of process substitution, or data on stdin) and writes its result to stdout.
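The requirement for sorted input can be illustrated with a minimal sketch. The file names and contents below are made up for illustration, and the example assumes a POSIX shell with mktemp available; the second input is sorted in a pipeline and passed to join on stdin via the "-" operand.

```shell
# Scratch directory; all file names here are hypothetical.
tmp=$(mktemp -d)
export LC_ALL=C   # use the same collation for sort and join

printf '%s\n' 'b 2' 'a 1' 'c 3' > "$tmp/left.txt"
printf '%s\n' 'c z' 'a x' > "$tmp/right.txt"

# join expects both inputs sorted on the join field (field 1 by default).
sort -k1,1 "$tmp/left.txt" > "$tmp/left.sorted"

# The second input comes from a pipeline; "-" makes join read it from stdin.
sorted_join=$(sort -k1,1 "$tmp/right.txt" | join "$tmp/left.sorted" -)
printf '%s\n' "$sorted_join"

rm -r "$tmp"
```

Setting LC_ALL=C for both commands is a design choice of this sketch: sort and join must agree on the collation order, otherwise join may treat correctly sorted input as unsorted.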
The behavior of join is specified in detail for UNIX systems by the POSIX standard; the general rules for the behavior of command-line utilities also apply. The command is mandatory on UNIX and other POSIX-compliant systems, and many non-POSIX operating systems provide an identical or similar command.
Usage
join is used to link information from several input data sets and to output the result of this link. The input is expected to have a record structure: a table-like layout of newline-separated lines, each consisting of columns (fields) separated by a field separator character. Simple files in CSV format are one example, but other files with a similar structure work as well.
If a record in one file corresponds to several records in the other file, the information in this record is repeated as often as necessary (analogous to a one-to-many match in an SQL join):
A:
f1 a
f1 b
f1 c
B:
f1 X
Result:
f1 a X
f1 b X
f1 c X
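The duplication can be reproduced with a short sketch. The files A and B are created in a scratch directory with the contents shown above; the directory and the variable name dup are assumptions of this example.

```shell
# Recreate the two inputs A and B in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'f1 a' 'f1 b' 'f1 c' > "$tmp/A"
printf '%s\n' 'f1 X' > "$tmp/B"

# The single record of B is paired with each of the three records of A.
dup=$(join "$tmp/A" "$tmp/B")
printf '%s\n' "$dup"

rm -r "$tmp"
```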
Example
A combined list is to be created from a list of telephone numbers and a list of fax numbers. The file tel holds the telephone numbers and the file fax holds the fax numbers (the wide gaps between fields are tab characters). They have the following content:
tel:
Name	Tel-Nummer
Anna	123456-123
Karl	123456-456
Sandra	123457-789
fax:
Name	Fax-Nummer
Anna	345678-997
Leo	345679-998
Sandra	345678-999
A plain call joins on the first field of each file (here, the names) and outputs only the records whose key appears in both files (inner join):
# join tel fax
Name Tel-Nummer Fax-Nummer
Anna 123456-123 345678-997
Sandra 123457-789 345678-999
The output becomes considerably more readable when the field separator (-t; <tab> stands for a literal tab character) and an output format (-o) are specified. The field separator applies to both input and output:
# join -t'<tab>' -o 0,1.2,2.2 tel fax
Name	Tel-Nummer	Fax-Nummer
Anna	123456-123	345678-997
Sandra	123457-789	345678-999
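A scripted variant of this call might look as follows. It is only a sketch: the files are recreated in a scratch directory, the header lines are left out so that both inputs stay strictly sorted on the name field, and the tab character is produced with printf instead of being typed literally. The variable names tab and formatted are assumptions of this example.

```shell
tmp=$(mktemp -d)
export LC_ALL=C

# Tab-separated inputs without header lines, sorted on the name field.
printf 'Anna\t123456-123\nKarl\t123456-456\nSandra\t123457-789\n' > "$tmp/tel"
printf 'Anna\t345678-997\nLeo\t345679-998\nSandra\t345678-999\n' > "$tmp/fax"

# Produce a literal tab character to pass to -t.
tab=$(printf '\t')

# -o 0,1.2,2.2: the join field, then field 2 of file 1, then field 2 of file 2.
formatted=$(join -t "$tab" -o 0,1.2,2.2 "$tmp/tel" "$tmp/fax")
printf '%s\n' "$formatted"

rm -r "$tmp"
```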
In addition, the default behavior (inner join) can be changed so that keys appearing in only one of the files are included as well (-a), and a placeholder text can be specified for the missing fields (-e):
# join -t'<tab>' -a 1 -a 2 -e '(keine)' -o 0,1.2,2.2 tel fax
Name	Tel-Nummer	Fax-Nummer
Anna	123456-123	345678-997
Karl	123456-456	(keine)
Leo	(keine)	345679-998
Sandra	123457-789	345678-999
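The same full outer join can be sketched in a self-contained script. As assumptions of this example, the header lines are omitted to keep the inputs sorted, the placeholder text is '(none)' rather than '(keine)', and the files live in a scratch directory.

```shell
tmp=$(mktemp -d)
export LC_ALL=C

printf 'Anna\t123456-123\nKarl\t123456-456\nSandra\t123457-789\n' > "$tmp/tel"
printf 'Anna\t345678-997\nLeo\t345679-998\nSandra\t345678-999\n' > "$tmp/fax"
tab=$(printf '\t')

# -a 1 and -a 2 also emit unpairable records from file 1 and file 2;
# -e fills the output fields that have no value in such records.
outer=$(join -t "$tab" -a 1 -a 2 -e '(none)' -o 0,1.2,2.2 "$tmp/tel" "$tmp/fax")
printf '%s\n' "$outer"

rm -r "$tmp"
```

Note that -e only takes effect for fields selected with -o, which is why the output format is given explicitly here.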
Finally, the behavior can be inverted so that only those records are output that have no match in the other file (-v). The result is a list of people who lack either a fax number or a telephone number:
# join -t'<tab>' -v 1 -v 2 -o 0 tel fax
Karl
Leo
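As a sketch under the same assumptions as before (header lines omitted, files recreated in a scratch directory, hypothetical variable names), the inverted selection can be scripted like this:

```shell
tmp=$(mktemp -d)
export LC_ALL=C

printf 'Anna\t123456-123\nKarl\t123456-456\nSandra\t123457-789\n' > "$tmp/tel"
printf 'Anna\t345678-997\nLeo\t345679-998\nSandra\t345678-999\n' > "$tmp/fax"
tab=$(printf '\t')

# -v 1 and -v 2 suppress the joined records and print only the unpairable
# ones from each file; -o 0 reduces each record to its join field.
unmatched=$(join -t "$tab" -v 1 -v 2 -o 0 "$tmp/tel" "$tmp/fax")
printf '%s\n' "$unmatched"

rm -r "$tmp"
```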
External links
- GNU (non-POSIX) variant of join
References
- The Open Group Base Specifications. Issue 7, IEEE Std 1003.1, 2013 Edition.
- Utility Conventions 12.1 Utility Argument Syntax. In: The Open Group Base Specifications. Issue 7, 2013.
- Utility Conventions 12.2 Utility Syntax Guidelines. In: The Open Group Base Specifications. Issue 7, 2013.