Pipe (computer science)

from Wikipedia, the free encyclopedia

A pipe or pipeline is a data stream between two processes that passes through a buffer operating on the first in, first out (FIFO) principle. In simplified terms, the output of one program is used as the input of another. Pipes were invented in 1973 by Douglas McIlroy for the Unix operating system.

How it works: an example

The following example illustrates how a pipe works:

ls -R ./Bilder | grep -ic '\.jpg$'

In the example, two small programs are connected via a pipe (symbol "|"). In keeping with the Unix philosophy, each of them does one job well and can be combined with other programs.

  • ls lists the contents of directories as text
  • grep searches text for character strings

In the example, ls is instructed to list the directory ./Bilder including its subdirectories (switch -R). ls writes the result (text) to the pipe.

The grep program reads this text from the pipe. It is instructed to search the text for lines ending in .jpg: the period is escaped with a backslash because an unescaped period matches any character in grep, and the $ anchors the match to the end of the line. The switches -i and -c are also passed to grep: -i makes the search case-insensitive (".JPG" is found as well), and -c instructs grep to print the number of matching lines instead of the lines themselves.

The result of the pipeline is the number of JPG images under the ./Bilder directory.

At first glance it might seem easier to equip ls with more functionality and avoid the cumbersome pipe notation (as with the dir command in DOS and its successors). Proponents of the Unix philosophy argue, however, that dividing a larger problem into sub-problems handled by suitably specialized programs is more effective.

Create a pipe

In most operating systems and programming languages, when a pipe is requested with the pipe() system call, the operating system returns two access identifiers (handles), one for writing to the pipe and one for reading from it. Child processes inherit access to these handles. When the last process that has access to a pipe terminates, the operating system destroys the pipe.
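As a minimal sketch of the call described above, assuming a POSIX system, the following C function creates a pipe, writes a short message to the write handle and reads it back from the read handle within a single process. The helper name pipe_roundtrip is made up for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sends msg through a freshly created pipe and reads it back into buf.
 * Returns the number of bytes read, or -1 on error.
 * pipe_roundtrip is a hypothetical helper used only for this sketch. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int fd[2];                  /* fd[0]: read end, fd[1]: write end */

    if (pipe(fd) < 0) {
        perror("pipe");
        return -1;
    }

    /* A short message fits into the pipe buffer, so this write
     * returns without waiting for a reader. */
    if (write(fd[1], msg, strlen(msg)) < 0) {
        perror("write");
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);               /* nothing more will be written */

    ssize_t n = read(fd[0], buf, buflen);
    close(fd[0]);               /* last handle closed: the pipe is gone */
    return n;
}
```

In practice the two handles are usually split between a parent and a child process after fork(), each side closing the handle it does not use.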

Pipe variants

There are anonymous and named pipes.

Anonymous pipes have three major limitations:

  • They can only be used in one direction: one process writes, the other reads.
  • They can only be used for communication between closely related processes.
  • The maximum amount of data that a pipe can hold is relatively small.
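The size limit in the last point can be observed directly. The sketch below, assuming a POSIX system, switches the write end of a pipe to non-blocking mode and writes single bytes until the pipe refuses more data. The function name pipe_capacity is invented for this illustration, and the value it returns depends on the operating system.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns the number of bytes an empty pipe accepts before it is full,
 * or -1 on error. pipe_capacity is a hypothetical helper name. */
long pipe_capacity(void)
{
    int fd[2];
    char byte = 'x';
    long total = 0;

    if (pipe(fd) < 0)
        return -1;

    /* Make the write end non-blocking so that a full pipe yields an
     * error (EAGAIN) instead of suspending the process. */
    int flags = fcntl(fd[1], F_GETFL);
    if (flags < 0 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fd[1], &byte, 1);
        if (n == 1) {
            total++;            /* one more byte is buffered */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        break;                  /* pipe is full (or another error) */
    }

    close(fd[0]);
    close(fd[1]);
    return total;
}
```

A writer that produces more than this amount simply blocks until the reader has consumed some data, which is why the limit is rarely a problem for pipelines in which both ends run concurrently.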

Named pipes, by contrast, can also be used for communication between processes that are not related to each other and that may even run on different computers within a network. They are more flexible than anonymous pipes and are suitable for client-server applications (RPCs can also be implemented with them). Named pipes enable simultaneous communication in both directions, so that data can be exchanged between the processes in full-duplex mode.

Any process that knows the name of a named pipe can use this name to establish the connection to the pipe and thus to other processes.

Pipes in operating systems

Pipes are implemented in various operating systems; most of them offer both anonymous and named pipes.

Unix

Anonymous pipes

Pipes are among the most powerful tools in Unix and Unix-like operating systems for processing a given set of data with a sequence of commands.

With an anonymous pipe, communication is limited to processes of common origin. This relationship is usually established by fork. In the shell, an anonymous pipe is created when programs are started with a "|" character between them. The shell is then the common parent process of all processes involved and performs the forks automatically.

Example:

grep '.sshd.*Invalid user' /var/log/messages | awk '{print $NF}' | sort -u

Here the system log file is searched for lines that contain "sshd" followed by the text "Invalid user". Then awk prints the last field of each message, which contains the IP address of the computer from which the SSH access originated. Finally, sort -u (--unique) sorts these IP addresses and removes duplicates.

Each program in such a pipeline reads its input data from standard input (except for the first) and writes its output data to standard output (except for the last); see also Filters (Unix). If an application requires file names for its input or output, a minus sign given as the file name often means standard input or standard output. Applications that implement this convention can therefore also read from or write to a pipe.
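An application might implement the minus-sign convention roughly as follows. This is only a sketch: the helper names open_input and open_output are hypothetical and not taken from any particular program.

```c
#include <stdio.h>
#include <string.h>

/* Opens name for reading; the special name "-" selects standard input.
 * Returns NULL if the file cannot be opened.
 * open_input is a hypothetical helper name. */
FILE *open_input(const char *name)
{
    if (strcmp(name, "-") == 0)
        return stdin;
    return fopen(name, "r");
}

/* Opens name for writing; the special name "-" selects standard output.
 * open_output is a hypothetical helper name. */
FILE *open_output(const char *name)
{
    if (strcmp(name, "-") == 0)
        return stdout;
    return fopen(name, "w");
}
```

A caller should close only the streams it opened itself with fopen, not stdin or stdout.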

Example:

tar cf - /home/user/ogg/mycolouringbook | ssh -l user server "cd /var/ogg && tar xvf -"

Here the contents of a directory are packed into an archive with tar, sent to another computer over an SSH connection and unpacked there.

Named pipe

A named pipe, also called a FIFO (from first in, first out), is a pipe that two processes can open at runtime for reading or writing by means of a file name. With a named pipe, the processes do not need a common origin; they only need permission to access the pipe and must know its name.

Example:

mkfifo einefifo
cat /var/log/messages > einefifo &
grep sshd < einefifo

FIFOs outlast the processes that use them because they are part of the file system. However, a FIFO cannot hold any content while no process has it open. Buffered content is therefore lost if the writing process closes its end of the pipe before a reading process has opened the other end.
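The same mechanism is available from C through mkfifo(). The following sketch, assuming a POSIX system, creates a FIFO, lets a child process write a message into it and reads the message back in the parent. Parent and child are related here only to keep the example in one listing; any process that knows the path could take either role. The function name fifo_roundtrip and its parameters are made up for this illustration.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a FIFO at path, sends msg through it and reads it into buf.
 * Returns the number of bytes read, or -1 on error.
 * fifo_roundtrip is a hypothetical helper used only for this sketch. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *buf, size_t buflen)
{
    if (mkfifo(path, 0600) < 0) {
        perror("mkfifo");
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        /* Writer: open() blocks until a reader has opened the FIFO. */
        int wfd = open(path, O_WRONLY);
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w < 0 ? 1 : 0);
    }

    /* Reader: open() blocks until the writer has opened the FIFO. */
    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);
    if (rfd >= 0) {
        n = read(rfd, buf, buflen);
        close(rfd);
    } else {
        kill(pid, SIGTERM);     /* do not leave the writer blocked */
    }

    waitpid(pid, NULL, 0);
    unlink(path);               /* the FIFO would otherwise persist */
    return n;
}
```

The final unlink() is needed because, as noted above, the FIFO itself is a file system object and would otherwise remain after both processes have finished.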

Use of a one-way Unix pipe

This source code has the same effect as the shell statement who | sort:

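#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void){

// pipe_verbindung[0] is the read end, pipe_verbindung[1] the write end
int pipe_verbindung[2];

// create the pipe with the pipe() function
if (pipe(pipe_verbindung) < 0){
        perror("pipe");
        return EXIT_FAILURE;
        }

// create the first child process
if (fork()==0){
        // dup2 makes standard output (descriptor 1) refer to the write end of the pipe
        dup2(pipe_verbindung[1],1);

        // the read end must be closed because this process does not read;
        // the original write descriptor is no longer needed after dup2
        close(pipe_verbindung[0]);
        close(pipe_verbindung[1]);

        // run the command; its standard output is connected to the pipe
        execlp("who","who",NULL);
        perror("execlp who");
        _exit(EXIT_FAILURE);
        }
// then create the second child process
else if (fork()==0){
        dup2(pipe_verbindung[0],0);
        close(pipe_verbindung[0]);
        close(pipe_verbindung[1]);
        execlp("sort","sort",NULL);
        perror("execlp sort");
        _exit(EXIT_FAILURE);
        }

// the parent uses neither end; as long as its write end stays open,
// sort would never see end-of-file
close(pipe_verbindung[0]);
close(pipe_verbindung[1]);

// wait for both child processes
while (wait(NULL) > 0)
        ;
return EXIT_SUCCESS;
}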

After the fork, the writing process (process 1) also holds the read end of the unidirectional pipe, so it must close its read descriptor, pipe_verbindung[0]. In the same way, the reading process (process 2) initially also holds the write end and must close its write descriptor, pipe_verbindung[1]. If the unused descriptors are not closed, complications arise. When process 1 has no more data to send, it terminates. Process 2, however, does not terminate, because a write descriptor for the pipe is still open (its own), so it never sees end-of-file: it waits for further input that will never arrive. Conversely, if process 1 keeps its read end open and process 2 terminates early, process 1 is not notified that no reader remains and can block forever in a write once the pipe buffer is full.
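The role of the unused descriptors can be shown in isolation. In the following sketch, assuming a POSIX system, the parent reads from a pipe until read() reports end-of-file. That only happens because the parent closes its own copy of the write end first. The function name count_until_eof is invented for this illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a writer that sends msg, then reads in the parent until
 * end-of-file. Returns the number of bytes received, or -1 on error.
 * count_until_eof is a hypothetical helper used only for this sketch. */
long count_until_eof(const char *msg)
{
    int fd[2];

    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        close(fd[0]);           /* the writer never reads */
        ssize_t w = write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(w < 0 ? 1 : 0);
    }

    /* Without this close() the parent itself would remain a potential
     * writer, and the read() loop below would never see end-of-file. */
    close(fd[1]);

    char buf[256];
    long total = 0;
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        total += n;             /* n == 0 means every write end is closed */

    close(fd[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : total;
}
```

Removing the close(fd[1]) call in the parent reproduces the hang described above: the child exits, but the loop keeps waiting for data.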

Windows

Windows provides anonymous and named pipes. Named pipes can be addressed via the pipe API as \\ServerName\pipe\PipeName, analogous to SMB shares.

Anonymous pipes can be used at the Windows command prompt. For example, piping the output of a dir command into the find command restricts the output to lines that contain a given string in the path or file name (something the dir syntax alone cannot always express). The following command shows all subdirectories containing .java files whose path contains the word Render, as well as the .java files whose file names contain Render:

dir *.java /s | find "Render"

OS/2

OS/2 provides anonymous and named pipes. Named pipes are among the most powerful IPC methods that OS/2 offers. When a server process creates a named pipe, it can create several instances of the pipe, all addressed under the same name. A named pipe can thus also work in multiplex mode, so that a single server process can serve several clients at the same time.

Using an unnamed pipe in C

The program reads user input and then uses a pipe to pass the data to a child process. The child converts the input to upper case (toupper) and prints it.

#include <ctype.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// The maximum length of the input is set to 2048 bytes.
#define MAX_ZEICHEN 2048

int main(void) {
	int fd[2], i;
	ssize_t n;
	pid_t pid;
	char zeile[MAX_ZEICHEN];

	// Create the pipe. On error, pipe() returns -1, so possible
	// errors can be caught and handled right here.
	if (pipe(fd) < 0) {
		perror("pipe");
		exit(EXIT_FAILURE);
	}

	// Create a child process. A failed fork() returns -1 and must
	// not be mistaken for the child branch.
	if ((pid = fork()) < 0) {
		perror("fork");
		exit(EXIT_FAILURE);
	}

	if (pid > 0) {
		// In the parent process: the read end is not needed.
		close(fd[0]);
		fprintf(stdout, "Parent : ");
		fflush(stdout);
		if (fgets(zeile, MAX_ZEICHEN, stdin) != NULL) {
			if (write(fd[1], zeile, strlen(zeile)) < 0)
				perror("write");
		}

		// Closing the write end signals end-of-file to the child.
		close(fd[1]);

		if (waitpid(pid, NULL, 0) < 0)
			perror("waitpid");
	}

	// Only the child process reaches the else branch.
	else {
		// In the child process: the write end is not needed.
		close(fd[1]);
		n = read(fd[0], zeile, MAX_ZEICHEN);
		if (n < 0) {
			perror("read");
			exit(EXIT_FAILURE);
		}

		// toupper() expects a value representable as unsigned char.
		for (i = 0; i < n; i++)
			zeile[i] = toupper((unsigned char)zeile[i]);
		fprintf(stderr, "Child : ");

		if (write(STDOUT_FILENO, zeile, n) < 0)
			perror("write");
	}
	exit(0);
}

History

Pipes were invented by Douglas McIlroy in 1972/73 for the Unix operating system. As early as 1964, at the beginning of the Multics project (the forerunner of Unix), he had called for them in a memo:

"We should have some ways of connecting programs like garden hose - screw in another segment when it becomes necessary to massage data in another way."

- Douglas McIlroy
