Shebang

from Wikipedia, the free encyclopedia

Shebang, or hash-bang, refers to the character combination #! at the beginning of a script, similar to a document type definition. On Unix-like operating systems, starting a file with a number sign (#) and an exclamation mark means that the command that follows, together with any specified arguments, is executed when the script is called. The script's file name is then passed to that command as a further argument.

Etymology

The term shebang most likely originated in 19th-century America. Originally it probably referred to a hut, shelter or tent, or possibly to a place where unlicensed alcohol was drunk (Irish shebeen). From the middle of the 19th century the term also referred to a horse-drawn carriage (as used, for example, by Mark Twain).

For over 150 years, shebang has mostly been used within the expression "the whole shebang", where it roughly means "stuff, thing, matter". This appears to come from "running the whole shebang", a phrase that originated late in the American Civil War and referred to the officers who kept the tents, camp and unit going, known as quartermasters.

In the Unix context it is interpreted as an abbreviation of sharp bang or hash bang, which refers to the two initial characters. In Unix jargon, the exclamation mark is referred to as bang and the number sign (#) as hash or sharp.

History

The mechanism was originally introduced to distinguish shell script files for the different Unix shells sh and csh from one another. The first character of the file was checked to see whether it was ":" or "#". Both characters can be placed at the start of a file without affecting the behavior of the script: "#" introduces a comment in the respective scripting language, while ":" represents a call to an empty command (NOP).

The shebang was introduced by Dennis Ritchie at Bell Laboratories in the period between Unix versions 7 and 8. At the same time it was adopted by BSD Unix. Since version 8 of Bell's Unix was not widely distributed, the shebang became widely known through BSD.

Implementation

The shebang characters are a human-readable form of a magic number for executable programs in the ASCII character set; the magic string corresponds to the hexadecimal bytes 0x23 0x21. This allows the operating system kernel to recognize the file as a script and execute it with the specified interpreter. In this way, the script is treated almost like a fully fledged program and can be called as such in the operating system.
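The magic-number check can be illustrated with a minimal Python sketch. The function name starts_with_shebang is hypothetical and chosen only for this illustration; a real kernel performs the check internally when a file is executed.

```python
# The two ASCII bytes "#!" that act as the magic number.
SHEBANG_MAGIC = bytes([0x23, 0x21])


def starts_with_shebang(data: bytes) -> bool:
    """Return True if the data begins with the bytes 0x23 0x21."""
    return data[:2] == SHEBANG_MAGIC


print(starts_with_shebang(b"#!/bin/sh\necho hello\n"))  # True
print(starts_with_shebang(b"echo hello\n"))             # False
```

Note that the check is positional: the two bytes count as a shebang only at offset 0 of the file.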

The prerequisite is that the Unix file permissions are set correctly. As with every program, the execute bit must be set; because the interpreter reads the script's code with the permissions of the executing user, the read bit must also be set.
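As a rough sketch, the two permission bits can be inspected from Python with os.stat. The helper name owner_can_read_and_execute is hypothetical, and the sketch deliberately looks only at the owner's bits; which bits actually apply depends on the executing user and group.

```python
import os
import stat
import tempfile


def owner_can_read_and_execute(path: str) -> bool:
    """Check whether the owner's read and execute bits are both set."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRUSR) and bool(mode & stat.S_IXUSR)


# Demonstration on a temporary script file.
with tempfile.TemporaryDirectory() as tmp_dir:
    script = os.path.join(tmp_dir, "hello.sh")
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")

    os.chmod(script, 0o644)  # readable, but not executable
    print(owner_can_read_and_execute(script))

    os.chmod(script, 0o755)  # readable and executable
    print(owner_can_read_and_execute(script))
```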

Use

A Hello World program in Perl. The first line contains the path to the interpreter and an argument (-w).

A typical shebang line looks like this:

#!/bin/sh

This line instructs the operating system to execute the file with the interpreter program /bin/sh, in this case the standard Unix shell.
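How a shebang line turns into a command can be sketched as follows. The function build_command is hypothetical, and the sketch assumes that everything after the interpreter path is handed over as a single argument; systems differ in how they split arguments, so this is an illustration rather than a description of any particular kernel.

```python
from typing import List


def build_command(shebang_line: str, script_path: str) -> List[str]:
    """Build the argument list: interpreter, optional argument, script path."""
    if not shebang_line.startswith("#!"):
        raise ValueError("not a shebang line")
    rest = shebang_line[2:].strip()
    # Split off the interpreter; the remainder (if any) stays one argument.
    parts = rest.split(None, 1)
    if not parts:
        raise ValueError("no interpreter given")
    return parts + [script_path]


print(build_command("#!/bin/sh", "./test.sh"))
# ['/bin/sh', './test.sh']
print(build_command("#!/usr/bin/perl -w", "./hello.pl"))
# ['/usr/bin/perl', '-w', './hello.pl']
```

The script's own path comes last, which is why the interpreter ends up reading the very file that contains the shebang line.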

The shebang line #!/bin/cat turns a program into a (fake) quine: the script prints its own content to standard output because its file name is passed to the cat program.

Problems

Location

Some storage locations are standardized in the Filesystem Hierarchy Standard (FHS), so that FHS-compliant Unix-like systems must provide the corresponding programs, or symbolic links to them, at the standardized path. On such systems a POSIX-compatible Unix shell is always available at /bin/sh. However, not all Unix derivatives are FHS-compliant, and the storage location of other interpreters is not standardized. It may therefore be necessary to change the shebang line when copying a script from one computer to another.

The env program can be used to remedy this:

#!/usr/bin/env python

env starts the desired program (here Python) regardless of its storage location. It uses the standard environment variables of the operating system configuration, including the PATH environment variable, and searches for the python program in the directories listed there. In this example it finds the Python interpreter at /usr/bin/python. However, env is not installed on every system and is not necessarily found in the same place everywhere.

If the storage location is unclear, the which command can also help:

user@localhost:~$ which python
/usr/bin/python
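The kind of PATH search that env and which perform can be approximated in Python with shutil.which from the standard library. The wrapper find_interpreter, the file name myinterp and the temporary directory are hypothetical and exist only to make the sketch self-contained.

```python
import os
import shutil
import tempfile
from typing import Optional


def find_interpreter(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable found on the search path.

    If search_path is None, the PATH environment variable is used.
    Returns None when no matching executable is found.
    """
    return shutil.which(name, path=search_path)


# Demonstration with a made-up interpreter in a temporary directory.
with tempfile.TemporaryDirectory() as bin_dir:
    fake = os.path.join(bin_dir, "myinterp")
    with open(fake, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(fake, 0o755)  # must be executable to be found

    print(find_interpreter("myinterp", search_path=bin_dir))
    print(find_interpreter("myinterp", search_path=os.path.join(bin_dir, "missing")))
```

Calling find_interpreter("python") without a search path would consult the real PATH, so its result depends on the machine it runs on.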

Windows

Windows itself does not recognize the shebang. However, when software packages developed for both Windows and Unix are installed under Windows, some of their components often interpret the shebang themselves. For example, the Apache web server "understands" shebangs when it calls CGI scripts. Here is a possible example of how a Python script is called by Apache:

#!C:\Programme und Anwendungen\Python 2.48\bin\python.exe

When porting scripts from Windows to Unix and vice versa, pay attention to line breaks, carriage returns, end-of-file markers and other special characters to avoid problems.

Shebang as a special form of a comment in the scripting language

In theory, the shebang can be used to call any interpreter, to which the entire script is then passed for processing. In practice, this works only if the interpreter ignores the shebang line, since that line contains no instructions for the interpreter itself. Because of the number sign (#), many scripting languages treat the shebang line as a comment and therefore ignore it. Alternatively, an interpreter could always skip the first line.

This is the case with common languages such as Ruby, Perl, Python or PHP, as they use the number sign for line comments. Other languages, however, use different characters for (line) comments. REXX interpreters, for example, generally treat this character as a syntax error. For this reason, not every interpreter is suitable for being called via the shebang.

Sometimes the shebang addresses a preprocessor, which evaluates the line, removes it and passes the rest to an interpreter or compiler. This is the case, for example, with InstantFPC, a command that allows Pascal scripts to be executed with Free Pascal under different operating systems. Although Pascal does not use the "#" character to introduce comments, the scripts are compiled and executed without errors, because InstantFPC removes the shebang line and extracts any parameters. Starting with version 0.9.31, Lazarus also recognizes the shebang line. In the Lisp variant Scheme and in D, the number sign does not generally introduce a comment; instead, a shebang on the first line is specifically ignored.

Unicode Byte Order Mark at the beginning of the file

Script files contain text and are classed as text files. Text files encoded in Unicode often begin with a byte order mark (BOM). If such a BOM precedes the shebang at the beginning of a script file, the shebang may not be recognized, because by definition it must be the very first thing in the file. Scripts that use a shebang should therefore not begin with a BOM.
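A short Python sketch shows why a BOM gets in the way and how it could be detected and removed. It covers only the UTF-8 BOM, using the codecs.BOM_UTF8 constant from the standard library; the helper names are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"

# With the BOM in front, the file no longer starts with "#!".
print(script.startswith(b"#!"))                  # False
print(strip_utf8_bom(script).startswith(b"#!"))  # True
```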

End of line

In addition to the first character in the file, the last character of the first line, i.e. the line break character, is also relevant for correct interpretation. If a line break character is used that is unsuitable for the operating system, the command to be executed will be interpreted incorrectly. Under Unix, a shebang line must end with the linefeed character (LF) alone. Windows line endings place a carriage return character (CR) in front of the LF character. Under Unix, this CR character is then incorrectly appended to the name of the script interpreter to be called.

Example of a control character problem

Script file with Unix line endings

Text of the test.pl file:

#!/usr/bin/perl
# dies ist ein Kommentar
print("Hallo Welt");

Hexdump of the file:

23 21 2f 75 73 72 2f 62   #!/usr/b
69 6e 2f 70 65 72 6c 0a   in/perl.
23 20 64 69 65 73 20 69   # dies i
73 74 20 65 69 6e 20 4b   st ein K
6f 6d 6d 65 6e 74 61 72   ommentar
0a 70 72 69 6e 74 28 22   .print("
48 61 6c 6c 6f 20 57 65   Hallo We
6c 74 22 29 3b            lt");

Script file with Windows line endings

Text of the test.pl file:

#!/usr/bin/perl
# dies ist ein Kommentar
print("Hallo Welt");

Hexdump of the file:

23 21 2f 75 73 72 2f 62   #!/usr/b
69 6e 2f 70 65 72 6c 0d   in/perl.
0a 23 20 64 69 65 73 20   .# dies 
69 73 74 20 65 69 6e 20   ist ein 
4b 6f 6d 6d 65 6e 74 61   Kommenta
72 0d 0a 70 72 69 6e 74   r..print
28 22 48 61 6c 6c 6f 20   ("Hallo 
57 65 6c 74 22 29 3b      Welt");

When each of the two programs is run on its respective platform, the output is Hallo Welt ("Hello World"). If you try to run the Windows version under Unix, it ends with an error, for example:

$ ./test.pl
bash: ./test.pl: /usr/bin/perl^M: bad interpreter: No such file or directory

The ^M is a symbolic representation of the carriage return character CR. The shebang line becomes usable only after the file has been converted to Unix line endings.
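Detecting the stray carriage return and converting the file can be sketched in Python as follows. Both function names are hypothetical, and the byte string reproduces the Windows variant of test.pl from the example.

```python
def shebang_has_carriage_return(data: bytes) -> bool:
    """Return True if the shebang line ends with a CR before the LF."""
    first_line = data.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


windows_script = (
    b"#!/usr/bin/perl\r\n"
    b"# dies ist ein Kommentar\r\n"
    b'print("Hallo Welt");'
)

print(shebang_has_carriage_return(windows_script))  # True
unix_script = to_unix_line_endings(windows_script)
print(shebang_has_carriage_return(unix_script))     # False
```

The sketch works on bytes rather than text so that no newline translation happens while reading or writing the file.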

In practice, this problem usually occurs only when a program is ported to another platform. Scripts that are otherwise platform-independent therefore need to be adjusted, just as with the location problems described above.
