C preprocessor

from Wikipedia, the free encyclopedia

The C preprocessor (cpp, also C precompiler) is the preprocessor of the C programming language. In many implementations it is a stand-alone program that the compiler invokes as the first step of translation. The preprocessor handles directives for including source code (#include), for replacing macros (#define), and for conditional compilation (#if). The language of the preprocessor directives is independent of the grammar of C, so the C preprocessor can also be used to process other file types.

Background

The earliest versions of the C programming language did not have a preprocessor. It was introduced partly at the urging of Alan Snyder (see also: Portable C Compiler), but chiefly to allow other source files to be included in C, as in BCPL (a predecessor of C), and to allow simple, parameterless macros to be replaced. Mike Lesk and John Reiser extended it with parameterized macros and constructs for conditional compilation, and over time it developed from an optional add-on program of a compiler into a standardized component of the programming language. This development, independent of the core language, explains the syntactic discrepancies between C and the preprocessor.

In the early years, the preprocessor was an independent program that passed its intermediate result to the actual compiler, which then translated it. Today, compilers for C and C++ process the preprocessor directives in a single pass together with translation. On request, the output that a separate preprocessor would have produced can also be emitted, either in addition or exclusively.

The C preprocessor as a text replacement tool

Since the C preprocessor does not depend on the definition of the C language, but only recognizes and processes the directives it knows, it can also be used as a pure text replacement tool for other purposes.

For example, the C preprocessor is also used when compiling resource files. This allows C header files to be included, so that values such as numeric constants or character strings defined with #define can be shared between C code and resource code. It matters here that the resource compiler cannot process complex C code. C code such as function declarations or structure definitions can be hidden from the resource compiler conditionally with #if or #ifdef, because the resource compiler defines certain macros (RC_INVOKED) that are not defined in the C compiler. The header file windows.h also makes use of this, so that it can be used (in part) in both program code and resource code.
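As a rough sketch of this technique, a header shared by C code and a resource script might guard its C-only parts as shown here. Only RC_INVOKED comes from the text above; the file name and the names IDS_APP_TITLE, APP_VERSION, app_info and default_app_info are hypothetical.

```c
/* resource_ids.h (hypothetical): shared by C sources and a resource script */
#ifndef RESOURCE_IDS_H
#define RESOURCE_IDS_H

/* Plain #define values can be read by both compilers. */
#define IDS_APP_TITLE 101      /* hypothetical string-table identifier */
#define APP_VERSION   "1.0"    /* hypothetical version string */

#ifndef RC_INVOKED
/* C-only part: skipped when the resource compiler defines RC_INVOKED. */
struct app_info {
    int         title_id;
    const char *version;
};

static inline struct app_info default_app_info(void)
{
    struct app_info info = { IDS_APP_TITLE, APP_VERSION };
    return info;
}
#endif /* !RC_INVOKED */

#endif /* RESOURCE_IDS_H */
```

A C compiler sees both parts, while a tool that defines RC_INVOKED would see only the two #define lines.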

Phases

The C standard defines, among other things, the following four (of a total of eight) translation phases. These four are carried out by the C preprocessor:

  1. Replacement of trigraph characters by the corresponding single character.
  2. Merging of lines that were split by the backslash (\) at the end of the line (intended for long strings, for punch cards or magnetic tapes with a fixed record length).
  3. Tokenization: The preprocessor breaks the input into tokens and whitespace, which are easier to process for the subsequent compiler phases, and replaces each comment with a single space.
  4. Macro replacement and file inclusion: Preprocessor directives for including file contents (in addition to the source text to be translated) and for conditional compilation are executed. Macros are expanded at the same time.
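Phases 2 and 3 can be observed in a small sketch like the following. The names SPLICED, SEPARATED and STRINGIZE are illustrative only; stringizing with # is used here merely to make the effect of comment replacement visible.

```c
/* Phase 2: the backslash-newline pair is deleted before tokenization, so the
   string literal continues on the next line. The continuation starts in the
   first column because leading spaces would become part of the string. */
static const char SPLICED[] = "Hello, \
world";

/* Phase 3: each comment is replaced by one space. The two identifiers below
   therefore stay separate tokens and are not fused into one. */
#define STRINGIZE(x) #x
static const char SEPARATED[] = STRINGIZE(a/* comment */b);
```

SPLICED holds the text Hello, world, and SEPARATED holds the two letters separated by one space.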

File inclusion

The most common use of the preprocessor is to include the contents of other files:

#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}

The preprocessor replaces the line #include <stdio.h> with the contents of the header file stdio.h, which declares the function printf(), among other things. The file stdio.h is part of every C development environment.

The #include directive can also be used with double quotation marks (#include "stdio.h"). In that case, the current directory in the file system is searched in addition to the directories of the C compiler. Options passed to the C compiler, which forwards them to the C preprocessor, or options of the C preprocessor itself can be used to specify the directories in which include files are searched for.

A common convention is that include files have the filename extension .h, while the C source files themselves have the extension .c. However, this is not mandatory. Content from files with a filename extension other than .h can also be included in this way.

Within included files, conditional compilation is often used to ensure that declarations take effect only once for the subsequent compiler phases, even if the file content is included several times with #include.
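A minimal sketch of such an include guard follows. The header name point.h, the guard macro POINT_H and struct point are hypothetical; the header body is pasted twice into one file to mimic what the compiler sees when the same file is included twice.

```c
/* First inclusion of the hypothetical header point.h */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Second inclusion: POINT_H is already defined, so the body is skipped
   and struct point is not defined a second time. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */
```

Without the guard, the second definition of struct point would be rejected by the compiler as a redefinition.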

Conditional compilation

The directives #if, #ifdef, #ifndef, #else, #elif and #endif are used for conditional compilation, for example:

#ifdef WIN32
    #include <windows.h>
#else
    #include <unistd.h>
#endif

In this example, the C preprocessor checks whether a macro named WIN32 is defined. If so, the contents of <windows.h> are included, otherwise those of <unistd.h>. The macro WIN32 can be defined implicitly by the compiler (e.g. by all 32-bit Windows compilers), by a command-line option of the C preprocessor, or by a #define directive.

In the following example, the call to printf is only retained if the macro VERBOSE has a numeric value of 2 or more at this point:

#if VERBOSE >= 2
    printf("Diagnostic output\n");
#endif
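The directives can also be chained. The sketch below assumes a default of 1 for VERBOSE and uses the hypothetical names LOG_LEVEL_NAME, UNDEFINED_FEATURE, HAS_FEATURE and print_log_level. It also shows that an identifier that is not defined as a macro evaluates to 0 inside #if.

```c
#include <stdio.h>

/* VERBOSE would normally come from a compiler option or an earlier #define;
   the default of 1 is an assumption made for this sketch. */
#ifndef VERBOSE
#define VERBOSE 1
#endif

#if VERBOSE >= 2
#define LOG_LEVEL_NAME "debug"
#elif VERBOSE == 1
#define LOG_LEVEL_NAME "info"
#else
#define LOG_LEVEL_NAME "quiet"
#endif

/* UNDEFINED_FEATURE is not a macro, so it counts as 0 in the condition. */
#if UNDEFINED_FEATURE
#define HAS_FEATURE 1
#else
#define HAS_FEATURE 0
#endif

static inline void print_log_level(void)
{
    printf("log level: %s\n", LOG_LEVEL_NAME);
}
```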

Definition and replacement of macros

In C, macros without parameters, with parameters and (since C99) also with a variable number of parameters are permitted:

#define <MACRO_NAME_WITHOUT_PARAMETERS> <replacement text>
#define <MACRO_NAME_WITH_PARAMETERS>(<parameter list>) <replacement text>
#define <MACRO_NAME_WITH_VARIABLE_PARAMETERS>(<optional fixed parameter list>, ...) <replacement text>

For macros with parameters, no space is allowed between the macro name and the opening parenthesis. Otherwise the parameter list, together with the text that follows it, is taken as the replacement text of a parameterless macro. To distinguish them from functions, macro names usually consist exclusively of uppercase letters (good programming style). An ellipsis ("...") indicates that the macro accepts one or more arguments at this point. These can be referred to in the replacement text of the macro with the special identifier __VA_ARGS__.

Macros without parameters are replaced by their replacement text (which can also be empty) wherever the macro name occurs in the source text. For macros with parameters, this only happens if the macro name is followed by an argument list enclosed in parentheses and the number of arguments matches the declaration of the macro. When a macro with a variable number of parameters is replaced, the variable arguments, including the commas separating them, are combined into a single argument, which is inserted in the replacement text in place of __VA_ARGS__.
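Two of these rules can be checked at compile time with a sketch like the one below. All names (DOUBLE, limit, PLAIN_NAME, MACRO_CALL, DOUBLED) are made up for illustration, and a lowercase macro name is used deliberately so that it can share its name with an enumeration constant.

```c
/* The "(" follows the name directly, so DOUBLE takes one parameter. */
#define DOUBLE(x) ((x) * 2)

/* With a space before "(", the same line would instead define a
   parameterless macro whose replacement text starts with "(x)":
       #define DOUBLE (x) ((x) * 2)                                */

/* A macro with parameters is only replaced when its name is followed by
   "(". Here an enumeration constant and a macro share the name limit. */
enum { limit = 7 };
#define limit(x) ((x) + 100)

enum {
    PLAIN_NAME = limit,          /* no "(" follows: the enumerator is used */
    MACRO_CALL = limit(1),       /* expands to ((1) + 100) */
    DOUBLED    = DOUBLE(4 + 1)   /* expands to ((4 + 1) * 2) */
};
```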

Macros without parameters are often used for symbolic names of constants:

#define PI 3.14159

An example of a macro with parameters is:

#define CELSIUS_ZU_FAHRENHEIT(t) ((t) * 1.8 + 32)

The macro CELSIUS_ZU_FAHRENHEIT describes the conversion of a temperature (specified as the parameter t) from the Celsius to the Fahrenheit scale. A macro with parameters is likewise replaced in the source code:

int fahrenheit, celsius = 10;
fahrenheit = CELSIUS_ZU_FAHRENHEIT(celsius + 5);

is replaced by the C preprocessor with:

int fahrenheit, celsius = 10;
fahrenheit = ((celsius + 5) * 1.8 + 32);

Macros with a variable number of parameters are useful for passing arguments to a variadic function:

#define MELDUNG(...) fprintf(stderr, __VA_ARGS__)

For example:

int i = 6, j = 9;
MELDUNG("DEBUG: i = %d, j = %d\n", i, j);

is replaced by the C preprocessor with:

int i = 6, j = 9;
fprintf(stderr, "DEBUG: i = %d, j = %d\n", i, j);

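The result is a valid call of the library function fprintf. Because adjacent string literals are concatenated during translation, a macro can also place a fixed string literal directly in front of __VA_ARGS__ in order to prefix the format string.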
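The concatenation of adjacent string literals can be combined with __VA_ARGS__ as in the following sketch. FORMAT_DEBUG and format_debug_demo are hypothetical; the macro writes into a buffer with snprintf instead of printing to stderr so that the result can be checked, and it assumes that the first variable argument is a string literal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The literal "DEBUG: " and the caller's format string end up side by side
   and are joined into one string literal during translation. */
#define FORMAT_DEBUG(buf, size, ...) \
    snprintf((buf), (size), "DEBUG: " __VA_ARGS__)

static inline void format_debug_demo(void)
{
    char buf[64];
    int i = 6, j = 9;

    FORMAT_DEBUG(buf, sizeof buf, "i = %d, j = %d", i, j);
    assert(strcmp(buf, "DEBUG: i = 6, j = 9") == 0);
    puts(buf);
}
```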

Macros spanning several lines

Because the second phase of the C preprocessor joins lines that end with the \ character, macros can be written across several lines using this mechanism.
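A small sketch of a macro written across several lines follows. SWAP_INT and swap_demo are hypothetical names, and wrapping the body in do { ... } while (0) is a common idiom (not something the text prescribes) that makes the expansion behave like a single statement.

```c
#include <assert.h>

/* Every line except the last ends with a backslash, so the preprocessor
   sees one long logical line. */
#define SWAP_INT(a, b)         \
    do {                       \
        int swap_tmp_ = (a);   \
        (a) = (b);             \
        (b) = swap_tmp_;       \
    } while (0)

static inline void swap_demo(void)
{
    int x = 1, y = 2;

    if (x < y)
        SWAP_INT(x, y);   /* one statement, so no braces are required */
    assert(x == 2 && y == 1);
}
```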

Undefining a macro

A previous macro definition can be undone with #undef. The purpose of this is to make macros available only in a limited section of code:

#undef CELSIUS_ZU_FAHRENHEIT /* The scope of the macro ends here */
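The limited scope can be illustrated with the following sketch. BUFFER_SIZE, BUFFER_SIZE_VISIBLE and the two arrays are hypothetical names chosen for this example.

```c
#define BUFFER_SIZE 16
static char small_buffer[BUFFER_SIZE];   /* uses the first definition */
#undef BUFFER_SIZE                       /* the macro is unknown from here on */

#ifdef BUFFER_SIZE
#define BUFFER_SIZE_VISIBLE 1
#else
#define BUFFER_SIZE_VISIBLE 0
#endif

/* After #undef, the name can be defined again with a different value. */
#define BUFFER_SIZE 64
static char large_buffer[BUFFER_SIZE];
```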

Conversion of a macro parameter into a character string

If a # is placed in front of a parameter in the replacement text of a macro, the argument is converted into a string literal by enclosing it in double quotes (stringizing). The following program outputs string, not hello:

#include <stdio.h>
#define STR(X) #X

int main(void)
{
    char string[] = "hello";
    puts(STR(string));
    return 0;
}
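Because the argument of # is not macro-expanded first, a second macro level is commonly used when the value of another macro should be stringized. In this sketch, XSTR and MAX_USERS are hypothetical names.

```c
#define STR(X)  #X
#define XSTR(X) STR(X)     /* the argument is expanded before STR sees it */

#define MAX_USERS 64       /* hypothetical configuration value */

static const char NAME_TEXT[]  = STR(MAX_USERS);    /* the macro's name */
static const char VALUE_TEXT[] = XSTR(MAX_USERS);   /* the macro's value */
```

NAME_TEXT contains the text MAX_USERS, while VALUE_TEXT contains the text 64.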

Concatenation of macro parameters

The concatenation operator ## allows two macro parameters to be merged into a single token (token pasting). The following example program outputs the number 234:

#include <stdio.h>
#define GLUE(X, Y) X ## Y

int main(void)
{
    printf("%d\n", GLUE(2, 34));
    return 0;
}

Cleverly combined, the operators # and ## allow entire program parts to be generated or rearranged semi-automatically by the preprocessor during compilation. This can, however, lead to code that is difficult to understand.
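One widespread way of combining the two operators is a list macro that is expanded several times with different helper macros. The sketch below is only one possible arrangement; COLOR_LIST, AS_ENUM, AS_STRING, COLOR_COUNT and COLOR_NAMES are hypothetical names.

```c
/* The list is written once; X is replaced by a helper macro at each use. */
#define COLOR_LIST(X) X(RED) X(GREEN) X(BLUE)

#define AS_ENUM(name)   COLOR_ ## name,   /* token pasting: COLOR_RED, ... */
#define AS_STRING(name) #name,            /* stringizing: "RED", ... */

/* Expands to COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT */
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Expands to "RED", "GREEN", "BLUE", */
static const char *const COLOR_NAMES[] = { COLOR_LIST(AS_STRING) };
```

Adding an entry to COLOR_LIST extends both the enumeration and the name table, at the price of code that is harder to read than two hand-written lists.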

Standardized macros

Two predefined macros are __FILE__ (current file name) and __LINE__ (current line within the file):

#include <stdio.h>
#include <stdlib.h>

#define MELDUNG(text) fprintf(stderr, \
    "File [%s], line %d: %s\n", \
    __FILE__, __LINE__, text)

int main(void)
{
    MELDUNG("Fatal error. Program terminated.");
    return EXIT_FAILURE;
}

If the source file is named beispiel.c, the following text is output before the program ends:

File [beispiel.c], line 10: Fatal error. Program terminated.

Dangers of macros

  • When defining macros that contain calculations, enough parentheses must be used so that the desired result is always obtained when the macro is invoked. If, in the temperature conversion example, the parentheses around the parameter t in the replacement text had been omitted, the expansion would have been the mathematically incorrect and unwanted expression (celsius + 5 * 1.8 + 32).
  • When invoking macros, arguments containing the operators ++ and --, function calls or assignments should be avoided, because possible multiple evaluation can lead to unwanted side effects or even undefined behavior.
  • Semicolons in the replacement text, whether as the end of a C statement or as a separator between several C statements in the macro, should be avoided, as they can have side effects on the surrounding source text.
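The first two points can be reproduced with the sketch below. It uses integer arithmetic (9 / 5 instead of 1.8) so that the results are compile-time constants; C_TO_F_UNSAFE, C_TO_F, MAX_OF, next_value and the other names are hypothetical.

```c
#include <assert.h>

#define C_TO_F_UNSAFE(t) (t * 9 / 5 + 32)      /* parameter not parenthesized */
#define C_TO_F(t)        ((t) * 9 / 5 + 32)

enum {
    UNSAFE_RESULT = C_TO_F_UNSAFE(10 + 5),   /* (10 + 5 * 9 / 5 + 32) */
    SAFE_RESULT   = C_TO_F(10 + 5)           /* ((10 + 5) * 9 / 5 + 32) */
};

/* The selected argument appears twice in the replacement text. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

static int call_count = 0;
static inline int next_value(void) { return ++call_count; }

static inline void double_evaluation_demo(void)
{
    call_count = 0;
    int m = MAX_OF(next_value(), 0);   /* next_value() runs twice */
    assert(m == 2);
    assert(call_count == 2);
}
```

UNSAFE_RESULT evaluates to 51 and SAFE_RESULT to 59, and the function argument is evaluated twice although it is written only once.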

Deliberately aborting compilation

The #error directive aborts the compilation process and outputs a message:

#include <limits.h>

#if CHAR_BIT != 8
    #error "This program only supports platforms with 8-bit bytes!"
#endif
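Another typical use is rejecting contradictory build configurations. In the following sketch, USE_BACKEND_A, USE_BACKEND_B and CONFIG_CHECKED are hypothetical macros, and the 32-bit requirement is an assumption made only for illustration.

```c
#include <limits.h>

/* Both macros would usually be set by compiler options. */
#if defined(USE_BACKEND_A) && defined(USE_BACKEND_B)
#error "Define at most one of USE_BACKEND_A and USE_BACKEND_B."
#endif

#if INT_MAX < 2147483647
#error "This sketch assumes that int has at least 32 bits."
#endif

/* Reached only if none of the #error directives above was triggered. */
#define CONFIG_CHECKED 1
```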

Change the file name and line numbers

The #line directive makes it possible to change, from the compiler's point of view, the number of the following line and also the name of the current source file used in messages. This affects all subsequent compiler messages:

#line 42
/* In a compiler message, this line would now have the number 42. */
#line 58 "scan.l"
/* In a message, this would be line 58 of the file "scan.l". */

This mechanism is often used by code generators such as lex or yacc to refer to the corresponding location in the original file in the generated C code. This greatly simplifies troubleshooting.
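The effect can also be observed through __LINE__ and __FILE__, as in this sketch. The file name generated_from.y and the names LINE_AFTER_DIRECTIVE and FILE_AFTER_DIRECTIVE are hypothetical.

```c
/* The line that follows the directive is treated as line 100 of the
   named file, as a code generator might arrange it. */
#line 100 "generated_from.y"
enum { LINE_AFTER_DIRECTIVE = __LINE__ };
static const char FILE_AFTER_DIRECTIVE[] = __FILE__;
```

Compiler messages for the code after the directive would then point to generated_from.y instead of the generated C file.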

Influencing the compiler

The #pragma directive allows the compiler to be influenced. Such commands are mostly compiler-specific, but some are also defined by the C standard (since C99), for example:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
/* From here on, the compiler must assume that the program accesses the
status or mode registers of the floating-point unit. */
