Intermediate code

Intermediate code, also called an intermediate language in the broadest sense, is code generated during a translation process at a level of abstraction between the higher-level source language and the target language, which is usually close to the machine. It is primarily a conceptual intermediate step established in compiler construction, and it does not always take the form of a separately produced artifact.

History

In the late 1960s, Martin Richards developed an intermediate code called O-code (O for object code) for his programming language BCPL, a forerunner of C and C++. O-code made the compiler itself machine-independent, so it could easily be ported to different processors. The O-code could then be interpreted or translated into machine-specific code.

The UCSD Pascal environments of the late 1970s used p-code. However, the attempt to make computer programs completely portable on the basis of an interpreted bytecode largely failed because the computer systems of the time were slow: the slowdown caused by the additional level of indirection was neither affordable nor acceptable.

Advantages

It can be advantageous not to generate code directly for the processor of the runtime system, but first to generate intermediate code for an idealized (or virtual) processor, which is often only simulated in software. Reasons include:

portability or platform independence (see also Java VM),
simplification of the translation process (see also p-code),
general optimizations (efficiency-increasing code transformations) can already be carried out on the intermediate code,
the target processor is not convenient enough to program directly, for example because floating-point instructions are wanted but the processor has no FPU; a further compilation step then inserts code that emulates these instructions with the existing integer instructions.
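
The idea of a virtual processor simulated in software, and of optimizing the intermediate code itself, can be sketched in a few lines of Python. The instruction set (PUSH, LOAD, STORE, ADD, MUL) and the function names run and fold_constants are hypothetical and chosen only for illustration; they do not correspond to O-code, p-code, or any real virtual machine.

```python
# A minimal sketch of a stack-based virtual processor.
# The opcodes below are hypothetical, not those of any real system.

def run(program, env=None):
    """Interpret straight-line intermediate code; return the variable store."""
    stack = []
    variables = dict(env or {})
    for instr in program:
        op = instr[0]
        if op == "PUSH":            # push a constant
            stack.append(instr[1])
        elif op == "LOAD":          # push the value of a variable
            stack.append(variables[instr[1]])
        elif op == "STORE":         # pop the top of the stack into a variable
            variables[instr[1]] = stack.pop()
        elif op in ("ADD", "MUL"):  # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return variables


def fold_constants(program):
    """Machine-independent optimization: evaluate constant operations early.

    Only valid for straight-line code without jumps, as assumed here.
    """
    out = []
    for instr in program:
        if (instr[0] in ("ADD", "MUL") and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            right = out.pop()[1]
            left = out.pop()[1]
            value = left + right if instr[0] == "ADD" else left * right
            out.append(("PUSH", value))
        else:
            out.append(instr)
    return out


# Intermediate code for:  x := (2 + 3) * y
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD",),
    ("LOAD", "y"), ("MUL",),
    ("STORE", "x"),
]
result = run(program, {"y": 4})
optimized = fold_constants(program)
```

Because the optimization works on the intermediate code, it is independent of the processor that eventually executes the program; the same interpreter runs both the original and the folded instruction list.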

Static single assignment

A special class of intermediate code is the static single assignment representation (static single assignment form, abbreviated SSA). It is characterized by the fact that each variable is assigned a value exactly once in the intermediate code. This makes data dependencies between instructions explicit, which benefits many optimizations. In general, SSA form requires phi functions, which select among the versions of a variable that reach a point where control-flow paths merge. The source programs of many programming languages can be transformed into SSA form with little effort. Many modern compilers, including those of the GNU Compiler Collection, therefore use SSA-based intermediate code. Example:

Original code:

 y := 1
 y := 2
 x := y

Intermediate code:

 y₁ := 1
 y₂ := 2
 x₁ := y₂
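
For straight-line code such as the example above, the renaming can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular compiler: the function name to_ssa, the (target, expression) input format, and the underscore naming of versions (y_1 instead of y₁) are all hypothetical, and branches, which would require phi functions, are not handled.

```python
import re


def to_ssa(statements):
    """Rename variables in straight-line code so each name is assigned once.

    `statements` is a list of (target, expression) pairs. Expressions are
    plain strings; every identifier in them is replaced by its latest version.
    """
    version = {}   # variable name -> number of its most recent definition
    result = []
    for target, expr in statements:
        # A use refers to the most recent definition of the variable.
        # Names never assigned before (inputs) are left unchanged.
        def current(match):
            name = match.group(0)
            return f"{name}_{version[name]}" if name in version else name

        new_expr = re.sub(r"[A-Za-z_]\w*", current, expr)
        # Every assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}_{version[target]}", new_expr))
    return result


ssa = to_ssa([("y", "1"), ("y", "2"), ("x", "y")])
for target, expr in ssa:
    print(f"{target} := {expr}")
```

The right-hand side is rewritten before the target receives its new version, so a statement such as y := y + 1 would read the old version and define the new one.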

Languages

Although not designed as an intermediate code, C became a popular intermediate language, because it abstracts from assembler and is widely available as the de facto system language of Unix-like and other operating systems. Eiffel, Sather, Esterel, some Lisp dialects (Lush, Gambit), Haskell (Glasgow Haskell Compiler), Squeak's Smalltalk subset Slang, Cython, Seed7, Vala, and others use C as an intermediate code. Some variants of C were developed to make it a better portable assembly language: C-- and C Intermediate Language.
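
How a translator can use C as its intermediate code can be sketched with a toy example. The sketch below is an assumption-laden illustration and does not reflect how any of the compilers named above work: the functions expr_to_c and compile_to_c are hypothetical, the source language is limited to integer expressions written in Python syntax, and every value is mapped to the C type long, which ignores overflow.

```python
import ast

# Operators of the toy source language and their C spelling.
# Division is left out because its semantics differ between languages.
_C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_c(node):
    """Translate an expression tree into C source text."""
    if isinstance(node, ast.BinOp) and type(node.op) in _C_OPS:
        left = expr_to_c(node.left)
        right = expr_to_c(node.right)
        return f"({left} {_C_OPS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    raise ValueError("unsupported expression")


def compile_to_c(name, params, source):
    """Emit a C function that computes the given expression."""
    tree = ast.parse(source, mode="eval")
    args = ", ".join(f"long {p}" for p in params)
    return f"long {name}({args}) {{ return {expr_to_c(tree.body)}; }}"


c_source = compile_to_c("f", ["a", "b"], "a * b + 2")
print(c_source)
```

The emitted text would then be handed to an existing C compiler, which takes over optimization and code generation for the target machine.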

Microsoft's Common Intermediate Language is the intermediate code emitted by all .NET compilers; it is then compiled into machine code either statically or dynamically.

The GNU Compiler Collection (GCC) uses several intermediate codes internally to support portability and cross-compilation. These languages include:

the Register Transfer Language (RTL), the oldest of the three,
the language-independent tree format GENERIC, and
the SSA-based GIMPLE.

Most intermediate languages were developed for statically typed languages. In contrast, Parrot was developed to support dynamically typed languages such as Perl and Python.
