GNU Compiler Collection

From Wikipedia, the free encyclopedia

Basic data

Developer: GNU Project
Initial release: May 23, 1987
Current version: 10.2 (July 23, 2020)
Operating system: Linux, GNU Hurd, Microsoft Windows
Programming language: C++
Category: Compiler
License: GNU General Public License, version 3; GNU Lesser General Public License, version 2.1
Available in German: Yes
Website: gcc.gnu.org

GCC is the name of the compiler suite of the GNU Project. GCC originally stood for GNU C Compiler. Since GCC can now translate several programming languages besides C, the name has come to mean GNU Compiler Collection. The command gcc (in lower case) still refers to the C compiler.

Overview

The collection contains compilers for the programming languages C, C++, Objective-C, D, Fortran, Ada and Go. The compiler collection is distributed under the terms of the GNU General Public License.

GCC is used as the standard compiler by a number of systems, including many Linux distributions, BSD variants, NeXTSTEP, BeOS and ZETA. It also supports the Cygwin runtime environment and the MinGW development tools. It has been ported to more systems and computer architectures than any other compiler and is particularly suitable for operating systems that must run on different hardware platforms. GCC can also be installed as a cross compiler.

In 2014, GCC received the Programming Languages Software Award from ACM SIGPLAN.

History

The first public version (0.9) of GCC was released by Richard Stallman for the GNU Project on March 22, 1987 (version 1.0 followed on May 23 of the same year); GCC is now developed by programmers all over the world. The extension of the C compiler package into a compiler collection took place within the EGCS project, which for a while existed in parallel to GCC and eventually became the official GCC.

EGCS

In 1997, the Experimental/Enhanced GNU Compiler System (EGCS) project split off from GCC; it was reunited with GCC in 1999.

GCC 1.x had achieved a certain stability by 1991, but architecture-related limitations prevented many improvements, so the Free Software Foundation (FSF) began to develop GCC 2.x. In the mid-1990s, however, the FSF tightly controlled what could and could not be added to GCC 2.x, so GCC served as an example of the “cathedral” development model that Eric S. Raymond describes in his book The Cathedral and the Bazaar.

Because GCC is free software, programmers who wanted to work in a different direction were able to develop their own spin-offs. However, many spin-offs turned out to be inefficient and confusing. Many developers were frustrated that the official GCC project accepted their work only with difficulty, or not at all.

In 1997, a group of developers therefore formed EGCS to combine several experimental spin-offs into a single project. These included g77 (Fortran), PGCC (Pentium-optimized GCC), many improvements to C++, and compiler versions for additional processor architectures and operating systems.

The development of EGCS turned out to be faster, livelier and overall better than that of the GCC project, so in 1999 the FSF officially stopped developing GCC 2.x and adopted EGCS as the official GCC instead. The EGCS developers became the maintainers of GCC. From then on, the project was explicitly developed according to the “bazaar” model rather than the “cathedral” model. With the release of GCC 2.95 in July 1999, the two projects were reunited.

Target systems

Screenshot: GCC 4.1.3 in a terminal window under Ubuntu 7.10 with GNOME 2.20

The GCC project officially designates some platforms as primary and others as secondary evaluation platforms, and these two groups are tested with particular care before each new release. GCC can generate programs for the following processors (primary and secondary evaluation platforms are marked):

In addition, GCC supports a number of processors used in embedded systems, such as

Derivatives that are not part of the official GCC, but are derived from it and sold commercially, exist for:

  • Atmel AVR32
  • Infineon C167
  • Infineon TriCore
  • Microchip PIC24, dsPIC (C only) and PIC32 (also C++)

In total, GCC supports more than 60 platforms.

Structure

Design flow of GCC

The external interface of gcc corresponds to that of a standard Unix compiler.

  1. The user calls a main program named gcc.
  2. GCC interprets the command-line arguments.
  3. GCC determines the programming language of each input file.
  4. The corresponding language compiler is called.
  5. Its output is passed to the assembler.
  6. Finally, the linker is called.
  7. The result is a complete, i.e. executable, program.

Each language compiler is a separate program that takes source code and produces assembly language. The diagram on the right gives examples for C and assembler, both of which first undergo preprocessing, in which macros, included header files and the like are expanded to obtain pure C code or assembler. The language-dependent front end parses the corresponding language and creates an abstract syntax tree, which is passed to a back end. The back end converts the tree into GCC's Register Transfer Language (RTL) (not shown in the diagram), performs various code optimizations and finally generates assembly language.

Originally, most of GCC was written in C. As part of the “GCC in Cxx” project, the conversion of the GCC sources to C++ was planned and started in 2010. The aim of this change is to keep GCC understandable and maintainable. In the follow-up project, stage 1 of the GCC build process, which had not yet been converted, was converted to C++ code. Exceptions are the back ends, which are largely written in RTL, and the Ada front end, which is mostly written in Ada.

Front ends

Front ends must produce trees that the back end can process. How they achieve this is up to them. Some parsers use Yacc-like grammars, others are handwritten recursive-descent parsers.

Before GCC 4.0, the program's tree representation was not entirely independent of the target processor. The meaning of a tree could differ between language front ends, and front ends could provide their own tree codes.

The Tree SSA project, which was merged into GCC 4.0, introduced two new forms of language-independent trees, named GENERIC and GIMPLE. Parsing now works by converting a temporary language-dependent tree into GENERIC. The so-called "gimplifier" lowers this complex form into the SSA-based GIMPLE form, on which a number of new language- and architecture-independent optimizations can be performed.

Middle end

Optimizations on trees do not really fit into the scheme of "front end" and "back end", because they are language-independent and involve no parsing. The GCC developers therefore call this part of the compiler the "middle end". The optimizations currently performed on the SSA tree include dead code elimination, partial redundancy elimination, global value numbering, sparse conditional constant propagation and scalar replacement of aggregates. Array-based optimizations such as automatic vectorization, as offered by the Intel compiler, are currently being developed.

Back end

The behavior of the GCC back end is partly determined by preprocessor macros and architecture-specific functions that define, for example, the endianness, word size and calling conventions, and that describe the register structure of the target machine. Using the machine description, which is written in a Lisp-like description language, GCC converts the internal tree structure into the RTL representation. Although RTL is nominally processor-independent, the sequence of abstract instructions is therefore already adapted to the target.

The type and number of optimizations that GCC performs on RTL are extended with each compiler version. They include (global) common subexpression elimination, various loop and jump optimizations (for example if-conversion, branch probability estimation, sibling calls and constant propagation) and instruction combining, in which several instructions can be merged into a single one.

Since the introduction of global SSA-based optimizations on GIMPLE trees, the RTL optimizations have lost some of their importance, because the RTL representation of the program contains far less of the high-level information that many optimizations need. Machine-dependent optimizations nevertheless remain very important, since many optimizations require information about the machine, for example which instructions it supports, how expensive they are and what the pipeline of the target architecture looks like.

In the "reload" phase, the essentially unlimited number of abstract pseudo-registers is replaced by the limited number of real machine registers. New instructions may have to be inserted into the code, for example to store pseudo-registers in the function's stack frame. This register allocation is quite complicated, because the various characteristics of the respective target architecture must be taken into account.

In the last phase, optimizations such as peephole optimization and delay slot scheduling are performed. Finally, the machine-specific form of the RTL is mapped to assembler code by converting register names and addresses into the character strings that specify the instructions.
