Assembler (computer science)

An assembler is a computer program that translates assembly language into machine language. The first assembler was written by Nathaniel Rochester for an IBM 701 between 1948 and 1950. Assemblers are among the basic tools used by programmers.

Description

Machine-level programming, the traditional domain of assembly language, can now be covered almost completely by high-level programming languages. The ability to write highly efficient programs in assembly language is offset by the poor maintainability of such programs. In addition, producing optimal code requires more and more contextual knowledge (for example about cache usage and spatial and temporal locality). One example is the SSE instruction movntq, which compilers, lacking this contextual knowledge, either cannot use at all or can use only very speculatively.

On the other hand, most high-level language compilers use only a small portion of the CPU's instruction set (an observation that led to the development of RISC processors), while the assembly programmer has the full instruction set available and can therefore, in some situations, use more efficient instructions that are not accessible to someone programming purely in a high-level language. Some programming systems for high-level languages allow assembly instructions to be embedded in the source text by means of inline assembler. The use of assembly language can then be limited to those situations in which programming close to the machine is necessary or useful for functional or efficiency reasons.

Different processor architectures have completely different assembly and machine languages, so an assembler suited to the target architecture is required, and assembly programs are not portable, or are portable only with great restrictions.

Macro assemblers allow the creation of parameterizable instructions. A macro instruction is generally translated into more than one machine instruction.
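How a macro instruction turns into several ordinary instructions can be illustrated with a minimal Python sketch that expands a parameterized macro by textual substitution. The macro name PUSH2, its parameter names, and the function expand_macro are hypothetical and chosen only for illustration; real macro assemblers offer far richer facilities.

```python
# Hypothetical macro table: macro name -> (parameter names, body templates).
MACROS = {
    "PUSH2": (("first", "second"), ("push {first}", "push {second}")),
}

def expand_macro(line):
    """Return the list of instructions that one source line expands to."""
    fields = line.strip().split(None, 1)
    if not fields or fields[0].upper() not in MACROS:
        return [line.strip()]  # ordinary instruction: passed through unchanged
    params, body = MACROS[fields[0].upper()]
    args = [arg.strip() for arg in fields[1].split(",")] if len(fields) > 1 else []
    if len(args) != len(params):
        raise ValueError(f"{fields[0]} expects {len(params)} arguments")
    bindings = dict(zip(params, args))
    # Substitute the actual arguments into every line of the macro body.
    return [template.format(**bindings) for template in body]
```

With this table, expand_macro("PUSH2 eax, ebx") yields the two instructions "push eax" and "push ebx", while a line such as "nop" is returned unchanged.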

Differentiation from high-level language compilers

Assemblers are always specific to one or a few processor types. For the IA32 architecture, for example, the assembly language and the machine language are completely different from those for the MIPS architecture. Some high-level language compilers first translate a program into assembly language and then call an assembler to generate machine language. While high-level languages are modeled more closely on human language and are therefore relatively easy to understand, assembly language is closely tied to the machine. In assembly language, the opcodes and the references to data fields (such as add BETRAG,SUMME), written as so-called mnemonics, correspond to the instruction set of the respective CPU; understanding that instruction set is therefore a prerequisite for assembly programming. In a high-level language, by contrast, the programmer has to care little or not at all about the underlying CPU. A compiler also faces entirely different demands: it has to capture the runtime behavior of a program, particularly when it encounters recursive functions, when large amounts of additional source code are generated (e.g. by templates), and, in some cases, when code is already executed during compilation (compile-time function evaluation).

Although simplified and not always applicable, the distinction is often seen in the fact that a compiler converts individual instructions in the source code into several machine instructions, whereas an assembler typically uses a one-to-one mapping.

Tasks of an assembler

Translation of instruction mnemonics into the instruction codes of the machine language, for example the mnemonic "CLI" into the machine code "11111010" (hexadecimal FA)
Conversion of data mnemonics into their binary representation, for example "AMOUNT" into address = 4711 and length = 8
Management of constants
Management of addresses of commands or data
Calculation of constants that are fixed at assembly time, for example: mov eax, 4 * 5 + 6 * 7 + OFFSET ProgrammStart
Ignoring comments during code generation
Inclusion of other source files
Interpretation and expansion of macro code
Conditional assembly
Bundling of related data (e.g. read-only data)
Rejection of instructions not allowed for this processor or mode
Integration of debugging information or other metadata
Creation of translation listings
Generation of machine code, if necessary as object files for two-stage translation with a linker, which makes it possible to integrate further program parts (e.g. subroutines) from libraries
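Several of these tasks (translating mnemonics, ignoring comments, managing label addresses, folding constant expressions, and rejecting unknown instructions) can be sketched as a toy two-pass assembler in Python. This is only an illustration under simplifying assumptions: the source syntax, the db directive handling, and the function names evaluate, parse, size_of and assemble are hypothetical. The opcode values are the real single-byte x86 encodings (CLI = FA, as in the list above), and jmp is encoded as the x86 short jump EB followed by a signed 8-bit displacement.

```python
import ast
import operator

# Real single-byte x86 opcodes; everything else here is a simplified toy.
OPCODES = {"cli": 0xFA, "sti": 0xFB, "nop": 0x90, "hlt": 0xF4, "ret": 0xC3}
JMP_SHORT = 0xEB  # x86 short jump: opcode plus signed 8-bit displacement
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def evaluate(expr, symbols):
    """Fold a constant expression such as '4*5 + 6*7 + start' at assembly time."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return symbols[node.id]  # label -> address
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr.strip(), mode="eval").body)

def parse(source):
    """Drop comments and blank lines; yield (label, mnemonic, operand) tuples."""
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        label = None
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
        fields = line.split(None, 1)
        mnemonic = fields[0].lower() if fields else None
        operand = fields[1] if len(fields) > 1 else None
        yield label, mnemonic, operand

def size_of(mnemonic):
    """Size in bytes of one statement; rejects instructions the toy does not know."""
    if mnemonic is None:
        return 0
    if mnemonic == "jmp":
        return 2
    if mnemonic == "db" or mnemonic in OPCODES:
        return 1
    raise ValueError(f"instruction not allowed: {mnemonic}")

def assemble(source):
    statements = list(parse(source))
    # Pass 1: assign an address to every label.
    symbols, address = {}, 0
    for label, mnemonic, _ in statements:
        if label:
            symbols[label] = address
        address += size_of(mnemonic)
    # Pass 2: emit machine code, now that all label addresses are known.
    out = bytearray()
    for _, mnemonic, operand in statements:
        if mnemonic is None:
            continue  # line held only a label
        if mnemonic == "db":
            out.append(evaluate(operand, symbols) & 0xFF)
        elif mnemonic == "jmp":
            # Displacement is relative to the address of the next instruction.
            displacement = evaluate(operand, symbols) - (len(out) + 2)
            if not -128 <= displacement <= 127:
                raise ValueError("jump target out of range for a short jump")
            out += bytes([JMP_SHORT, displacement & 0xFF])
        else:
            out.append(OPCODES[mnemonic])
    return bytes(out)
```

For a source text consisting of the label start: followed by the lines cli, nop, db 4*5 + 6*7, jmp start and hlt, assemble returns the bytes FA 90 3E EB FB F4: the constant expression is folded to 62 (hexadecimal 3E), and the backward jump receives the displacement -5 (hexadecimal FB). Two passes are used so that a label may be referenced before the line that defines it.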

Special forms

Cross assembler

A cross assembler is a special form of assembler that runs on one computer platform H (host) and generates machine code for a different computer platform T (target). This makes it a special kind of cross compiler. Cross assemblers are used today mainly in the development of embedded systems, to produce fast and compact code for microcontrollers and DSPs. One example is the cross assembler ASEM-51, which runs on the host platforms MS-DOS, Windows and Linux and generates code for Intel's MCS-51 microcontroller family (the target platform).
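The separation of host and target can be sketched in a few lines of Python. The function encode and the table layout are hypothetical and purely illustrative; the opcode values are the single-byte encodings of NOP and RET on x86 and on the MCS-51 family. The script runs on whatever host runs Python, yet emits bytes for the selected target.

```python
# Per-target opcode tables (two one-byte instructions each, for illustration).
TARGETS = {
    "x86":   {"nop": 0x90, "ret": 0xC3},
    "mcs51": {"nop": 0x00, "ret": 0x22},
}

def encode(mnemonics, target):
    """Translate a sequence of mnemonics into machine code for the given target."""
    table = TARGETS[target]  # the host platform plays no role in this lookup
    return bytes(table[mnemonic.lower()] for mnemonic in mnemonics)
```

The same source, ["nop", "ret"], yields the bytes 90 C3 for the x86 target and 00 22 for the MCS-51 target.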

Disassembler

A program that translates machine language back into assembly language is called a disassembler. This reverse translation is possible because, unlike with high-level languages, there is a one-to-one relationship between simple assembly language and machine language. However, identifiers and comments cannot be restored, because they are lost during assembly. Most assembly languages are also extended with macro facilities, so this direct mapping holds only partially.
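The reverse mapping, and the loss of identifiers, can be sketched in Python as follows. The function disassemble is hypothetical and handles only a handful of real single-byte x86 opcodes plus the short jump (EB followed by a signed 8-bit displacement); bytes it does not recognize are printed as data.

```python
# Reverse lookup table for a few real single-byte x86 opcodes.
MNEMONICS = {0xFA: "cli", 0xFB: "sti", 0x90: "nop", 0xF4: "hlt", 0xC3: "ret"}

def disassemble(code):
    """Translate machine code back into a list of assembly lines."""
    lines = []
    position = 0
    while position < len(code):
        byte = code[position]
        if byte == 0xEB and position + 1 < len(code):
            # Interpret the second byte as a signed 8-bit displacement.
            raw = code[position + 1]
            displacement = raw - 256 if raw >= 128 else raw
            target = position + 2 + displacement
            # The original label name is gone; only the address can be shown.
            lines.append(f"jmp 0x{target:04x}")
            position += 2
        elif byte in MNEMONICS:
            lines.append(MNEMONICS[byte])
            position += 1
        else:
            lines.append(f"db 0x{byte:02x}")  # unknown byte: emit as data
            position += 1
    return lines
```

For the byte sequence FA 90 EB FD F4 this returns cli, nop, jmp 0x0001 and hlt. The jump target appears as a bare address, because whatever label the programmer originally used was not stored in the machine code.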

Machine language monitor

On some platforms, there is a very simple version of an assembler, combined with the ability to test and analyze programs interactively, called a machine language monitor.

Manufacturers and Products

The Microsoft Macro Assembler (MASM), the Borland Turbo Assembler (TASM) and the Netwide Assembler (NASM) are widely used for the x86 processor family and compatible processors (e.g. Intel's Pentium or AMD's Athlon). The Flat Assembler (FASM) also offers many of the features expected of a modern assembler. Yasm is a rewrite of NASM under a BSD license. In addition to assemblers that use the Intel syntax, there are assemblers that accept code in AT&T syntax, such as the GNU assembler (GAS), which is used mainly under Linux. As of version 2.10, GAS also supports Intel syntax via the .intel_syntax directive. On IBM mainframes (System z), the High Level Assembler is used. Hercules users must use either the outdated Assembler F or the Tachyon Legacy Assembler, which runs under Linux for zSeries. For Intel's MCS-51 microcontroller family, whose first member was the 8051, there is the free macro assembler ASEM-51. Today there are hundreds of 8051 derivatives from more than 50 semiconductor manufacturers.

Web links

Wikibooks: Assembler programming for x86 processors - learning and teaching materials

References

DIN 44300.
Peter Calingaert: Assemblers, Compilers, and Program Translation. Computer Science Press, Potomac, MD, 1979, ISBN 0-914894-23-4, pp. 186-187.
Ram Narayan: Linux assemblers: A comparison of GAS and NASM. October 17, 2007. Retrieved July 2, 2008.
Randall Hyde: Which Assembler is the Best? Retrieved May 18, 2008.
GNU Assembler News, v2.1 supports Intel syntax. April 4, 2008. Retrieved July 2, 2008.
