Assembly language

An assembly language, also called assembler for short (from English "to assemble"), is a programming language tailored to the instruction set of a particular type of computer, that is, to its processor architecture.

Assembly languages are therefore classed as machine-oriented programming languages and, as the successor to direct programming with numeric codes, as second-generation programming languages: instead of the binary code of the machine language, instructions and their operands are written as easily understandable mnemonic symbols in text form (e.g. "MOVE"), and operands are sometimes written as symbolic addresses (e.g. "ZIP").

The source text of an assembly program is translated into machine code by translation software, which is itself called an assembler. In high-level programming languages (third generation), by contrast, a compiler translates more abstract instructions (more complex ones, not limited to the processor's instruction set) into the machine code of the given target architecture, or into an intermediate language.

Colloquially, the terms “machine language” and “assembler (language)” are often used synonymously.

Overview

Source text in assembly language is also referred to as assembly code. Programs in assembly language can exploit every capability of the microprocessor, which is seldom necessary these days. They are generally used only where programs or individual parts of them are very time-critical, e.g. in high-performance computing or real-time systems. Assembly language can also be useful when very little memory is available for the program (e.g. in embedded systems).

For speed optimization, the use of assembly code can still be justified even when highly optimizing compilers are available, but the advantages and disadvantages should be weighed for the specific application. On complex architectures such as Intel Itanium and various digital signal processors, a compiler may well generate better code than an average assembly programmer, because the behavior of such architectures, with their complex, multi-level optimizations (e.g. out-of-order execution, pipeline stalls, ...), is highly non-linear. Speed optimization is becoming ever more complex because numerous constraints have to be satisfied. This is a growing problem both for the steadily improving high-level language compilers and for assembly programmers. Optimal code requires more and more contextual knowledge (e.g. cache usage, spatial and temporal locality of memory accesses), some of which the assembly programmer, unlike the compiler, can obtain by runtime profiling of the code in its intended field of application. One example is the SSE instruction MOVNTQ, which compilers can hardly use optimally because they lack this contextual knowledge.

Converting machine code back into assembly language is called disassembly. The process is lossy, however, and highly so when no debug information is available: much information, such as the original identifiers or comments, cannot be restored because it was never included in the machine code or was resolved to computed values during assembly.
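The loss of information can be illustrated with a minimal Python sketch. The toy disassembler below handles only three x86 encodings that fit in a few lines (B0+r ib for "mov r8, imm8", 90 for "nop", C3 for "ret"). The function name "disassemble" and the constant LETTER_A mentioned in the comment are hypothetical and chosen for illustration only.

```python
# Toy disassembler for a tiny subset of x86 machine code (illustrative only).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]  # 8-bit register numbers 0..7


def disassemble(code):
    """Turn machine code back into Intel-style text for the few opcodes handled here."""
    lines = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB0 <= op <= 0xB7:            # mov r8, imm8 is encoded as B0+reg, imm8
            if i + 1 >= len(code):
                raise ValueError("truncated mov r8, imm8")
            lines.append(f"mov {REG8[op - 0xB0]}, 0x{code[i + 1]:02X}")
            i += 2
        elif op == 0x90:                  # one-byte instruction without operands
            lines.append("nop")
            i += 1
        elif op == 0xC3:                  # near return
            lines.append("ret")
            i += 1
        else:
            raise ValueError(f"opcode {op:02X} is not handled by this sketch")
    return lines


# The programmer may have written "mov al, LETTER_A  ; load 'a'" with a named
# constant, but only the bytes B0 61 survive assembly. The name and the comment
# are not part of the machine code, so the disassembler cannot bring them back.
print(disassemble(bytes([0xB0, 0x61, 0xC3])))
```

The output contains only the numeric value 0x61. Recovering a meaningful name for it would require debug information or manual analysis.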

Description

Instructions in machine language consist of the operation code (opcode) and, in most cases, further information that depends on the instruction, such as addresses, embedded literals and length specifications. Since the numerical values of the opcodes are hard to remember, assembly languages use more memorable abbreviations, so-called mnemonic symbols (mnemonics for short).

Example: The following command in the machine language of x86 processors

10110000 01100001 (in hexadecimal notation: 'B0 61')

corresponds to the assembler command

    movb $0x61, %al    # AT&T syntax (everything after "#" is a comment)
                       # the mnemonic means "move byte: source first, then destination"

or

    mov al, 61h        ; Intel syntax; the mnemonic 'mov' infers from the operand
                       ; 'al' that only 1 byte is to be copied.
                       ; "mov destination, source"

and means that the hexadecimal value "61" (decimal 97) is loaded into the low-order part of the register "ax": "ax" designates the entire register, "al" (for "low") its low-order part. The high-order part of the register can be addressed as "ah" (for "high").
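The relationship between "ax", "al" and "ah" can be sketched in Python as a 16-bit value whose two bytes are accessed separately. The class name RegisterAX is hypothetical, and the model is a simplification that ignores everything else a real processor does.

```python
class RegisterAX:
    """Hypothetical model of the 16-bit x86 register ax and its two 8-bit halves."""

    def __init__(self, value=0):
        self.ax = value & 0xFFFF                        # keep only 16 bits

    @property
    def al(self):
        return self.ax & 0x00FF                         # low-order byte

    @al.setter
    def al(self, byte):
        self.ax = (self.ax & 0xFF00) | (byte & 0xFF)    # keep ah, replace al

    @property
    def ah(self):
        return (self.ax >> 8) & 0xFF                    # high-order byte

    @ah.setter
    def ah(self, byte):
        self.ax = (self.ax & 0x00FF) | ((byte & 0xFF) << 8)  # keep al, replace ah


reg = RegisterAX(0x1234)
reg.al = 0x61            # corresponds to: mov al, 61h
print(hex(reg.ax))       # 0x1261: the high-order byte 0x12 is unchanged
```

Writing to "al" therefore changes only the low-order byte of "ax", which is why the instruction needs just one byte for its immediate operand.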

The two notations of this mov instruction show that the assembler dialects express an instruction quite differently, even though both are translated into the same machine code:

In AT&T syntax, the information that a byte is to be copied is carried by the "b" in "movb"; the Intel mov infers it from the fact that the register part "al" is one byte in size.
The source and destination of the copy are given in the opposite order.
The notation for addressing a register and for specifying an immediate numerical value also differs.
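The three differences listed above can be made concrete with a minimal Python sketch that accepts exactly one instruction form in each dialect and produces the same two bytes for both. The function names and the regular expressions are assumptions for illustration; a real assembler parses far more than this single pattern.

```python
import re

# x86 numbers of the 8-bit registers; the opcode of "mov r8, imm8" is B0 + number.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_mov_r8_imm8(reg, imm):
    """Encode 'mov r8, imm8' as the opcode byte B0+reg followed by the immediate."""
    if reg not in REG8:
        raise ValueError(f"{reg} is not an 8-bit register")
    if not 0 <= imm <= 0xFF:
        raise ValueError("immediate does not fit in one byte")
    return bytes([0xB0 + REG8[reg], imm])


def assemble_att(line):
    """AT&T form: movb $0x<hex>, %<reg>  (size suffix, source first, $ and % prefixes)."""
    m = re.fullmatch(r"\s*movb\s+\$0x([0-9a-fA-F]+)\s*,\s*%(\w+)\s*", line)
    if m is None:
        raise ValueError("unsupported AT&T line")
    return encode_mov_r8_imm8(m.group(2), int(m.group(1), 16))


def assemble_intel(line):
    """Intel form: mov <reg>, <hex>h  (no suffix, destination first, h suffix for hex)."""
    m = re.fullmatch(r"\s*mov\s+(\w+)\s*,\s*([0-9][0-9a-fA-F]*)h\s*", line)
    if m is None:
        raise ValueError("unsupported Intel line")
    return encode_mov_r8_imm8(m.group(1), int(m.group(2), 16))


att = assemble_att("movb $0x61, %al")
intel = assemble_intel("mov al, 61h")
print(att.hex(), intel.hex(), att == intel)
```

Both parsers end in the same encoding function, which mirrors the point made in the text: the dialects differ only in notation, not in the machine code they describe.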

An assembler translates assembly code into machine code largely one to one. It does, however, perform address transformations so that symbolic addresses can be used. In addition to the actual instructions, which it translates into machine code, the input to an assembler also contains control directives that determine how it operates, for example to define a base register.

More complex assembly languages (macro assemblers) are often used to make programming easier. Macros are calls in the source code that are automatically replaced by (mostly short) sequences of assembly instructions before the actual assembly takes place. The replacements are simple and can be controlled by parameters. Disassembling code generated in this way, however, yields plain assembly code without the macros, which were expanded during assembly.
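Macro expansion as described here is essentially parameterized text replacement that runs before assembly. The following Python sketch shows the idea. The macro name PRINT, the label "msg" and the "{name}" placeholder notation are hypothetical and are not the syntax of MASM or any other real macro assembler.

```python
def expand_macros(lines, macros):
    """Replace each macro call by its body, substituting parameters textually."""
    expanded = []
    for line in lines:
        parts = line.split(None, 1)             # first word and the rest of the line
        name = parts[0] if parts else ""
        if name not in macros:
            expanded.append(line)               # ordinary instruction: keep as is
            continue
        params, body = macros[name]
        args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
        if len(args) != len(params):
            raise ValueError(f"{name} expects {len(params)} argument(s)")
        for template in body:
            text = template
            for param, arg in zip(params, args):
                text = text.replace("{" + param + "}", arg)
            expanded.append(text)
    return expanded


# Hypothetical macro: print a '$'-terminated string via DOS interrupt 21h, function 09h.
MACROS = {
    "PRINT": (["text"], ["mov dx, OFFSET {text}", "mov ah, 09h", "int 21h"]),
}

source = ["PRINT msg", "mov ax, 4C00h", "int 21h"]
for line in expand_macros(source, MACROS):
    print(line)
```

After expansion the macro name no longer appears anywhere, which is why a disassembler can only ever show the three plain instructions.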

Sample program

A very simple program, the Hello World example often used for demonstration purposes, can consist of the following assembly code in the MASM assembly language for MS-DOS:

ASSUME  CS:CODE, DS:DATA        ;- tell the assembler which segment registers map to which segments

DATA    SEGMENT                 ;start of the data segment
Meldung db  "Hallo Welt"        ;- the string "Hallo Welt"
        db  13, 10              ;- new line
        db  "$"                 ;- character that INT 21h, subfunction 09h uses as the end-of-string marker
DATA    ENDS                    ;end of the data segment

CODE    SEGMENT                 ;start of the code segment
Anfang:                         ;- entry label for the start of the program
        mov ax, DATA            ;- load the address of the data segment into register AX
        mov ds, ax              ;  transfer it to segment register DS (DS cannot be loaded directly with a constant)
        mov dx, OFFSET Meldung  ;- load the address of the text, relative to the data segment, into data register DX
                                ;  the complete address of "Meldung" is now in the register pair DS:DX
        mov ah, 09h             ;- select subfunction 9 of operating system interrupt 21h
        int 21h                 ;- call operating system interrupt 21h (this prints the text on the screen)
        mov ax, 4C00h           ;- select subfunction 4Ch (terminate program) of operating system interrupt 21h
        int 21h                 ;- execute it, which returns control to the operating system
CODE    ENDS                    ;end of the code segment

END     Anfang                  ;- tell the assembler and the linker the program's entry label
                                ;- the instruction pointer is set to this value when the program is started

In Pascal (a high-level language), by contrast, the program code for "Hallo Welt" can be significantly shorter:

program Hallo(output);
begin
  writeln('Hallo Welt')
end.

Different assembly languages

Every computer architecture has its own machine language and thus assembly language. Sometimes there are also several assembly language dialects (“different assembly languages” and associated assemblers) for the same processor architecture. The languages of different architectures differ in the number and type of operations.

However, all architectures have the following basic operations:

reading and writing data between main memory and the processor (generally from or to a register); almost always also from register to register, and in most cases also from main memory to main memory,
simple logical operations (e.g. bit operations such as AND / OR / NOT / SHIFT),
simple control of the program flow (especially jumps that depend on processor flags),
simple arithmetic operations (e.g. integer addition, integer comparison).

Certain computer architectures often also have more complex instructions (CISC), such as:

calls to input and output devices,
applying a simple operation (e.g. addition) to a vector of values,
memory block operations (e.g. copying or filling with zeros),
higher arithmetic: instructions that could be reproduced by several simple ones (e.g. "decrease the value in register A by 1; if it is now 0, jump to program position xyz": DJZ A,xyz, 'decrement A, jump if zero to xyz'),
floating-point arithmetic such as floating-point addition, multiplication, sine, cosine and root calculation (implemented either by special coprocessors or by software routines),
massive, direct parallel programmability of the processor, e.g. in digital signal processors,
synchronization with other processors in SMP systems,
interrupt control, which is needed especially for process computers.
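The DJZ example from the list above can be sketched with a small Python interpreter for a made-up machine. It runs the same loop once with the combined instruction and once with the simple instructions that reproduce it. The instruction names other than DJZ, the tuple format and the function name "run" are assumptions for illustration.

```python
def run(program, registers):
    """Interpret a toy program; each instruction is a tuple (mnemonic, operands...)."""
    regs = dict(registers)                  # work on a copy of the initial registers
    pc = 0                                  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "ADD":                     # ADD reg, constant
            regs[args[0]] += args[1]
        elif op == "DEC":                   # DEC reg
            regs[args[0]] -= 1
        elif op == "JZ":                    # JZ reg, target: jump if reg is 0
            if regs[args[0]] == 0:
                pc = args[1]
        elif op == "JMP":                   # JMP target: unconditional jump
            pc = args[0]
        elif op == "DJZ":                   # DJZ reg, target: decrement, then jump if 0
            regs[args[0]] -= 1
            if regs[args[0]] == 0:
                pc = args[1]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op}")
    return regs


# Add 3 to B, A times: once with the combined instruction, once with simple ones.
with_djz = [("ADD", "B", 3), ("DJZ", "A", 3), ("JMP", 0), ("HALT",)]
with_simple = [("ADD", "B", 3), ("DEC", "A"), ("JZ", "A", 4), ("JMP", 0), ("HALT",)]

start = {"A": 4, "B": 0}
print(run(with_djz, start))
print(run(with_simple, start))
```

Both programs end with the same register contents. The variant without DJZ needs one instruction more per loop iteration, which is the trade-off such combined instructions address.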

History

The first assembler was written by Nathaniel Rochester for an IBM 701 between 1948 and 1950.

In the 1980s and early 1990s, the language in which operating systems for larger computers were written changed from assembler to high-level languages, mostly C, but also C++ or Objective-C. The main trigger was the increasing complexity of operating systems as available memory grew beyond one megabyte. What is still written in assembler is, for example, the code that saves registers on a process switch (see scheduler) or, on the x86 architecture, the part of the boot loader that must fit within the 512-byte master boot record. Parts of device drivers are also written in assembly language where efficient hardware access is not possible from high-level languages. Some high-level language compilers allow assembly code to be embedded directly in the actual source code, so-called inline assembler.

Until around 1990, most computer games were programmed in assembly language, as this was the only way to achieve acceptable game speed and a program size that fit into the small memory of the home computers and game consoles of the time. Even today, computer games are among the programs most likely to contain smaller assembly-language parts, in order to make use of processor extensions such as SSE.

In the past, many applications for devices controlled by microcontrollers required programming in assembler in order to make optimal use of the scarce resources of these microcontrollers. To translate assembly code for such microcontrollers into machine code, cross assemblers are used during development. Today, microcontrollers are so cheap and powerful that modern C compilers have largely replaced assemblers in this area too. Not least because larger program memory adds little to the price of the chips, the advantages of high-level languages increasingly outweigh the sometimes minor advantages of assembly language.

Comparison to programming in a high-level language

Disadvantages

Assembly programs are written very close to the hardware, as they directly reflect the different specifications and instruction sets of the individual computer architectures (processor architectures). An assembly language program therefore generally cannot be transferred to another computer system (with a different processor architecture) without adapting the source code. Depending on how much the assembly languages differ, the conversion takes a great deal of effort, and it may be necessary to rewrite the program text completely. A program in a high-level language, by contrast, often only needs a compiler for the new target platform.

Source texts in assembly language are almost always significantly longer than in a high-level language, since the instructions are less complex and certain functions or operations therefore require several assembly instructions; for example, when data are compared logically (=, >, <, ...), unequal data formats or lengths must first be adjusted. The resulting larger number of instructions increases the risk of producing confusing, poorly structured and poorly maintainable program code.

Advantages

Assembler is still used to micro-optimize computations for which the high-level language compiler does not generate sufficiently efficient code. In such cases the computations can be programmed more efficiently directly in assembler. In scientific computing, for example, the fastest variants of mathematical libraries such as BLAS, and of architecture-dependent functions such as the C standard function memcpy, are still those written in assembler. Certain very low-level operations that bypass the operating system (e.g. writing directly to screen memory) cannot be carried out in all high-level languages.

A further benefit of assembler lies in understanding how a system works, which constructs in high-level languages otherwise hide. Even today, assembler is taught at many universities to convey an understanding of computer architecture and how it works.

Literature

Gerhard Niemeyer: Introduction to programming in ASSEMBLER. Systems IBM, Siemens, Univac, Comparex, IBM-PC/370. 6th revised and expanded edition. de Gruyter, Berlin et al. 1989, ISBN 3-11-012174-3 (De Gruyter textbook).
Joachim Rohde: Assembler packed. (Quick and effective look-up of all relevant instruction sets for AMD and Intel. MMX and 3DNow! SSE and its extensions). 2nd updated edition. Mitp-Verlag, Heidelberg 2007, ISBN 978-3-8266-1756-0 (The packed reference).
Joachim Rohde, Marcus Roming: Assembler. Programming basics. (Theory and practice under DOS and Windows. MMX and 3DNow! Optimizing programs and reverse engineering). 2nd updated and expanded edition. Mitp-Verlag, Bonn 2006, ISBN 3-8266-1469-0.
Jeff Duntemann: Assembly Language Step-by-Step. Programming with DOS and Linux. 2nd edition. Wiley, New York NY et al. 2000, ISBN 0-471-37523-3 (with 1 CD-ROM).
Paul Carter: PC Assembly Language, 2001.
Robert Britton: MIPS Assembly Language Programming. Prentice Hall, Upper Saddle River NJ 2003, ISBN 0-13-142044-5.
Steve McConnell: Code Complete. A practical handbook of software construction. Microsoft Press, Redmond WA 1993, ISBN 1-55615-484-4.

Web links

Randall Hyde: The Art of Assembly Language (HTML and PDF; note that it deals with the author's own, more abstract form of the language)
Assembler X86 command lists / OpCode and descriptions
x86-64 Assembly Language Programming with Ubuntu by Ed Jorgensen (available as PDF)
i8086.de 8086/88 assembler command reference
Pentium instruction set and clock table
Sandpile.org
Compiler Explorer - interactive translation of various source languages in the web browser

