Debug symbol

from Wikipedia, the free encyclopedia

In computer science, debug symbols are information that can be generated to help debug executable files. They are derived directly from the source code, in particular from identifiers such as variable names and the names of procedures and functions.

Problem

When the source code of a program is compiled into machine code or bytecode, the identifiers are lost, and sometimes even the original program structure. Identifiers are no longer required in the compiled program and would therefore occupy memory unnecessarily. Many compilers change the program structure during optimization (for example, loop unrolling to avoid conditional jumps and make better use of the instruction pipeline of modern processors) or even dissolve it and replace it with other constructs (for example, vectorizing repeated, similar operations on an array to use SIMD capabilities). If the compiler replaces a loop in the program code with machine instructions that combine several iterations (that is, operations originally executed one after the other) into a single machine instruction, tracing an error through the program flow becomes difficult or even impossible (see Black Box).
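The effect of loop unrolling on program structure can be illustrated with a small sketch. The example below is not compiler output; it is a hand-written Python analogy, and the function names `sum_rolled` and `sum_unrolled_by_4` are hypothetical. It shows how a transformed loop computes the same result while no longer matching the source line by line.

```python
def sum_rolled(values):
    """Loop as written in the source: one addition and one loop test per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Hand-unrolled equivalent: four additions per loop test."""
    total = 0
    n = len(values)
    i = 0
    # Main body: process four elements per iteration.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop for the last 0 to 3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))  # prints "45 45"
```

A debugger stepping through the second version has no single statement that corresponds to "one iteration" of the original loop, which is the kind of mismatch described above.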

The options for debugging executable files and dynamic-link libraries (DLLs) at the machine code level are then essentially limited to displaying the corresponding assembler instructions and the current processor state (machine registers, program counter, and data areas of memory in tabular form).

In this form it is usually difficult to follow the flow of a program when an error occurs, and doing so requires specialized knowledge of the computer architecture and its assembly language.

Solution

For this reason, the developer can instruct the compiler to include additional information about the program alongside the machine code. This information makes the program easier to debug and is referred to as debug symbols or symbol information; the term symbol is used here in the sense of identifier. When generating it, the compiler usually refrains from extensive optimizations. The program's execution can then be traced at the level of the source language using a symbolic debugger.

Such debug information includes, among other things, the symbol table, which records the functions and global variables that are defined or referenced in the program (a mapping between symbolic names and machine addresses). In addition, the debugger can evaluate expressions in the source language, for example by establishing a correspondence between the source code and the corresponding architecture-dependent assembler code.
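The mapping between machine addresses and symbolic names can be sketched as a sorted table with a nearest-preceding-entry lookup. This is a minimal illustration, not the layout of any real debug format: the addresses, function names, and line numbers in `SYMBOLS` and `LINES` are invented, and the helper `symbolize` is hypothetical. Real formats also record sizes, types, and scopes.

```python
import bisect

# Hypothetical symbol table: (start address, symbol name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1040, "parse_args"),
    (0x10C0, "compute_total"),
]

# Hypothetical line table: (address, source line), sorted by address.
LINES = [
    (0x1000, 10),
    (0x1010, 11),
    (0x1040, 20),
    (0x1058, 22),
    (0x10C0, 30),
]


def _lookup(table, address):
    """Return the entry with the largest start address <= address, or None."""
    keys = [start for start, _ in table]
    index = bisect.bisect_right(keys, address) - 1
    return table[index] if index >= 0 else None


def symbolize(address):
    """Translate a raw address into 'function+offset (line N)'."""
    sym = _lookup(SYMBOLS, address)
    line = _lookup(LINES, address)
    if sym is None:
        # Without symbol information, the raw number is all that can be shown.
        return hex(address)
    start, name = sym
    text = f"{name}+{hex(address - start)}"
    if line is not None:
        text += f" (line {line[1]})"
    return text


print(symbolize(0x105C))  # prints "parse_args+0x1c (line 22)"
```

With such tables, a debugger can report a program counter value as a function name and source line. Without them, it can only report the bare address, as the fallback branch shows.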

Disadvantages

Because the information is usually embedded when the program is compiled into machine code, the resulting executable files are considerably larger. For the final release of a program, the debug symbols are therefore usually removed again or stored in a separate file.

In addition, symbol tables make the source text recovered by decompilation much easier to understand. This is a disadvantage especially for companies whose software source code is meant to remain a trade secret (see obfuscation).

Because the compiler generally omits most optimizations when generating debug symbols, execution speed is sometimes significantly reduced.

Commercial handling

Some companies provide debug symbols for their files as separate downloads for debugging purposes. Microsoft's debugger WinDbg, for example, can automatically download debug symbols for Windows DLLs when the source code is not available.