Pointer (computer science)

from Wikipedia, the free encyclopedia
The pointer a points to variable b . The variable b contains a number (binary 01101101) and the variable a contains the memory address of b (hexadecimal 1008). In this case the address and the data fit in one 32-bit word .
Video tutorial about pointers, dereferencing, fields and pointer arithmetic and how they work in the main memory using the C programming language as an example

With pointer ( English pointer ) is used in the computer science an object of a programming language referred to which a memory address latches.

The pointer references (points, points to) a location in the main memory of the computer. Allows variables , objects or program instructions to be stored. Obtaining the data stored there is referred to as dereferencing or referencing back ; see #pointer operations .

A pointer is a special case and in some programming languages the only way to implement the concept of a reference . In this sense, pointers are also referred to as reference variables.

Pointers are used, among other things, to manage dynamic memory . Thus, certain data structures, such as linked lists , typically using pointers implemented .

The pointer was introduced by Harold Lawson in 1964/65 and implemented in the PL / I programming language.

Pointers in programming languages

Pointers are mainly used in machine-level programming languages ​​such as assembler or powerful languages ​​such as C or C ++ . Use in strictly typed languages ​​such as Modula-2 or Ada is severely restricted and in languages ​​such as Java or Eiffel it is internally available, but hidden ( opaque ) for the programmer . With the first mentioned languages ​​it is possible to generate pointers to any place in the memory or to calculate with them.

Some programming languages ​​restrict the use of pointers because it is easy for programmers to make serious programming errors when working with pointers. For programs written in C or C ++, they are common causes of buffer overflows or memory access violations (SIGSEGVs) and the resulting crashes or security holes .

In object-oriented languages, the pointers are alternatively (C ++) or exclusively (Java, Python) by reference, which, in contrast to pointers, does not have to be explicitly dereferenced.

Pointers are basically not used in the C # or Visual Basic .NET language . All functionality that pointers provide has been replaced by concepts like delegates . However, it is possible in C #, but not in VB.NET, to declare unsafe code (which must also be specially compiled) in order to be able to use pointers, as in C ++. This can in some cases achieve better performance , or it becomes possible to access the Windows API functions.

Typed pointers

In most high-level programming languages, pointers are directly associated with data types. For example, a “pointer to an object of the type integer” can normally only refer to an object of the type “integer”. The data type of the pointer itself is determined by the type to which it refers. In the C programming language, this is a prerequisite for realizing the pointer arithmetic (see below), because the address of the predecessor or successor element can only be calculated by knowing the memory size of the associated type. Typing pointers also enables the compiler to detect type compatibility violations .

Untyped pointers

These pointers are not associated with any data type. They cannot be dereferenced, incremented, or decremented, but must be converted into a typed pointer type before they can be accessed.

Examples for this are the type void * in C and C ++, in Objective-C the type id or POINTER in Pascal.

In some high-level programming languages, there are no untyped pointers.

Zero pointer

The zero pointer is a special value (a so-called zero value , not necessarily numeric 0). If this value is assigned to a variable declared as a pointer in programming language, this indicates that the pointer variable is referring to "nothing". Zero pointers are very popular in almost all programming languages ​​to mark a “designated space”. For example, a singly linked list is usually implemented in such a way that the next pointer of the last element is given the value of the null pointer to express that there is no further element. In this way, an additional field that would have meant the end of the list can be saved.

In Pascal and Object Pascal , for example, the null pointer is called nil(Latin: “nothing” or the acronym for “not in list”). In C, the preprocessor macro contained in the standard library identifies the null pointer and hides the internal representation. The null pointer is also called in C ++ and is defined as a macro for the numeric zero ( ). The constant was introduced in the new C ++ standard C ++ 11, which enables a type-safe differentiation between 0 and the null pointer. In Python there is no null pointer (since there are no pointers), but there is a special type object that can be used for similar purposes. Donald Knuth represents the zero pointer with the symbol , this convention is also used by the tools WEB and CWEB (ibid). NULLNULL0nullptrNoneNoneType

Dereferencing a null pointer is usually not allowed. Depending on the programming language and operating system, it leads to undefined behavior or a program termination via exception handling or protection violation .

Uninitialized pointers

If a pointer variable is dereferenced that does not point to a valid memory area of ​​the corresponding type, this can also lead to unexpected behavior. A situation can arise if a variable was not initialized to a valid value before it was used or if it still refers to a memory address that is no longer valid ( wild pointer ). If the pointer does not point to a valid memory address, a protection violation can occur, as with the null pointer .

Pointer operations

Dereferencing
access the object pointed to by the pointer. In the case of a function pointer z. B. call the referenced function
Increment / Decrement
move the pointer to the object in memory behind / in front of the current object. Internally, this is done by adding or subtracting the object size. This is only known to the compiler if the type of the referenced object is clearly identified during compilation time .
To destroy
of the referenced object (see constructor / destructor ). After calling the destructor, it is advisable to set all variables that contain pointers to the destroyed object to the zero value, so that you can later recognize that no valid object is being referenced. However, this is generally not possible.
to compare
with other pointers to equality / inequality. Some languages ​​also allow a greater-less comparison between pointers.

Pointer arithmetic

Increasing or decreasing a pointer by a fixed amount, or subtracting two pointers, is called pointer arithmetic.

Since these operations are very error-prone, they are usually not supported in high-level programming languages, although there are again other methods for implementing the same functionality.

Properties of pointers to data

advantages

The use of pointers can in certain cases accelerate the program flow or help to save memory space:

  • If the amount of data to be kept in the memory by a program is unknown at the start of the program, exactly as much memory can be requested as is required ( dynamic memory management ).
  • It is possible to return memory that is no longer required to the operating system while the program is running.
  • When using fields or vectors you can quickly jump and navigate within the field using pointers. Instead of using an index and thus addressing the field elements via it, a pointer is set to the beginning of the field at the beginning of the process and this pointer is incremented with each pass. The actual step size of the increment depends on the data type concerned. This type of access to fields is automatically implemented internally in many programming languages and compilers in some places.
  • References to memory areas can be changed, e.g. B. for sorting lists without having to copy the elements ( dynamic data structures ).
  • With function calls, the transfer of a pointer to an object can avoid transferring the object itself, which in certain cases would require a very time-consuming copy of the object ( reference parameter ).
  • Instead of copying variables each time and thus making space available each time, you can in some cases simply have multiple pointers point to the same variable.
  • In the case of strings, memory contents can be addressed directly without having to go through objects and functions.

Disadvantages and dangers

There are languages ​​that deliberately refrain from using pointers (see above). The main reasons for this are:

  • Using pointers is difficult to learn, complicated, and error-prone. Thinking in terms of pointers in particular is often difficult for novice programmers. Even with experienced programmers, careless mistakes in handling pointers are still relatively common.
  • In some programming languages, no effective data type control is possible, that is, when executing, it is not possible to check which data are at the target address and whether they meet the expectations (specifications) of the program flow
  • Programming mistakes when working with pointers can have serious consequences. So it happens B. to program crashes, unnoticed damage to data (by stray pointers), buffer overflows or "lost" memory areas ( memory leaks ): The program constantly requests more memory that is no longer available to other programs, until in extreme cases the operating system is no longer sufficient can deliver.
  • If data structures are made up of pointers that refer to individual small memory blocks, this can lead to fragmentation of the address space, especially in processes that run for a long time , so that the process can not request any more memory, even though the sum of the allocated memory blocks is significantly less than the available memory is.
  • The efficiency of the processor cache suffers when a data structure refers to many memory blocks that are far apart in the address space. It can therefore make sense to use tables or fields ( array ) instead, because these have a more compact representation in memory.
  • The latter can also have negative effects in connection with paging .
  • Last but not least, a pointer is a typical starting point for malware : the malicious program only needs to change one point to point to its own program code: If there is no proper control of the memory area reserved for the program, it can be located anywhere else. In addition, buffer overflows can easily be generated using misdirected pointers. In particular, program codes located in data variables can be executed in this way. This is a typical method for initial infection.

Smart pointers

As Smart pointers (English. Smart pointers ) are objects referred to encapsulate the simple pointers and equipped with additional functions and features. For example, a smart pointer could release a dynamically allocated memory object as soon as the last reference to it is deleted.

Pointer to a COM - or CORBA - interface are in some programming languages (eg. Object Pascal ) as a smart pointer implemented.

Function pointer (method pointer)

Function pointers are a special class of pointers. They do not point to an area in the data segment, but to the entry point of a function in the code segment of the memory. This makes it possible to implement user-defined function calls whose goal is only determined at runtime. Function pointers frequently used in conjunction with callback functions (callback function) to use and provide a form of late binding is.

Member pointer

In C ++, analogous to method pointers , it is also possible to define pointers to the data members of a class:

#include <cstdint>
#include <vector>

using namespace std;

struct RGB_Pixel {
    uint8_t red = 0, green = 0, blue = 128;
};

// definiert Typalias als Zeiger auf uint8_t Datenmember der Klasse RGB_Pixel
typedef uint8_t RGB_Pixel::*Channel;

// invertiert den ausgewählten RGB-Kanal aller Pixel eines Bildes
void invert(vector<RGB_Pixel>& image, Channel channel) {
    for(RGB_Pixel& pixel: image)
        pixel.*channel = 255 - pixel.*channel;
}

int main() {
    vector<RGB_Pixel> image;
    // Memberzeiger zeigt auf den grünen RGB-Kanal
    Channel green = &RGB_Pixel::green;
    // nur der grüne RGB-Kanal wird invertiert
    invert(image, green);
}

See also

Web links

Commons : Pointers (computer science)  - collection of images, videos and audio files

Individual evidence

  1. Roland Bickel: Automated static code analysis for secure software , all-electronics.de from October 22, 2015, accessed on July 4, 2019
  2. MSDN on Unsafe Code and Pointers in C #
  3. Bjarne Stroustrup : C ++ Style and Technique FAQ
  4. Python 3 documentation: Built-in Constants
  5. ^ Donald Knuth: Fundamental Algorithms (=  The Art of Computer Programming ). 3. Edition. Addison-Wesley, 1997, ISBN 0-201-89683-4 , pp. 234 .