Field (data type)

from Wikipedia, the free encyclopedia

A field ( English field, array [ əˈɹeɪ ] (emphasis on 2nd syllable) for 'arrangement', 'arrangement' etc.) is a data structure variant in computer science , with the use of which “a lot of similarly structured data […] can be processed should". The access to certain contents of a field takes place with the help of indices which designate its position.

Terms

Synonyms

In linguistic usage and mainly shaped by the use in various programming languages ​​of their terminology and history and through translation from English, the term `` field '' described here - with the meaning `` array '' - is used with different expressions: array, table, vector, Row, sequence, data field, lineup, area, gate array, matrix , etc. "The English and more common term for field [in this meaning] is array".

The individual elements of an array are also referred to with different expressions: element, component, subfield, field element, indexed variable - sometimes also “field” or “ data field ”.

Depending on the context, the reference to 'field' alternatively relates to the declaration level of the array or stored content .

Different meaning

The term 'field' is also generally understood as an elementary, data-describing construct that is used in the source code of a computer program to define storage space. In this sense , a field is not a data type, but has a data type , and it does not matter whether it is an array or a higher-level data structure, e.g. B. Verbund or Record, listened to or not. 'Field' in this sense is essentially equated with variable or data field . Literature examples:

  • Expressions like static field, class field, field length etc. refer to 'field' in a general sense.
  • 'Field' is used (e.g. in) as a type-neutral data construct.
  • The same applies to texts such as "... content of the order identifier field" (in the link to siko.de) and "... treats each array element as a separate field" in or expressions such as "fields for data exchange", "... transfers data to a field "," Content of the data field "," Field description "," Field length ", for example in a software manual.
  • Even in documents that basically use 'field' in the sense of 'array', the expression 'field' is also used non-array-related: "Here the array has 10 fields" or "all other fields have an undefined value" in

The two essentially different meanings of 'field' are and have always been the cause of heated discussions.

Indices

A so-called 'index' is used to address an individual element in a field. For multi-dimensional fields there is an index for each dimension.

In high-level programming languages, the index is specified as an integer for ' standard fields ' . ' Associative fields ', on the other hand, allow the use of any, not necessarily numeric, but unique key values ​​as indices.

Language-specific differences

Basically, arrays can be created and processed in most programming languages . In addition to the different terms that have developed in individual languages, arrays are implemented and supported differently by the compilers of the programming languages, sometimes also in different language versions. Examples:

  • Distinction between standard field / associative array / only lists by the compiler of the programming language
  • Number of possible dimensions (intrinsically multidimensional arrays, array-in-array)
  • Maximum array size
  • Addressing the elements (from 0, from 1 or from any index, possibly also starting with a negative index)
  • Possible data formats and lengths in the array
  • Number of subfields: fixed, dynamic / variable per dimension
  • Format of the subfields: uniform for all index values, different
  • Support for operations on datasets in the array: only on elements, any structure, whole dimension, whole array
  • Addressing method:
    • The index is an integer
    • The index contains the relative distance to the beginning of the array
    • The index is a search key
Associated calculations can be programmed manually or are carried out partially or fully automatically by the compiler .

In most assembly languages , the processing of arrays is possible, but syntactically it is usually not specially supported and has to be explicitly "recreated" by the programmer: The programmer implements array elements as well as other variables, and reserves the memory space for n further versions . For processing, it determines the position of relevant elements using suitable algorithms , for example in an index register , and addresses them with suitable instructions that are not specifically designed for array processing.

When declaring , arrays are formulated in a language-specific syntax . Examples:

  • FeldX (100)(Data name plus number of array elements in round brackets): PL / I
  • FeldX [100,...](Data name plus the number of array elements for each dimension in square brackets): C #
  • FeldX [100][][](Data name plus the number of array elements in square brackets for each dimension): C / C ++ , Java
  • FeldX array (100)(Keyword 'array' plus number of elements in round brackets): Modula-2
  • FeldX occurs 100.(Keyword 'OCCURS' plus number of elements): Cobol
  • Dim FeldX (100,...)(Keyword 'Dim' plus variable name plus number of elements per dimension (in round brackets)): VBA , 'Dimension' in Fortran90 / 95

Element data type

In static typing programming languages , field contents are mostly restricted to elements of a single data type. Occasionally, however, a special case of “largely any content” is possible, with object-oriented programming languages often via polymorphism of the general base class . In dynamic typing programming languages, objects or general data structures can usually be saved in almost any composition and order. In dynamic typing programming languages, however, often only associative arrays are offered.

Field variants

Standard field

Standard field vs Associative field vs 'Field' as 'Data Field'

With the help of a field, the data of a usually uniform data type can be stored in the memory of a computer in such a way that the data can be accessed via indices. In contrast to the associative field, the standard field is set to integer indices for addressing. An index starts with a (one-dimensional) field with N elements, by default, depending on the programming language, with 0 ( C ++ : 0,1,2, ..., N-1) or 1 ( Fortran : 1,2,3, ..., N) , but can often be chosen freely (42, 43, 44, ..., N + 41).

Dynamic field

A dynamic array or list of arrays is a variable size, random access list data structure where items can be added or removed. It is provided by standard libraries in many modern programming languages .

A dynamic array is not the same as a dynamically allocated array, which is an array that is sized when the array is allocated, although a dynamic array may use such a fixed-size array as its back end.

A simple dynamic array can be created by allocating a fixed-size array that is usually larger than the number of elements immediately required. The elements of the dynamic array are stored contiguously at the beginning of the underlying array, and the remaining positions towards the end of the underlying array are reserved or not used. Elements can be added to the end of a dynamic array in constant runtime using the reserved space until this space is completely occupied.

If all of the space is consumed and an additional element is to be added, the underlying fixed-size array must be increased. Usually resizing is expensive because a new underlying array must be allocated and each element copied from the original array. Elements can be removed from the end of a dynamic array in constant time because no resizing is required. The number of elements used by the contents of the dynamic array is its logical size , while the size of the underlying array is referred to as the capacity or physical size of the dynamic array. This is the maximum possible size that can be used without moving data.

A dynamic array is required when the maximum logical size is not set or cannot be calculated before the space is reserved for the array.

Associative field

The “associative array” is a special form. It does not necessarily use integer numeric indexes, but keys to address the elements. In principle, these keys can be of any type, for example character strings , but must uniquely identify an element . For example, the product number is the index used to index data about a specific product in a product table, e.g. B .: Produkt = ProdBezeichn (ProdNr). Most often, associative fields are implemented as hash tables .

In the case of associative arrays - as an additional part of the address calculation (see below) - the position of the data must be determined using the data field specified as a key. Such fields are only called “associative” if this part of the addressing is calculated automatically by the programming language.

Dimensions

In most programming languages , a field - and thus the data stored in it - can be not just one-dimensional, but multidimensional. In the case of multi-dimensional fields, an index is used for each dimension to address their elements. The following variants can be distinguished from the concept of dimension. In the examples, a symbolic example code is used, the start index is 1.

One-dimensional fields

As in a list, the field elements are managed as an elementary data field or as a compound with several elementary fields . The information is accessed via 'ArrayName (index)'.

Examples

1. one-dimensional (like a 'list')

Vektor := array(3) of float  // Deklaration einer 1-dimensionalen Liste namens „Vektor“ mit 3 ‚freien Plätzen‘
 Vektor := (0.5, 1.7, -0.2)   // der Punkt (x=0.5 ; y=1.7 ; z=-0.2) im ℝ³
Vektor[2]thus provides the y component with the value 1.7.

2. one-dimensional (with compound data type):

Produkt := array(100) of structure{
    ProdNr := Text[6]                      // Text mit max. 6 Stellen
    Einkaufspreis := FixPunktZahl[4;2]     // Kommazahl mit 4 Stellen vor, 2 Stellen nach dem Komma
    Verkaufspreis := FixPunktZahl[4;2]     // dito
    Lagerbestand := Integer[6]             // Ganzzahl mit 6 Stellen
 } // end structure
Produkt(Index).ProdNr, Produkt(Index).Einkaufspreis provide the stated values ​​for the product to which the index points.

Multi-dimensional / in-yourself-multi-dimensional

With this variant, information such as elements of a two-dimensional surface or a three-dimensional cube can be presented. "Only the last dimension contains the elements," each piece of information is therefore all dimensions , e.g. B. Width, height, depth, equally attributable. Access takes place by specifying all indices ('<AttrName> (i1, i2, ...)'). ISO_IEC 11404 calls these fields “inherently multidimensional” under the 'General Purpose Datatypes'.

Examples

1. two-dimensional (like a matrix or a table):

In a matrix, the horizontal entries ( fields , cells ) are designated as rows , the vertical ones as columns . An individual element is clearly identified ( addressed ) by naming the row and column . Addressing via a tuple (0,0) or A1 for column A, row 1 is common.
Schachfeld := array(8,8) of String
arraydeclared Schachfeldas an array ("field") with 8 times 8 entries, ofdeclares the type of the entry, here String.
 Schachfeld := (("w_Turm" ,"w_Springer","w_Läufer","w_König", … ,"w_Turm" ),
                ("w_Bauer","w_Bauer"   ,"w_Bauer" ,"w_Bauer", … ,"w_Bauer"),
                (""       ,""          ,""        ,""       , … ,""       ),
                (""       ,""          ,""        ,""       , … ,""       ),
                (""       ,""          ,""        ,""       , … ,""       ),
                (""       ,""          ,""        ,""       , … ,""       ),
                ("s_Bauer","s_Bauer"   ,"s_Bauer" ,"s_Bauer", … ,"s_Bauer"),
                ("s_Turm" ,"s_Springer","s_Läufer","s_König", … ,"s_Turm" ))
The above assignment defines the starting order of the pieces, w_= white, s_= black. I.e. at the top sits the player with the white pieces and at the bottom the black player.
In the example, the column index runs from left to right, the row index from top to bottom. In the following, the row index is the first and the column index is the second, i.e. the indexing Schachfeld[Zeilenindex,Spaltenindex]. The column index "runs faster".
In chess , the columns are referred to as lines and addressed with the lower case letters “a” - “h”, the rows as rows , addressed with “1” - “8”. This corresponds to Schachfeld[z,s]the field (“a” - “h”) [9- s] z.
The example instructions and provide the opening move " ".Schachfeld[5,4]*1 := Schachfeld[5,2]*2Schachfeld[5,2]*1 := ""d2–d4
* 1: or [ToLine, ToColumn] * 2: or [FromLine, FromColumn]

2. multi-dimensional (here with 4 dimensions):

For a combustion chamber (x; y; z) (e.g. of an engine),
where x, y and z are given in millimeters from 1 to 50,
at every point and over a period of one second for every millisecond, a temperature specification is stored:
 temperatur := array(50,50,50,1000) of float
→ 4-dimensional array (x, y, z, time)
How hot was it at the point (x = 7; y = 12; z = 48) at time t = 617 ms?
 = temperatur( 7 , 12 , 48 , 617 )

Multi-dimensional / field contains another field

Here, a field as an element - in addition to mostly other data fields - again contains a field etc. with possibly further levels. This variant is also called a branched array . Information stored in the field belongs to exactly one dimension, i.e. H. to exactly one of its index values. Correspondingly, information in the outer field is accessed, for example, with '<Attr_in_Dim1> (i)' and information in the inner field with '<Attr_in_Dim2> (i, j) etc. "Multi-dimensional" refers to the entirety of everything in the field hierarchically (= tree-like = 'branched') stored information, not on each individual element. The number of dimensions results from the “nesting depth” of the innermost array. ISO_IEC 11404 calls these fields "induced multi-dimensional".

Example 'fields for a storage rack'
 Lagerboden := array(10)                  // Es gibt 10 Lagerböden = Feld der 1. Dimension
   LB_Bezei := Text(20)                   //  wie der jeweilige Lagerboden genannt/beschriftet wird
   …                                      //  ggf. weitere Daten je Lagerboden, z. B. max. Gesamtgewicht
   WZ_Box := array(5)                     // Je Boden max. 5 Werkzeugboxen, 2. Dimension
     WZ_Bezei := Text(20)                 //  Bezeichnung der Werkzeuge, die in der Box gelagert bzw. zu lagern sind
     WZ_Anz := numeric(4)                 //  die Anzahl der darin gelagerten Werkzeuge
     …                                    //  ggf. weitere Daten je WZ-Box, u. U. weiteres Array 3. Dim.
'LB_Bezei (I1), "Box-Nr:" & I2, WZ_Bezei [I1,I2], WZ_Anz [I1,I2] ...' provides the tool designation and its number that are stored in a specific box (storage shelf and box number).
'LB_Bezei (I1)' provides the designation of the storage floor.
'WZ_Bezei (I1,I2) = "Rohrzangen", WZ_Anz (I1, I2) = 10' creates a new tool box with 10 pipe wrenches.
'WZ_Anz (I1,I2) -1 updates the number when a tool is removed from the box.

Indices are only to be specified in the instructions for the dimension (s) from which information is addressed / retrieved, e.g. B. LB_Bezei (x) or WZ_Bezei (x, y).

Addressing a field

Despite the mostly three-dimensional content of fields, especially in the case of multi-dimensional ones, the elements stored in a field are also stored in a linear memory. The elements of a one-dimensional vector are stored one after the other in the memory, with a two-dimensional matrix the elements are stored either as row or column vectors one behind the other, with a three-dimensional field a corresponding number of matrices are stored one after the other, etc.

In most programming languages , addressing fields is handled entirely by the compiler . In assembler it has to be programmed explicitly in the source code .

Memory mapping function

two variants of the arrangement of a two-dimensional field in the main memory

A program that wants to access elements of a field must calculate their memory address .

example

Given: A 2-dimensional field with 4 lines (1..4) and 7 columns (1..7). Each element is 4 bytes in size. The element is to be accessed (row = 3, column = 6). The field begins at memory address base .

Since an element of line 3 is accessed, 2 lines must be "skipped":

2 (skip lines) * 7 (elements per line) * 4 (bytes per element) = 56 (= start of the 3rd line in the field)

In line 3, column 6 is to be accessed, so 5 additional elements must be "skipped":

5 (skip columns) * 4 (bytes per element) = 20 (= start of the 6th column in row 3)

The desired element therefore begins at address ( base + 56 + 20) = ( base + 76) and is 4 bytes long.

General

In a -dimensional field , the address of an element is calculated using the formula , for example . This formula is also called the memory mapping function .

The formula shown is only one of at least two alternatives, depending on the order in which the indices are combined into memory blocks, from the first to the last or vice versa. In English, a distinction is made between row-major order ( row- wise arrangement) and column-major order (column-wise arrangement).

It is usually up to the runtime environment of the respective compiler to make these calculations and use them in the respective command, regardless of the variant.

Dope vector

Since the products in the above formula are constant, they can be calculated once. The resulting dope vector d then enables the address of each stored element to be calculated very quickly using the formula .

Program efficiency

The processing of data within a field requires - in contrast to data fields which can be addressed without an index - additional effort to calculate the actual memory address of the data fields used. The programmer can partly influence and optimize the calculation commands required for this, mostly generated by a compiler - provided this is not already done by the compiler. The following examples give details, the application of which can lead to more efficient code , see details and further examples.

  • When using literals as an index , the compiler calculates the memory address at compile time . Some compilers determine whether the index depends on variables whose status can already be determined at compile time.
  • Use of internal data formats for the index variable so that no format conversion is required in the context of the addressing calculation.
  • Reusing already calculated access addresses instead of calculating them again for each command. Depending on the compiler, suitable addressing methods can be selected. In some cases, compilers detect this reuse and automatically generate optimized code.
  • A suitable choice of the order of the dimensions: If a field is kept in the RAM in a computer , access to field elements usually takes place fastest if addresses that follow one another are called up ( locality enables caching ). The programmer is therefore required to determine the order of the indices in the field in such a way that this also occurs in the innermost loop. Since the memory mapping function depends on the compiler, the programmer should find out about these details and then define the index that is run through in the innermost loop in the program so that it corresponds to successive elements in Ram.
  • Outsourcing of 'field' contents with (local / time local) multiple / many accesses with the same index in a separate, directly addressable memory area.
  • Address higher-level composite structure instead of many elementary data fields: For example, when moving array entries, for example when sorting array contents, the address calculation usually only takes place once per cluster when referring to a cluster (cluster name (index)) - but when referring to individual elements of the composite per composite element.

The usefulness or necessity of such efficiency measures, which should always be well documented for reasons of readability of a program, depends on various factors: They are not relevant if the compiler used automatically makes the appropriate optimizations, less relevant, for example, if the program only rarely is executed if it only has a short run time , if the field-related commands only make up a small part of the overall processing.

See also

Web links

Wiktionary: Array  - explanations of meanings, word origins, synonyms, translations
Commons : Array data structure  - collection of images, videos, and audio files
Wikibooks: Algorithms and data structures in C / fields  - learning and teaching materials

Individual evidence

  1. Programming in Fortran , Uni Bayreuth
  2. A C tutorial. C-how-to
  3. Microsoft msdn visual studio
  4. linguee
  5. rrzn University of Hanover (PDF; 411 kB)
  6. Example of "heated discussion" on leo.org
  7. msdn.microsoft.com Microsoft
  8. www2.informatik.uni-halle.de ( Memento of the original from April 29, 2015 in the Internet Archive ) Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. University hall  @1@ 2Template: Webachiv / IABot / www2.informatik.uni-halle.de
  9. homeandlearn.co.uk Java
  10. Programmersbase Tutorial / Java Basics / Arrays ( Memento from February 23, 2015 in the Internet Archive )
  11. Rheinwerk publishing openbook.rheinwerk-verlag.de
  12. MSDN msdn.microsoft.com
  13. Maximizing Code Performance… MathWorks, Technical Articles and Newsletters