Array programming

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 65.35.11.59 (talk) at 21:52, 21 November 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Array programming languages (also known as vector or multidimensional languages) generalize operations on scalars to apply transparently to vectors, matrices, and higher-dimensional arrays.

APL, designed by Ken Iverson, was the first programming language to provide array programming capabilities.

The fundamental idea behind array programming is that operations apply at once to an entire set of values. This makes it a high-level programming model, as it allows the programmer to think about and operate on whole aggregates of data without having to resort to explicit loops of individual scalar operations.

Array programming primitives concisely express broad ideas about data manipulation. The level of conciseness can be dramatic in certain cases: it is not uncommon to find array programming language one-liners whose equivalent in Java would take more than a couple of pages of code.

Array programming is very well suited to implicit parallelization, a topic of much current research.
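The link to parallelization can be illustrated with a small sketch in C. It assumes nothing about how any real array language schedules its work: the function names add_range and add_in_chunks are hypothetical, and the chunks are processed sequentially in reverse order rather than on separate processors. The point is only that each element of an elementwise operation is computed independently, so the work can be split up and ordered in any way without changing the result.

```c
#include <assert.h>
#include <stddef.h>

/* Add b to a for the elements in [begin, end). Each element is
   updated independently of every other element. */
void add_range(double *a, const double *b, size_t begin, size_t end) {
    for (size_t i = begin; i < end; i++) {
        a[i] += b[i];
    }
}

/* Hypothetical scheduler: split the n elements into nchunks pieces
   and process them in reverse order. A parallel runtime could hand
   each piece to a different worker instead; because the pieces do
   not overlap, the result would be the same. */
void add_in_chunks(double *a, const double *b, size_t n, size_t nchunks) {
    assert(nchunks > 0);
    for (size_t c = nchunks; c-- > 0; ) {
        size_t begin = c * n / nchunks;
        size_t end = (c + 1) * n / nchunks;
        add_range(a, b, begin, end);
    }
}
```

Calling add_in_chunks(a, b, n, 3) leaves a holding the same values as a single pass of add_range(a, b, 0, n) would.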

Function rank is an important concept in array programming languages, by analogy to tensor rank in mathematics: functions that operate on data may be classified by the number of dimensions they act on. Ordinary multiplication, for example, is a scalar-rank (rank-0) function because it operates on zero-dimensional data (individual numbers). The cross product is a vector-rank (rank-1) function because it operates on vectors, not scalars. Matrix multiplication is a rank-2 function because it operates on two-dimensional objects (matrices). Collapse operators reduce the dimensionality of an input array by one or more dimensions. For example, summing over the elements along one dimension collapses the input array by one dimension.
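The classification above can be made concrete with a minimal sketch in C, the scalar language used for the loop example in the Overview. The function names (multiply, cross, matmul, column_sums) and the row-major flat storage for matrices are assumptions made for illustration; they are not taken from any particular array language.

```c
#include <assert.h>
#include <stddef.h>

/* Rank 0: ordinary multiplication acts on individual numbers. */
double multiply(double x, double y) {
    return x * y;
}

/* Rank 1: the cross product acts on whole 3-element vectors. */
void cross(const double u[3], const double v[3], double out[3]) {
    assert(out != u && out != v);  /* out must not alias an input */
    out[0] = u[1] * v[2] - u[2] * v[1];
    out[1] = u[2] * v[0] - u[0] * v[2];
    out[2] = u[0] * v[1] - u[1] * v[0];
}

/* Rank 2: matrix multiplication acts on whole n-by-n matrices,
   stored here in row-major order as flat arrays of n*n doubles. */
void matmul(const double *a, const double *b, double *out, size_t n) {
    assert(out != a && out != b);  /* out must not alias an input */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            out[i * n + j] = sum;
        }
    }
}

/* Collapse: summing a rows-by-cols matrix (rank 2) along its row
   dimension leaves one total per column, a vector (rank 1). */
void column_sums(const double *m, size_t rows, size_t cols, double *out) {
    for (size_t j = 0; j < cols; j++) {
        out[j] = 0.0;
        for (size_t i = 0; i < rows; i++) {
            out[j] += m[i * cols + j];
        }
    }
}
```

In a scalar language each rank needs its own hand-written routine with its own loops; an array language supplies such operations as primitives and applies them to whole arrays.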

Overview

In scalar languages such as FORTRAN 77, C, Pascal, and Ada, operations apply only to single values, so a+b expresses the addition of two numbers. In such languages, adding two arrays requires indexing and looping:

FORTRAN 77

*     Add B to A element by element; A and B are N-by-N arrays.
      DO 10 I = 1, N
        DO 10 J = 1, N
   10     A(I,J) = A(I,J) + B(I,J)


C

     /* Add b to a element by element; a and b are n-by-n arrays. */
     for (i = 0; i < n; i++) {
         for (j = 0; j < n; j++) {
             a[i][j] = a[i][j] + b[i][j];
         }
     }

This need to loop and index to perform operations on arrays is both tedious and error-prone.

In array languages, operations are generalized to apply to both scalars and arrays. Thus, a+b expresses the sum of two scalars if a and b are scalars, or the sum of two arrays if they are arrays. When applied to arrays, the operations act on corresponding elements as illustrated in the loops above. Indeed, when the array language compiler/interpreter encounters a statement like:

A := A + B

and A and B are two-dimensional arrays, it generates code that is effectively the same as the C loops shown above. An array language, therefore, simplifies programming.
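One way to picture this generalization is a sketch, in C, of how a runtime might carry out A := A + B for operands of any rank. Everything here is hypothetical: the Array structure, the MAX_RANK limit, and the names array_size and array_add are invented for illustration and do not describe the internals of APL, J, or any other language. The sketch assumes both operands have the same shape and store their elements in one flat buffer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANK 4  /* arbitrary limit chosen for this sketch */

/* Hypothetical runtime value: rank 0 is a scalar, rank 1 a vector,
   rank 2 a matrix, and so on. Elements live in one flat buffer. */
typedef struct {
    size_t rank;
    size_t shape[MAX_RANK];
    double *data;
} Array;

/* Number of elements: the product of the extents (1 for a scalar). */
size_t array_size(const Array *x) {
    size_t n = 1;
    for (size_t d = 0; d < x->rank; d++) {
        n *= x->shape[d];
    }
    return n;
}

/* A := A + B for operands of identical shape, whatever their rank. */
void array_add(Array *a, const Array *b) {
    assert(a->rank == b->rank);
    for (size_t d = 0; d < a->rank; d++) {
        assert(a->shape[d] == b->shape[d]);
    }
    size_t n = array_size(a);
    for (size_t i = 0; i < n; i++) {
        a->data[i] += b->data[i];
    }
}
```

The same array_add call handles two scalars (rank 0, one element) and two 2-by-2 matrices (rank 2, four elements); the looping and indexing are written once inside the runtime and not by the programmer at every use.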

Example languages

The canonical examples of array programming languages are APL, its successor J, and Fortran 95. Others include K, IDL, Mathematica, MATLAB, PDL, and ZPL.

Category:Array programming languages provides an exhaustive list.
