NumPy

from Wikipedia, the free encyclopedia

Basic data

Maintainer: The NumPy team
Developer: Travis Oliphant
Initial release: 1995 (as Numeric); 2006 (as NumPy)
Current version: 1.19.1 (July 21, 2020)
Operating system: cross-platform
Programming languages: Python, C
Category: Numerical library for scientific computing
License: BSD (new)
Website: www.numpy.org

NumPy is a software library for the Python programming language that makes it easy to work with vectors, matrices, and large multi-dimensional arrays in general. In addition to these data structures, NumPy provides efficiently implemented functions for numerical computation.

NumPy's predecessor, Numeric, was developed under the direction of Jim Hugunin. In 2005, Travis Oliphant incorporated modified functionality from the competing package Numarray into Numeric and published the result as NumPy. The library is open source and continues to be developed by many contributors.

Features

CPython, the standard Python interpreter, executes code as unoptimized bytecode. Mathematical algorithms are therefore often slower in this Python implementation than in an equivalent compiled implementation. NumPy offers a high-performance alternative, although existing iterative algorithms may have to be rewritten in terms of multi-dimensional array operations. NumPy's operators and functions are optimized for such arrays and therefore allow particularly efficient evaluation.
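As a minimal sketch of such a rewrite, the following compares a pure-Python loop with an equivalent whole-array formulation. The function names sum_of_squares_loop and sum_of_squares_vectorized and the sample data are illustrative assumptions, not part of NumPy.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: one interpreted iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Whole-array form: the loop runs inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = [0.5 * i for i in range(1000)]
loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorized(data)

# Both formulations compute the same quantity.
print(np.isclose(loop_result, vector_result))
```

How much faster the array form runs depends on the array size and the machine, so no timings are claimed here.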

Working with NumPy arrays in Python is comparable to working with MATLAB: both allow algorithms to run quickly as long as they operate on entire arrays or matrices rather than on individual scalars. MATLAB can be extended considerably with additional products such as Simulink. NumPy, being integrated into Python, can be used and combined with many other packages from the extensive Python ecosystem. The Python library SciPy provides further MATLAB-like functions, and the plotting functionality of Matplotlib is likewise very similar to that of MATLAB. Internally, both MATLAB and NumPy rely on the libraries BLAS and LAPACK for efficient linear algebra.

The Python interface of the widely used computer vision package OpenCV uses NumPy arrays internally to process data. For example, images with several color channels are represented as three-dimensional arrays. Indexing, slicing, and masking with other arrays are therefore very efficient ways to access specific pixels. Using NumPy arrays as a universal data structure for images, extracted feature points, convolution matrices, and much more simplifies the development and debugging of image processing algorithms.
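The pixel access described above can be sketched with NumPy alone, without OpenCV. The 4x6 image, the threshold of 100, and the variable names are assumptions chosen for illustration; the channel order follows the BGR convention that OpenCV uses.

```python
import numpy as np

# Hypothetical 4x6 image with three 8-bit channels: (height, width, channel).
image = np.zeros((4, 6, 3), dtype=np.uint8)

# Slicing: set the red channel (index 2 in BGR order) of the left half.
image[:, :3, 2] = 200

# Indexing: read one pixel as a vector of its three channel values.
# copy() detaches the result from later changes to the image.
pixel = image[1, 2].copy()

# Masking: select every pixel whose red value exceeds 100 and paint it white.
bright = image[:, :, 2] > 100   # boolean array of shape (4, 6)
image[bright] = 255

print(pixel)          # [  0   0 200]
print(bright.sum())   # 12 pixels were selected
```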

The ndarray data structure

The core functionality of NumPy is built on the data structure "ndarray" (n-dimensional array), a contiguous memory area of fixed size. Unlike Python's built-in list data structure (which, despite its name, is implemented as a dynamic array), ndarrays are homogeneously typed: all elements of an array must have the same data type.
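A short sketch of what homogeneous typing means in practice: when an array is built from a list that mixes integers and floats, NumPy converts every element to one common type. The variable names are illustrative.

```python
import numpy as np

mixed = [1, 2.5, 3]        # a Python list may mix int and float objects
arr = np.array(mixed)      # NumPy converts everything to one common type

print(arr.dtype)           # float64: the integers were promoted to float
print(arr.tolist())        # [1.0, 2.5, 3.0]
print(arr.itemsize)        # 8 bytes per element
print(arr.nbytes)          # 24 bytes for the whole array
```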

Such arrays can also read from buffers allocated by other languages without having to copy data. Through C/C++, Cython, or Fortran extensions of the CPython interpreter, other existing numerical libraries can therefore be used easily. SciPy, for example, relies on this behavior to provide wrappers for external libraries such as BLAS or LAPACK. NumPy also has native support for memory-mapping ndarrays.
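Both mechanisms can be sketched from pure Python. In this example a plain bytearray stands in for memory owned by a C or Fortran extension, and the file name demo.dat is a hypothetical placeholder; np.frombuffer and np.memmap are the NumPy calls being illustrated.

```python
import os
import tempfile

import numpy as np

# A buffer allocated outside NumPy (the bytearray is a stand-in for
# memory owned by an extension written in another language).
raw = bytearray(8)
view = np.frombuffer(raw, dtype=np.uint8)   # wraps the buffer, no copy
raw[0] = 42                                 # change the buffer ...
print(view[0])                              # ... and the array sees it: 42

# Memory mapping: the array is backed by a file on disk.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # hypothetical file name
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(4,))
mm[:] = [1.0, 2.0, 3.0, 4.0]
mm.flush()                                  # write pending changes to the file

reloaded = np.memmap(path, dtype=np.float64, mode="r", shape=(4,))
print(reloaded.tolist())                    # [1.0, 2.0, 3.0, 4.0]
```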

Limitations

Array entries cannot actually be inserted or appended in place as with Python lists. The function np.pad(), which extends arrays, creates a new array of the desired size, copies the existing values into it, and returns it. Likewise, when two arrays are joined with np.concatenate([a1, a2]), they are not really chained together; instead a new contiguous array is returned. NumPy's function np.reshape() can only convert an array to a new shape if the number of array entries does not change. These limitations arise because NumPy arrays must be laid out in memory as one contiguous area. The Blaze framework offers an alternative that is intended to remove this limitation.
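These behaviors can be checked directly. The following sketch uses small example arrays; the names joined, padded, grid, and reshape_failed are illustrative.

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([4, 5, 6])

# Joining returns a new array; the inputs are copied, not chained.
joined = np.concatenate([a1, a2])
print(np.shares_memory(joined, a1))   # False

# Padding also returns a new, larger array: one zero in front, two behind.
padded = np.pad(a1, (1, 2), mode="constant")
print(padded)                         # [0 1 2 3 0 0]

# Reshaping works only if the number of entries stays the same.
grid = joined.reshape(2, 3)           # fine: 6 == 2 * 3
try:
    joined.reshape(4, 2)              # 6 != 4 * 2
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)                 # True
```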

Merely using NumPy arrays instead of Python lists does not by itself bring a speed advantage. The algorithms involved must first be rewritten in vectorized form. This can be a drawback, because it may require temporary arrays the size of the input, which raises the memory complexity from constant to linear. Some packages implement runtime compilation to avoid these problems; open-source solutions that can interoperate with NumPy include numexpr and Numba. Cython is a statically compiled alternative.
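The cost of temporaries can be illustrated with NumPy alone. In the sketch below, the expression 2.0 * x + y first materializes 2.0 * x as a full-size intermediate array; the second variant reuses one preallocated buffer through the out= parameter of NumPy's universal functions. This is not how numexpr or Numba work internally; it only shows where the extra memory comes from. The array size and variable names are assumptions.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)
y = np.linspace(1.0, 2.0, 100_000)

# Vectorized expression: 2.0 * x is stored in a full-size temporary
# array before y is added to it.
result = 2.0 * x + y

# Same computation with a single preallocated buffer and no further
# intermediates.
out = np.empty_like(x)
np.multiply(x, 2.0, out=out)   # out = 2.0 * x
np.add(out, y, out=out)        # out = out + y, written in place

print(np.array_equal(result, out))   # True
```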

Examples

Array creation
>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> x
array([1, 2, 3])
>>> y = np.arange(10)  # like Python's y = [i for i in range(10)], but produces an array instead of a list
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Basic operations
>>> a = np.array([1, 2, 3, 6])
>>> b = np.linspace(0, 2, 4)  # creates four equally spaced values in the interval [0, 2]
>>> c = a - b
>>> c
array([1.        , 1.33333333, 1.66666667, 4.        ])
>>> a**2  # Python operators can be applied directly to arrays
array([ 1,  4,  9, 36])

Universal functions
>>> a = np.linspace(-np.pi, np.pi, 100)
>>> b = np.sin(a)
>>> c = np.cos(a)

Linear algebra
>>> from numpy.random import rand
>>> from numpy.linalg import solve, inv
>>> a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
>>> a.transpose()
array([[1. , 3. , 5. ],
       [2. , 4. , 9. ],
       [3. , 6.7, 5. ]])
>>> inv(a)
array([[-2.27683616,  0.96045198,  0.07909605],
       [ 1.04519774, -0.56497175,  0.1299435 ],
       [ 0.39548023,  0.05649718, -0.11299435]])
>>> b = np.array([3, 2, 1])
>>> solve(a, b)  # solves the equation ax = b
array([-4.83050847,  2.13559322,  1.18644068])
>>> c = rand(3, 3)  # creates a random 3x3 matrix with values in the interval [0, 1)
>>> c
array([[0.3242542 , 0.94330798, 0.27474668],
       [0.45979412, 0.39204496, 0.58138993],
       [0.66361452, 0.90350118, 0.65898373]])
>>> np.dot(a, c)  # matrix product of a and c
array([[ 3.23468601,  4.43790144,  3.41447772],
       [ 7.25815638, 10.45156168,  7.56499072],
       [ 9.07749071, 12.76245045,  9.9011614 ]])
>>> a @ c  # available from Python 3.5 and NumPy 1.10
array([[ 3.23468601,  4.43790144,  3.41447772],
       [ 7.25815638, 10.45156168,  7.56499072],
       [ 9.07749071, 12.76245045,  9.9011614 ]])

Integration with OpenCV
>>> import numpy as np
>>> import cv2
>>> r = np.reshape(np.arange(256*256) % 256, (256, 256)).astype(np.uint8)  # 256x256 pixel array with a horizontal gradient from 0 to 255 for the red channel, stored as 8-bit values
>>> g = np.zeros_like(r)  # array of the same size and data type as r, filled with zeros, for the green channel
>>> b = r.T  # the transposed r array serves as the vertical gradient of the blue channel
>>> cv2.imwrite('gradients.png', np.dstack([b, g, r]))  # OpenCV expects images in BGR order; the array stacked along the third dimension is saved as the 8-bit PNG file 'gradients.png'
True

Nearest neighbor search: iterative Python algorithm and vectorized NumPy alternative
>>> # Iterative Python variant
>>> points = [[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]]
>>> qPoint = [4,5,3]
>>> minIdx = -1
>>> minDist = -1
>>> for idx, point in enumerate(points):  # iterate over all points
...     dist = sum([(dp-dq)**2 for dp,dq in zip(point,qPoint)])**0.5  # compute the Euclidean distance of each point to q
...     if dist < minDist or minDist < 0:  # if necessary, update the minimum distance and the index of the corresponding point
...         minDist = dist
...         minIdx = idx
...
>>> print('Nearest point to q:', points[minIdx])
Nearest point to q: [3, 4, 4]
>>>
>>> # Equivalent NumPy vectorization
>>> import numpy as np
>>> points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
>>> qPoint = np.array([4,5,3])
>>> minIdx = np.argmin(np.linalg.norm(points-qPoint, axis=1))  # compute all Euclidean distances at once and determine the index of the smallest value
>>> print('Nearest point to q:', points[minIdx])
Nearest point to q: [3 4 4]

History

The Python programming language was not originally optimized for numerical computation, but it quickly became well known in scientific computing. In 1995 the special interest group matrix-sig was founded with the aim of defining a uniform package for array handling. One of its members was Python's author Guido van Rossum, who added syntax extensions, in particular for indexing, directly to Python in order to simplify the use of arrays. The first implementation of a matrix package was developed by Jim Fulton and later generalized by Jim Hugunin; it became known as Numeric (sometimes Numerical Python Extensions), the forerunner of NumPy. Hugunin was a student at MIT at the time, but left the project in 1997 to work on JPython. Paul Dubois of LLNL took over the lead of the project. Other contributors from the very beginning were David Ascher, Konrad Hinsen, and Travis Oliphant.

The Numarray package was developed as an alternative to Numeric and was meant to offer more flexibility. Both packages are now obsolete and no longer developed. Numarray and Numeric each had their strengths and weaknesses, and for a while they were used in parallel for different areas of application. The last version of Numeric (v24.2) was released on November 11, 2005, the last version of Numarray (v1.5.2) on August 24, 2006.

In early 2005, Travis Oliphant began porting the functionality of Numarray to Numeric in order to unite the developer community around a single project; the result was published in 2006 as NumPy 1.0. The new project was part of SciPy but was also offered separately as NumPy, so that users did not have to install the large SciPy package just to work with array objects. Since version 1.5.0, NumPy can also be used with Python 3.

In 2011, development of a NumPy API for PyPy began. However, the full range of NumPy functionality is not yet supported.

On November 15, 2017, the NumPy team announced that from January 1, 2019, new features would be provided only for Python 3, with Python 2 receiving bug fixes only. From January 1, 2020, Python 2 would receive no further updates.
