Numpy

Numpy#

The core tool for performant numerical computing with Python

Numpy arrays#

• multi-demensional arrays
• closed to hardware - faster
• designed for scientific computation
``````import numpy as np
ar = np.array([1,2,3,4])
ar
array([0, 1, 2, 3])
``````

Testing the speed difference#

We will use ipthyons `%timeit`

Normal python array

``````    In [12]: L = range(1000)

In [13]: %timeit [i**2 for i in L]
414 µs ± 8.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
``````

Numpy array

In numpy mathematical operations are automatically operated on each element of array

``````    In [14]: L = np.arange(1000)

In [16]: %timeit L**2
1.57 µs ± 71.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
``````

Getting help#

Numpy Docs

``````np.array?

In [3]: np.lookfor('create array')
Search results for 'create array'
---------------------------------
numpy.array
Create an array.
numpy.memmap
Create a memory-map to an array stored in a *binary* file on disk.
``````

Import convention#

When importing numpy use

``````import numpy as np
``````

Creating Arrays#

1D#

Creating

``````    >>> a = np.array([0,1,2,3])
>>> a
array([0, 1, 2, 3])
``````

Checking number of dimensions

``````    >>> a.ndim
1
``````

Checking number of deimensions

``````    >>> a.shape
(4,)
>>> len(a)
4
``````

2D#

Create it with an array/list oflists

``````    >>> b = np.array([[1,2,3,4], [5,6,7,8]])
>>> b
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
``````

Checking number of dimensions

``````    >>> b.ndim
2
``````

Return a tuple of the shapeof an array

``````    >>> b.shape
(2, 4)
``````

Check number of objects in first dinmesion

``````    >>> len(b)
2
``````

Evenly spaced#

Use `np.arange(x)`

``````arange([start,] stop[, step,], dtype=None)

>>> a = np.arange(100000)
>>> a
array([    0,     1,     2, ..., 99997, 99998, 99999])
``````

Number of points within a range#

``````linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

a = np.linspace(0, 1, 100)
array([ 0.        ,  0.01010101,  0.02020202,  0.03030303,  0.04040404, ...
``````

Common arrays#

np.ones

``````ones(shape, dtype=None, order='C')
Return a new array of given shape and type, filled with ones.
``````

np.zeros

``````zeros(shape, dtype=float, order='C')
Return a new array of given shape and type, filled with zeros.
``````

np.eye

``````eye(N, M=None, k=0, dtype=<class 'float'>)
Return a 2-D array with ones on the diagonal and zeros elsewhere.
``````

np.diag

``````diag(v, k=0)
Extract a diagonal or construct a diagonal array.

>>> d = np.diag(np.array([1, 2, 3, 4]))
>>> d
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
``````

np.random

``````    >>> a = np.random.rand(4)
>>> a
array([ 0.14365585,  0.96317038,  0.57808752,  0.30486506])
``````

Gaussian random numbers

Numbers on a “standard normal” distribution of mean 0 and variance 1

``````    >>> a = np.random.randn(4)
>>> a
array([-0.72186413,  1.89644724, -1.63709681, -0.76200216])
``````

Basic data types#

Numbers sometimes displayed with a trailing `.`: `2.`

``````>>> a = np.array([1.,2.,3.,])
>>> a
array([ 1.,  2.,  3.])
>>> a.dtype
dtype('float64')
``````

No `.` is `int64` with a dot is `float64`

You can explicitly specify the datatype with:

``````c = np.array([1, 2, 3], dtype=float)
``````

There is also:

complex128:

``````d = np.array([1+2j, 3+4j, 5+6*1j])
``````

String:

``````>>> a = np.array(['hello','is','it','me','you','are','looking','for'])
>>> a.dtype
dtype('<U7')
``````

Indexing and Slicing#

You access items the same as python lists

``````>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[9]
(0, 2, 9)
``````

Reversing a numpy array

``````>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
``````

For multi-dimensional arrays indexes are tuples of intergers The Row is specified first and column second

``````>>> a[2,1]
# third row, second column
``````

Arrays can be sliced (Just like python)

``````>>> a[12:20:2] # [start:end:step]
array([22, 24, 26, 28])
``````

No slice components are requered, default is `0:last:1`

``````>>> a[:]
``````

**Remember that the end/last element is not included

``````>>> a = np.array([1,2,3,4])
>>> a[1:3]
array([2, 3])
>>> a[1:4]
array([2, 3, 4])
``````

Now the differences is you can assign slices to numpy arrays but not python lists

``````>>> a[2:] = 10
>>> a
array([ 1,  2, 10, 10])
``````

Slicing a 2-d array:

a[, ]

Example:

``````>>> a = np.diag(np.arange(1,7, dtype='float'))
>>> a
array([[ 1.,  0.,  0.,  0.,  0.,  0.],
[ 0.,  2.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  3.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  4.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  5.,  0.],
[ 0.,  0.,  0.,  0.,  0.,  6.]])
``````

And you want the entire rows of 3 to 5 diagonally:

• y: starting at 2 ending at 4 inclusive (5) = `2:5`
• x: starting at 0 and ending at index 4 inclusive (5) = `0:5`

It is wierd as you start with the `y-axis` in the notation

``````    >>> a[2:5,:5]
array([[ 0.,  0.,  3.,  0.,  0.],
[ 0.,  0.,  0.,  4.,  0.],
[ 0.,  0.,  0.,  0.,  5.]])
``````

Copies and Views#

The slicing operation creates a view on the original array which is just a way of accessing array data. The original array is not copied.

You can use `np.may_share_memory(x, y)` to check if 2 arrays share memory

If memory is shared changing the copied or original affect the other.

To force a copy use:

``````>>> c = a[:2].copy()
``````

Fancy Indexing#

``````>>> a = np.random.randint(0, 21, 15)
>>> a
array([10,  3,  8,  0, 19, 10, 11,  9, 10,  6,  0, 20, 12,  7, 14])
>>> (a % 3 == 0)
array([False,  True, False,  True, False, False, False,  True, False,
True,  True, False,  True, False, False], dtype=bool)
>>> mask = (a % 3 == 0)
>>> extract_from_a
array([ 3,  0,  9,  6,  0, 12])
``````

Assigning new values to sub array that meets a criterion:

``````a[a % 3 == 0] = -1
``````

Using integer array mask (repeating some values):

``````>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
>>> a[[1,2,2,3,3,4,4]]
array([10, 20, 20, 30, 30, 40, 40])
``````

Can be used to assign as well:

``````>>> a[[7,9]] = 100
``````

A new array created by an array of arrays will share the same shape

``````>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> idx = np.array([[3,4],[9,7]])
>>> idx
array([[3, 4],
[9, 7]])
>>> a[idx]
array([[3, 4],
[9, 7]])
``````

**Remember for an element `a[5,6]` the x-axis is `6` and the y-axis is `5`

Indexing#

An iteratable can be used `tuple` works the same as a `list`:

``````full[[0,1,2,3,4], [1,2,3,4,5]]
full[(0,1,2,3,4), (1,2,3,4,5)]
``````

You can use standard list manipulation notation:

``````full[(3:, [0,2,5]]
``````

Numerical operations on arrays#

With Scalars#

scalars are single element or number fields

You can simply apply the arithmetic to the whole array

``````>>> a = np.array([1, 2, 3, 4])
>>> a + 1
array([2, 3, 4, 5])
``````

Lets try `a` to the power of `2`

``````>>> a**2
array([ 1,  4,  9, 16])
``````

All arithmetic operates element-wise

``````>>> b = np.ones(4) + 1
>>> b
array([ 2.,  2.,  2.,  2.])
>>> a - b
array([-1.,  0.,  1.,  2.])
``````

Another example:

``````>>> j = np.arange(5)
>>> j
array([0, 1, 2, 3, 4])
>>> 2**(j + 1) - j
array([ 2,  3,  6, 13, 28])
``````

The operations are much faster than if you did them in pure python

Matrix Multiplcation#

In [6]: c = np.ones((3, 3))

In [7]: c Out[7]: array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])

Using `*` is not matrix multiplication

In [8]: c * c Out[8]: array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])

Using the `dot` function is matrix multiplciation

`dot` is the product of 2 arrays

In [9]: c.dot(c) Out[9]: array([[ 3., 3., 3.], [ 3., 3., 3.], [ 3., 3., 3.]])

Comparison is also element-wise#

``````    In [20]: a = np.array([1, 2, 3, 4])

In [21]: b = np.array([4, 2, 2, 4])

In [22]: a == b
Out[22]: array([False,  True, False,  True], dtype=bool)

In [23]: a > b
Out[23]: array([False, False,  True, False], dtype=bool)
``````

If you want to compare the entire array use `np.array_equal()`:

``````    a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 3, 4])
np.array_equal(a, b)
``````

Logic operations#

Use `np.logical_or()` and `np.logical_and()`

``````>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([0, 1, 1, 0], dtype=bool)
>>> np.logical_or(a, b)
array([ True,  True,  True, False], dtype=bool)
>>> np.logical_and(a, b)
array([False,  True, False, False], dtype=bool)
``````

Transcendental operations#

Use `np.sin`, `np.cos`, `np.tan`, `np.log` and `np.exp`

``````    >>> a = np.array([-1, 0, 1, 2])
>>> a
array([-1,  0,  1,  2])
>>> np.sin(a)
array([-0.84147098,  0.        ,  0.84147098,  0.90929743])
>>> np.log(a)
__main__:1: RuntimeWarning: divide by zero encountered in log
__main__:1: RuntimeWarning: invalid value encountered in log
array([        nan,        -inf,  0.        ,  0.69314718])
>>> np.exp(a)
array([ 0.36787944,  1.        ,  2.71828183,  7.3890561 ])
``````

Mimatch#

``````>>> a = np.arange(4)
>>> a + np.array([1, 2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,) (2,)
``````

When the shapes of the arrays do not match they cannot be broadcast

Transpose#

Invert and reflect. Opposite on both axis.

Create a triangle with `np.triu` (Use `help(np.triu)`)

``````>>> np.triu(np.ones((3,3)),1)
array([[ 0.,  1.,  1.],
[ 0.,  0.,  1.],
[ 0.,  0.,  0.]])
``````

then transpose with:

``````>>> a.T
array([[ 0.,  0.,  0.],
[ 1.,  0.,  0.],
[ 1.,  1.,  0.]])
``````

Remember a transposition is a view, so when arrays become larger they will fail in unpredicatable ways

Extras#

• `np.allclose` - Returns True if two arrays are element-wise equal within a tolerance.
• `np.tril`- Lower triangle of an array.

Basic Reductions#

Finding the sum of an array#

``````>>> a
array([ 0,  5, 10, 15, 20, 25])
>>> a.sum()
75
>>> np.sum(a)
75
``````

On the `axis`:

``````>>> a = np.array([[1,1], [2,2]])
>>> a
array([[1, 1],
[2, 2]])
``````

Find sum of the `column` along the `y-axis` - `first dimension`

``````>>> a.sum(axis=0)
array([3, 3])
``````

Find the sum of `column` along the `x-axis` - `second dimension`:

``````>>> a.sum(axis=1)
array([2, 4])
``````

Same idea at higher dimensions:

``````>>> x = np.random.rand(2, 2, 2)
>>> x
array([[[ 0.73091254,  0.3126328 ],
[ 0.52196148,  0.51212003]],

[[ 0.07157999,  0.15920737],
[ 0.75733851,  0.99707551]]])
>>> x.sum(axis=2)[0, 1]
1.0340815143357149
``````

`min`, `max` and index of min and max

``````x = np.array([1, 3, 2])
>>> x.min()
1
>>> x.max()
3
``````

Get the index of the min or max:

``````>>> x.argmin()
0
>>> x.argmax()
1
``````

Logic operations#

``````>>> np.all([True, True, False])
False
>>> np.any([True, True, False])
True
``````

Can be used with an argument:

``````>>> a = np.array([1, 2, 3, 2])
>>> b = np.array([2, 2, 3, 2])
>>> np.all(a < 4)
True
>>> np.any(a > 4)
False
>>> np.any(a > 3)
False
>>> np.any(a > 2)
True
``````

With multiple conditions:

``````>>> a = np.array([1, 2, 3, 2])
>>> b = np.array([2, 2, 3, 2])
>>> c = np.array([6, 4, 4, 5])
>>> ((a <= b) & (b <= c))
array([ True,  True,  True,  True], dtype=bool)
>>> ((a <= b) & (b <= c)).all()
True
``````

Statistics#

y = np.array([[1, 2, 3], [5, 6, 1]])

The average or `mean`:

``````>>> y.mean()
3.0
``````

The median along the `-1` last axis

``````>>> np.median(y, axis=-1)
array([ 2.,  5.])
``````

The standard devication

``````>>> x.std()
0.81649658092772603
``````

cumsum is the cumulative sum

``````>>> y.cumsum(axis=0)
array([[1, 2, 3],
[6, 8, 4]])
>>> y.cumsum(axis=1)
array([[ 1,  3,  6],
[ 5, 11, 12]])
>>> y
array([[1, 2, 3],
[5, 6, 1]])
``````

Remember in ipython you can run bash commands with `!` in front

Eg. `!cat data/populations.txt`