Numpy
Numpy#
The core tool for performant numerical computing with Python
Numpy arrays#
- multi-dimensional arrays
- closed to hardware - faster
- designed for scientific computation
import numpy as np ar = np.array([1,2,3,4]) ar array([0, 1, 2, 3])
Testing the speed difference#
We will use ipthyons %timeit
Normal python array
In [12]: L = range(1000)
In [13]: %timeit [i**2 for i in L]
414 µs ± 8.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Numpy array
In numpy mathematical operations are automatically operated on each element of array
In [14]: L = np.arange(1000)
In [16]: %timeit L**2
1.57 µs ± 71.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Getting help#
np.array?
In [3]: np.lookfor('create array')
Search results for 'create array'
---------------------------------
numpy.array
Create an array.
numpy.memmap
Create a memory-map to an array stored in a *binary* file on disk.
Import convention#
When importing numpy use
import numpy as np
Creating Arrays#
1D#
Creating
>>> a = np.array([0,1,2,3])
>>> a
array([0, 1, 2, 3])
Checking number of dimensions
>>> a.ndim
1
Checking number of deimensions
>>> a.shape
(4,)
>>> len(a)
4
2D#
Create it with an array/list oflists
>>> b = np.array([[1,2,3,4], [5,6,7,8]])
>>> b
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Checking number of dimensions
>>> b.ndim
2
Return a tuple of the shapeof an array
>>> b.shape
(2, 4)
Check number of objects in first dinmesion
>>> len(b)
2
Evenly spaced#
Use np.arange(x)
arange([start,] stop[, step,], dtype=None)
>>> a = np.arange(100000)
>>> a
array([ 0, 1, 2, ..., 99997, 99998, 99999])
Number of points within a range#
linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
a = np.linspace(0, 1, 100)
array([ 0. , 0.01010101, 0.02020202, 0.03030303, 0.04040404, ...
Common arrays#
np.ones
ones(shape, dtype=None, order='C')
Return a new array of given shape and type, filled with ones.
np.zeros
zeros(shape, dtype=float, order='C')
Return a new array of given shape and type, filled with zeros.
np.eye
eye(N, M=None, k=0, dtype=<class 'float'>)
Return a 2-D array with ones on the diagonal and zeros elsewhere.
np.diag
diag(v, k=0)
Extract a diagonal or construct a diagonal array.
>>> d = np.diag(np.array([1, 2, 3, 4]))
>>> d
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
np.random
>>> a = np.random.rand(4)
>>> a
array([ 0.14365585, 0.96317038, 0.57808752, 0.30486506])
Gaussian random numbers
Numbers on a “standard normal” distribution of mean 0 and variance 1
>>> a = np.random.randn(4)
>>> a
array([-0.72186413, 1.89644724, -1.63709681, -0.76200216])
Basic data types#
Numbers sometimes displayed with a trailing .
: 2.
>>> a = np.array([1.,2.,3.,])
>>> a
array([ 1., 2., 3.])
>>> a.dtype
dtype('float64')
No .
is int64
with a dot is float64
You can explicitly specify the datatype with:
c = np.array([1, 2, 3], dtype=float)
There is also:
complex128:
d = np.array([1+2j, 3+4j, 5+6*1j])
String:
>>> a = np.array(['hello','is','it','me','you','are','looking','for'])
>>> a.dtype
dtype('<U7')
Indexing and Slicing#
You access items the same as python lists
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[9]
(0, 2, 9)
Reversing a numpy array
>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
For multi-dimensional arrays indexes are tuples of intergers The Row is specified first and column second
>>> a[2,1]
# third row, second column
Arrays can be sliced (Just like python)
>>> a[12:20:2] # [start:end:step]
array([22, 24, 26, 28])
No slice components are requered, default is 0:last:1
>>> a[:]
**Remember that the end/last element is not included
>>> a = np.array([1,2,3,4])
>>> a[1:3]
array([2, 3])
>>> a[1:4]
array([2, 3, 4])
Now the differences is you can assign slices to numpy arrays but not python lists
>>> a[2:] = 10
>>> a
array([ 1, 2, 10, 10])
Slicing a 2-d array:
a[
Example:
>>> a = np.diag(np.arange(1,7, dtype='float'))
>>> a
array([[ 1., 0., 0., 0., 0., 0.],
[ 0., 2., 0., 0., 0., 0.],
[ 0., 0., 3., 0., 0., 0.],
[ 0., 0., 0., 4., 0., 0.],
[ 0., 0., 0., 0., 5., 0.],
[ 0., 0., 0., 0., 0., 6.]])
And you want the entire rows of 3 to 5 diagonally:
- y: starting at 2 ending at 4 inclusive (5) =
2:5
- x: starting at 0 and ending at index 4 inclusive (5) =
0:5
It is wierd as you start with the y-axis
in the notation
>>> a[2:5,:5]
array([[ 0., 0., 3., 0., 0.],
[ 0., 0., 0., 4., 0.],
[ 0., 0., 0., 0., 5.]])
Copies and Views#
The slicing operation creates a view on the original array which is just a way of accessing array data. The original array is not copied.
You can use np.may_share_memory(x, y)
to check if 2 arrays share memory
If memory is shared changing the copied or original affect the other.
To force a copy use:
>>> c = a[:2].copy()
Fancy Indexing#
Using boolean masks
>>> a = np.random.randint(0, 21, 15)
>>> a
array([10, 3, 8, 0, 19, 10, 11, 9, 10, 6, 0, 20, 12, 7, 14])
>>> (a % 3 == 0)
array([False, True, False, True, False, False, False, True, False,
True, True, False, True, False, False], dtype=bool)
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask]
>>> extract_from_a
array([ 3, 0, 9, 6, 0, 12])
Assigning new values to sub array that meets a criterion:
a[a % 3 == 0] = -1
Using integer array mask (repeating some values):
>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
>>> a[[1,2,2,3,3,4,4]]
array([10, 20, 20, 30, 30, 40, 40])
Can be used to assign as well:
>>> a[[7,9]] = 100
A new array created by an array of arrays will share the same shape
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> idx = np.array([[3,4],[9,7]])
>>> idx
array([[3, 4],
[9, 7]])
>>> a[idx]
array([[3, 4],
[9, 7]])
**Remember for an element a[5,6]
the x-axis is 6
and the y-axis is 5
Indexing#
An iteratable can be used tuple
works the same as a list
:
full[[0,1,2,3,4], [1,2,3,4,5]]
full[(0,1,2,3,4), (1,2,3,4,5)]
You can use standard list manipulation notation:
full[(3:, [0,2,5]]
Numerical operations on arrays#
With Scalars#
scalars are single element or number fields
You can simply apply the arithmetic to the whole array
>>> a = np.array([1, 2, 3, 4])
>>> a + 1
array([2, 3, 4, 5])
Lets try a
to the power of 2
>>> a**2
array([ 1, 4, 9, 16])
All arithmetic operates element-wise
>>> b = np.ones(4) + 1
>>> b
array([ 2., 2., 2., 2.])
>>> a - b
array([-1., 0., 1., 2.])
Another example:
>>> j = np.arange(5)
>>> j
array([0, 1, 2, 3, 4])
>>> 2**(j + 1) - j
array([ 2, 3, 6, 13, 28])
The operations are much faster than if you did them in pure python
Matrix Multiplcation#
In [6]: c = np.ones((3, 3))
In [7]: c Out[7]: array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])
Using *
is not matrix multiplication
In [8]: c * c Out[8]: array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])
Using the dot
function is matrix multiplciation
dot
is the product of 2 arrays
In [9]: c.dot(c) Out[9]: array([[ 3., 3., 3.], [ 3., 3., 3.], [ 3., 3., 3.]])
Comparison is also element-wise#
In [20]: a = np.array([1, 2, 3, 4])
In [21]: b = np.array([4, 2, 2, 4])
In [22]: a == b
Out[22]: array([False, True, False, True], dtype=bool)
In [23]: a > b
Out[23]: array([False, False, True, False], dtype=bool)
If you want to compare the entire array use np.array_equal()
:
a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 3, 4])
np.array_equal(a, b)
Logic operations#
Use np.logical_or()
and np.logical_and()
>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([0, 1, 1, 0], dtype=bool)
>>> np.logical_or(a, b)
array([ True, True, True, False], dtype=bool)
>>> np.logical_and(a, b)
array([False, True, False, False], dtype=bool)
Transcendental operations#
Use np.sin
, np.cos
, np.tan
, np.log
and np.exp
>>> a = np.array([-1, 0, 1, 2])
>>> a
array([-1, 0, 1, 2])
>>> np.sin(a)
array([-0.84147098, 0. , 0.84147098, 0.90929743])
>>> np.log(a)
__main__:1: RuntimeWarning: divide by zero encountered in log
__main__:1: RuntimeWarning: invalid value encountered in log
array([ nan, -inf, 0. , 0.69314718])
>>> np.exp(a)
array([ 0.36787944, 1. , 2.71828183, 7.3890561 ])
Mimatch#
>>> a = np.arange(4)
>>> a + np.array([1, 2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,) (2,)
When the shapes of the arrays do not match they cannot be broadcast
Transpose#
Invert and reflect. Opposite on both axis.
Create a triangle with np.triu
(Use help(np.triu)
)
>>> np.triu(np.ones((3,3)),1)
array([[ 0., 1., 1.],
[ 0., 0., 1.],
[ 0., 0., 0.]])
then transpose with:
>>> a.T
array([[ 0., 0., 0.],
[ 1., 0., 0.],
[ 1., 1., 0.]])
Remember a transposition is a view, so when arrays become larger they will fail in unpredicatable ways
Extras#
np.allclose
- Returns True if two arrays are element-wise equal within a tolerance.np.tril
- Lower triangle of an array.
Basic Reductions#
Finding the sum of an array#
>>> a
array([ 0, 5, 10, 15, 20, 25])
>>> a.sum()
75
>>> np.sum(a)
75
On the axis
:
>>> a = np.array([[1,1], [2,2]])
>>> a
array([[1, 1],
[2, 2]])
Find sum of the column
along the y-axis
- first dimension
>>> a.sum(axis=0)
array([3, 3])
Find the sum of column
along the x-axis
- second dimension
:
>>> a.sum(axis=1)
array([2, 4])
Same idea at higher dimensions:
>>> x = np.random.rand(2, 2, 2)
>>> x
array([[[ 0.73091254, 0.3126328 ],
[ 0.52196148, 0.51212003]],
[[ 0.07157999, 0.15920737],
[ 0.75733851, 0.99707551]]])
>>> x.sum(axis=2)[0, 1]
1.0340815143357149
min
, max
and index of min and max
x = np.array([1, 3, 2])
>>> x.min()
1
>>> x.max()
3
Get the index of the min or max:
>>> x.argmin()
0
>>> x.argmax()
1
Logic operations#
>>> np.all([True, True, False])
False
>>> np.any([True, True, False])
True
Can be used with an argument:
>>> a = np.array([1, 2, 3, 2])
>>> b = np.array([2, 2, 3, 2])
>>> np.all(a < 4)
True
>>> np.any(a > 4)
False
>>> np.any(a > 3)
False
>>> np.any(a > 2)
True
With multiple conditions:
>>> a = np.array([1, 2, 3, 2])
>>> b = np.array([2, 2, 3, 2])
>>> c = np.array([6, 4, 4, 5])
>>> ((a <= b) & (b <= c))
array([ True, True, True, True], dtype=bool)
>>> ((a <= b) & (b <= c)).all()
True
Statistics#
y = np.array([[1, 2, 3], [5, 6, 1]])
The average or mean
:
>>> y.mean()
3.0
The median along the -1
last axis
>>> np.median(y, axis=-1)
array([ 2., 5.])
The standard devication
>>> x.std()
0.81649658092772603
cumsum is the cumulative sum
>>> y.cumsum(axis=0)
array([[1, 2, 3],
[6, 8, 4]])
>>> y.cumsum(axis=1)
array([[ 1, 3, 6],
[ 5, 11, 12]])
>>> y
array([[1, 2, 3],
[5, 6, 1]])
Remember in ipython you can run bash commands with !
in front
Eg. !cat data/populations.txt