This outline provides a comprehensive overview of NumPy, a fundamental library for numerical computing in Python. It covers critical aspects such as creating, resizing, and indexing arrays, as well as performing operations on one and two arrays. The document also includes an introduction to linear algebra with NumPy, highlighting its efficient, vectorized computations similar to MATLAB. The user guide and documentation links are provided for further learning. This serves as an essential resource for anyone looking to leverage NumPy in statistical natural language processing.
LING / C SC 439/539: Statistical Natural Language Processing
Numerical Python
Outline
• Overview of NumPy
• Creating arrays
• Resizing arrays
• Indexing and selection
• Operations on one array
• Operations on two arrays
• Linear algebra
Numerical Python
• A module that can be imported in Python
• Allows for:
  • Datatypes for vectors and matrices (called arrays)
  • Vectorized computations, similar to MATLAB
• Highly efficient: calls numerical libraries coded in C
• Code looks much more like math
  • Fewer explicitly coded loops
  • Results in concise code
Vectorized computing
• Standard Python:
    L = [1,2,3,4,5]
    L2 = []
    for i in range(len(L)):
        L2.append(L[i] * 3)
• NumPy:
    L = array([1,2,3,4,5])
    L2 = L * 3
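The two snippets above can be run side by side to confirm they compute the same thing. This is a minimal sketch that imports NumPy as `np` (instead of `from numpy import *`) so it is clear which names come from NumPy; the last line is an added illustration of why the plain-list version needs a loop.

```python
import numpy as np

# Standard Python: build a new list with an explicit loop
L = [1, 2, 3, 4, 5]
L2 = []
for i in range(len(L)):
    L2.append(L[i] * 3)

# NumPy: one vectorized expression, no explicit loop
v = np.array([1, 2, 3, 4, 5])
v2 = v * 3

print(L2)           # [3, 6, 9, 12, 15]
print(v2.tolist())  # [3, 6, 9, 12, 15]

# Caution: multiplying a plain list by 3 repeats the list instead
print(len(L * 3))   # 15
```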
NumPy documentation
• NumPy User Guide
  • http://docs.scipy.org/doc/
• Guide to NumPy by Travis Oliphant (creator of NumPy)
  • http://www.tramy.us/guidetoscipy.html
First, import NumPy
>>> from numpy import *
help(functionname)
>>> help(eye)
Help on function eye in module numpy.lib.twodim_base:

eye(N, M=None, k=0, dtype=<type 'float'>)
    Return a 2-D array with ones on the diagonal and zeros elsewhere.

    Parameters
    ----------
    N : int
      Number of rows in the output.
    M : int, optional
      Number of columns in the output. If None, defaults to `N`.
    k : int, optional
      Index of the diagonal: 0 refers to the main diagonal, a positive value
      refers to an upper diagonal, and a negative value to a lower diagonal.
    dtype : dtype, optional
      Data-type of the returned array.
help(functionname) (continued)
    Returns
    -------
    I : ndarray (N,M)
      An array where all elements are equal to zero, except for the `k`-th
      diagonal, whose values are equal to one.

    See Also
    --------
    diag : Return a diagonal 2-D array using a 1-D array specified by the user.

    Examples
    --------
    >>> np.eye(2, dtype=int)
    array([[1, 0],
           [0, 1]])
    >>> np.eye(3, k=1)
    array([[ 0.,  1.,  0.],
           [ 0.,  0.,  1.],
           [ 0.,  0.,  0.]])
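The docstring lists the `M` and `k` parameters but only demonstrates a positive `k` on a square matrix. A short sketch of a non-square result and a negative `k` (the variable names are illustrative):

```python
import numpy as np

# 2 rows, 4 columns; k=1 puts the ones on the diagonal just above the main one
upper = np.eye(2, 4, k=1, dtype=int)
# 3 x 3; k=-1 puts the ones on the diagonal just below the main one
lower = np.eye(3, k=-1, dtype=int)

print(upper)
# [[0 1 0 0]
#  [0 0 1 0]]
print(lower)
# [[0 0 0]
#  [1 0 0]
#  [0 1 0]]
```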
Arrays in NumPy
• All of these are arrays in NumPy:
  • One-dimensional vector
  • Two-dimensional matrix
  • Higher-dimensional matrix
Creating a vector (one-dimensional array)
>>> v = array([1,2,3,4,5])
>>> v
array([1, 2, 3, 4, 5])
>>> ndim(v)    # number of dimensions
1
>>> shape(v)   # 5 elements in first dim.
(5,)
>>> size(v)    # total number of elements
5
Creating a matrix (two-dimensional array)
>>> a = array([[1,2,3],[4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> ndim(a)    # number of dimensions
2
>>> a.shape    # 2 rows, 3 columns
(2, 3)
>>> size(a)    # total number of elements
6
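The same three queries also work for the "higher-dimensional" arrays mentioned earlier. A minimal sketch with a hypothetical 2 x 3 x 4 array, showing that `size` is the product of the entries of `shape` and that indexing takes one index per dimension:

```python
import numpy as np

# A three-dimensional array: 2 "layers", each a 3 x 4 matrix
t = np.arange(24).reshape(2, 3, 4)

print(t.ndim)   # 3
print(t.shape)  # (2, 3, 4)
print(t.size)   # 24, the product of the entries of shape

# Indexing takes one index per dimension
print(t[1, 2, 3])  # 23, the last element
```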
When writing comma-separated values in Python (e.g. in lists or arrays), you can press Enter after a comma and continue on the next line, as long as a bracket is still open
>>> # these all produce the same result:
>>> a = array([[1,2,3],[4,5,6]])
>>> a = array([[1, 2, 3], [4, 5, 6]])
>>> a = array([[1, 2, 3],
               [4, 5, 6]])
Calling functions vs. object attributes
>>> a.shape
(2, 3)
>>> shape(a)
(2, 3)
• You get the same result whether you pass the object to a function or access the object's attribute
  • The function reads the object's attribute
  • Both can be used interchangeably
• But when a function with the same name is also defined elsewhere (e.g. Python's built-in max), call NumPy's version through the object, as a method
  • You'll see this later with max
• Also: a.ndim and a.size are equivalent to ndim(a) and size(a)
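The equivalence can be checked directly. One practical difference, added here as an illustration: the function form also accepts a plain nested list, which has no `.shape` attribute of its own. A minimal sketch using `import numpy as np`:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Function form and attribute form give the same answers for an ndarray
assert np.shape(a) == a.shape == (2, 3)
assert np.ndim(a) == a.ndim == 2
assert np.size(a) == a.size == 6

# The function form also accepts plain nested lists,
# which have no .shape attribute of their own
nested = [[1, 2, 3], [4, 5, 6]]
print(np.shape(nested))          # (2, 3)
print(hasattr(nested, "shape"))  # False
```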
Special functions to create matrices (2-d arrays)
>>> ones((2,3))
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> zeros((2,3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> eye(3)
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
Type of an array
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a.dtype
dtype('int32')
>>> b = ones([2,3])
>>> b
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> b.dtype
dtype('float64')
• The default integer type depends on the platform: int32 here, int64 on many other systems
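Rather than relying on the default, the type can be requested when the array is created, or converted afterwards with `astype`. A minimal sketch (the example values are illustrative):

```python
import numpy as np

# Request a dtype explicitly instead of relying on the default
ints = np.ones((2, 3), dtype=int)
floats = np.array([1, 2, 3], dtype=np.float64)
print(ints.dtype)    # an integer type (int64 or int32, platform-dependent)
print(floats.dtype)  # float64

# astype returns a converted copy; float -> int truncates toward zero
x = np.array([1.7, -1.7])
print(x.astype(int))  # [ 1 -1]
```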
linspace
>>> # vector with linearly spaced values
>>> # linspace(start, stop, num values); stop is included by default
>>> # the function determines the spacing of the values for you
>>> linspace(3, 16, 5)
array([  3.  ,   6.25,   9.5 ,  12.75,  16.  ])
>>> linspace(15, 19, 4)
array([ 15.        ,  16.33333333,  17.66666667,  19.        ])
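Two `linspace` options make the "spacing chosen for you" visible: `retstep=True` returns the step, and `endpoint=False` drops `stop`. For comparison, `arange` fixes the step instead of the count. A minimal sketch:

```python
import numpy as np

# retstep=True also returns the spacing that linspace chose
vals, step = np.linspace(3, 16, 5, retstep=True)
print(step)   # 3.25, i.e. (16 - 3) / (5 - 1)

# endpoint=False excludes stop, so the spacing becomes (stop - start) / num
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)  # values 0.0, 0.2, 0.4, 0.6, 0.8

# arange instead fixes the step and works out the count; stop is excluded
print(np.arange(3, 16, 3.25))  # values 3.0, 6.25, 9.5, 12.75
```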
Arrays of random numbers
>>> random.rand(2,3)    # uniformly dist. between 0 and 1
array([[ 0.49386404,  0.12125634,  0.58045141],
       [ 0.80695113,  0.32188799,  0.63249074]])
>>> random.randn(2,3)   # normal dist., mean=0, var=1
array([[-0.37422103,  1.03866716, -0.53547127],
       [ 0.30022273,  0.23015563,  0.80873554]])
Arrays of random numbers
>>> # 2 x 3 matrix, uniformly dist. between 5 and 7
>>> random.uniform(5, 7, (2, 3))
array([[ 6.50654571,  5.77650203,  6.68806597],
       [ 6.29241871,  6.45282975,  6.4707847 ]])
>>> # 4 x 3 matrix, random ints from 3 up to 6 (6 itself is excluded)
>>> random.randint(3, 6, (4, 3))
array([[3, 3, 3],
       [5, 3, 5],
       [4, 3, 3],
       [5, 3, 3]])
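The values above change on every run. For reproducible experiments, newer NumPy versions (1.17 and later) offer a seeded `Generator` object; this sketch assumes such a version and uses its `uniform`, `integers`, and `standard_normal` methods as counterparts of the calls shown on the slides. The seed 42 is arbitrary.

```python
import numpy as np

# A seeded Generator gives reproducible "random" arrays
rng = np.random.default_rng(seed=42)
u = rng.uniform(5, 7, size=(2, 3))    # floats in [5, 7)
k = rng.integers(3, 6, size=(4, 3))   # ints 3, 4 or 5: high is excluded
z = rng.standard_normal((2, 3))       # mean 0, variance 1

# Re-seeding reproduces exactly the same first draw
rng_again = np.random.default_rng(seed=42)
assert np.array_equal(u, rng_again.uniform(5, 7, size=(2, 3)))
```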
Shape of an array
>>> a.shape   # or shape(a)
(2, 3)
>>> nrow = a.shape[0]
>>> ncol = a.shape[1]
>>> nrow
2
>>> ncol
3
>>> zeros(a.shape)   # new array w/ same shape
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Transpose
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> transpose(a)   # or a.transpose()
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> a              # didn't change it
array([[1, 2, 3],
       [4, 5, 6]])
>>> a = transpose(a)   # need to assign to variable
>>> a
array([[1, 4],
       [2, 5],
       [3, 6]])
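NumPy also provides the shorthand attribute `.T`. A common surprise, shown in this sketch, is that transposing a one-dimensional vector has no effect, because it has only one axis; turning it into a column requires adding a second axis.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.T)        # same as transpose(a): shape (3, 2)

# Transposing a 1-D array does nothing: there is only one axis
v = np.array([1, 2, 3])
print(v.T.shape)  # (3,)

# To get a column vector, add a second axis explicitly
col = v.reshape(-1, 1)
print(col.shape)  # (3, 1)
```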
Reshaping an array
>>> a                  # 2 x 3 matrix
array([[1, 2, 3],
       [4, 5, 6]])
>>> reshape(a, (3, 2)) # 3 x 2 matrix
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> reshape(a, (1,6))  # 1 x 6 matrix
array([[1, 2, 3, 4, 5, 6]])
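When reshaping, one dimension may be given as `-1` and NumPy infers it from the total size; the new shape must hold exactly `a.size` elements, otherwise a `ValueError` is raised. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the total size (6 here)
print(a.reshape(3, -1).shape)   # (3, 2)
print(a.reshape(-1).shape)      # (6,)  same elements as ravel(a)

# The new shape must hold exactly a.size elements
try:
    a.reshape(4, 2)
except ValueError as err:
    print("cannot reshape:", err)
```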
Concatenation
>>> a = array([[1,2,3],[4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> b = zeros(a.shape)
>>> b
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Concatenation
>>> # note that the result is float, because b is float
>>> concatenate((a,b), axis=0)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> concatenate((a,b), axis=1)
array([[ 1.,  2.,  3.,  0.,  0.,  0.],
       [ 4.,  5.,  6.,  0.,  0.,  0.]])
Try to concatenate a matrix with a vector
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> c = arange(3)
>>> c
array([0, 1, 2])
>>> concatenate((a,c), axis=0)
Traceback (most recent call last):
  File "<pyshell#270>", line 1, in <module>
    concatenate((a,c), axis=0)
ValueError: arrays must have same number of dimensions
Convert vector to matrix before concatenating
>>> c.shape   # one-dimensional
(3,)
>>> a.shape   # two-dimensional
(2, 3)
>>> array([c])
array([[0, 1, 2]])
>>> concatenate((a, array([c])), axis=0)
array([[1, 2, 3],
       [4, 5, 6],
       [0, 1, 2]])
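Wrapping the vector as `array([c])` works; two other standard ways to supply the missing dimension are `vstack`, which treats a length-n vector as a 1 x n row, and indexing with `np.newaxis`. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
c = np.arange(3)

# vstack treats a 1-D array of length n as a 1 x n row
print(np.vstack((a, c)))
# [[1 2 3]
#  [4 5 6]
#  [0 1 2]]

# Indexing with np.newaxis is another way to add the missing dimension
row = c[np.newaxis, :]          # shape (1, 3), like array([c])
assert np.array_equal(np.concatenate((a, row), axis=0), np.vstack((a, c)))
```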
Turn matrix into 1-d vector
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> ravel(a)
array([1, 2, 3, 4, 5, 6])
append: for vectors
>>> a = array([1,2,3])
>>> a = append(a, 4)
>>> a
array([1, 2, 3, 4])
>>> append(a, array([7,8,9]))
array([1, 2, 3, 4, 7, 8, 9])
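`append` does not modify its argument in place; it returns a new array, which is why the slide reassigns `a`. For a matrix, it flattens both inputs unless an `axis` is given. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a)   # [1 2 3]   append returned a new array; a is unchanged
print(b)   # [1 2 3 4]

m = np.array([[1, 2], [3, 4]])
# Without axis, both inputs are flattened first
print(np.append(m, [5, 6]))             # [1 2 3 4 5 6]
# With axis=0, the new values must be a matching row
print(np.append(m, [[5, 6]], axis=0))   # 3 x 2 matrix
```

Because every call copies the data, appending inside a long loop is slow; collecting values in a Python list and calling `np.array` once at the end is usually faster.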
Indexing vectors
>>> b = array([3,5,7,9,11,13,15])
>>> b
array([ 3,  5,  7,  9, 11, 13, 15])
>>> b[0]
3
>>> b[5:]
array([13, 15])
>>> b[[0,5,2]]   # indices can be in any order
array([ 3, 13,  7])
Indexing arrays
• Let M be a matrix of size m x n
  • m rows
  • n columns
  • m * n total elements
• $M_{i,j}$ is the entry of M at row i and column j
• NumPy counts from 0, so row i, column j (counting from 1) is a[i-1, j-1]
>>> a = array([[1, 2, 3], [4, 5, 6]])
>>> a[0,1]   # value at row 1, column 2
2
Indexing arrays
>>> b = reshape(arange(12), (3,4))
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b[0,:]       # first row, all cols
array([0, 1, 2, 3])
>>> b[1:,:]      # second row to end, all cols
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b[[0,2],:]   # first & third rows, all cols
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
Indexing arrays
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> # all rows; second and third columns (indices 1 and 2; stop index 3 is excluded)
>>> b[:,1:3]
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])
>>> b[2,0]   # third row, first column
8
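The two kinds of indexing above behave differently when you write to the result: a slice such as `b[1, :]` is a view that shares memory with `b`, while a list of indices such as `b[[0, 2], :]` produces a copy. Two index lists together pick individual elements. A minimal sketch on the same 3 x 4 array:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

# A slice is a view: writing through it changes b
row = b[1, :]
row[0] = 100
print(b[1, 0])     # 100

# A list of indices makes a copy: writing to it leaves b alone
picked = b[[0, 2], :]
picked[0, 0] = -1
print(b[0, 0])     # 0

# Two index lists pick individual elements: (0, 1) and (2, 3)
print(b[[0, 2], [1, 3]])   # [ 1 11]
```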
Logical selection
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a%2==0
array([[False,  True, False],
       [ True, False,  True]], dtype=bool)
>>> # get even values of a
>>> a[a%2==0]   # returns a vector
array([2, 4, 6])
Logical selection
>>> # where(condition, x, y):
>>> # when True, return x, else return y
>>> where(a%2==0, 1, -1)   # returns a matrix
array([[-1,  1, -1],
       [ 1, -1,  1]])
>>> where(a%2==0, a, 0)    # returns a matrix
array([[0, 2, 0],
       [4, 0, 6]])
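Several conditions can be combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==`. When you need positions rather than values, `argwhere` lists the (row, column) of every True entry. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Combine conditions with & (and) and | (or); the parentheses are required
print(a[(a > 1) & (a % 2 == 0)])   # [2 4 6]
print(a[(a < 2) | (a > 5)])        # [1 6]

# argwhere gives the (row, column) position of each True entry
print(np.argwhere(a % 2 == 0))
# [[0 1]
#  [1 0]
#  [1 2]]
```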
Unique
>>> r = random.randint(0,5, (9,))
>>> r
array([0, 3, 2, 2, 2, 1, 1, 4, 3])
>>> unique(r)
array([0, 1, 2, 3, 4])
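`unique` can also report how often each value occurs via `return_counts=True`, which gives a quick frequency table. The sketch reuses the slide's values for `r` and adds a hypothetical five-word sentence to show the same call on tokens:

```python
import numpy as np

r = np.array([0, 3, 2, 2, 2, 1, 1, 4, 3])
values, counts = np.unique(r, return_counts=True)
print(values)   # [0 1 2 3 4]
print(counts)   # [1 2 3 2 1]

# The same call gives a quick word-frequency count
tokens = np.array("the cat saw the dog".split())
words, freqs = np.unique(tokens, return_counts=True)
print(dict(zip(words.tolist(), freqs.tolist())))
# {'cat': 1, 'dog': 1, 'saw': 1, 'the': 2}
```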
Modifying entries in an array
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a[1,2] = 0
>>> a
array([[1, 2, 3],
       [4, 5, 0]])
Modifying entries in an array
>>> a[1,:] = array([7,8,9])
>>> a
array([[1, 2, 3],
       [7, 8, 9]])
>>> a[:,0:2] = array([[-1,-2],[-3,-4]])
>>> a
array([[-1, -2,  3],
       [-3, -4,  9]])
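The value assigned does not have to match the selected shape exactly: a scalar is written into every selected entry (including entries picked by a logical condition), and a row is repeated across all selected rows. This repetition is NumPy's broadcasting. A minimal sketch starting from the final array above:

```python
import numpy as np

a = np.array([[-1, -2, 3], [-3, -4, 9]])

# Assign a scalar to every selected entry (here: all negative values)
a[a < 0] = 0
print(a)
# [[0 0 3]
#  [0 0 9]]

# A 1-D row is repeated (broadcast) across every selected row
a[:, :] = np.array([1, 2, 3])
print(a)
# [[1 2 3]
#  [1 2 3]]
```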
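Sum
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> sum(a, axis=0)   # sum down each column
array([5, 7, 9])
>>> sum(a, 1)        # sum across each row
array([ 6, 15])
>>> sum(a)           # sum of all elements
21
>>> sum(sum(a))      # often in Marsland's code
21
• After from numpy import *, sum refers to NumPy's sum (Python's built-in sum has no axis argument)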
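A frequent follow-up to row sums is normalizing each row, e.g. turning a row of counts into probabilities. Passing `keepdims=True` keeps the summed axis with length 1, so the sums line up with the rows when dividing. A minimal sketch, treating the slide's `a` as hypothetical counts:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True leaves the summed axis in place with length 1,
# so the result still lines up with a's rows for broadcasting
row_sums = a.sum(axis=1, keepdims=True)
print(row_sums.shape)   # (2, 1)

# Normalize each row so that it sums to 1 (e.g. counts -> probabilities)
probs = a / row_sums
print(probs.sum(axis=1))  # each row sums to 1, up to floating-point rounding
```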
Elementwise numerical operations
>>> a + 1
array([[2, 3, 4],
       [5, 6, 7]])
>>> a**2
array([[ 1,  4,  9],
       [16, 25, 36]])
>>> sqrt(a)
array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.        ,  2.23606798,  2.44948974]])
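`log` and `exp` are elementwise in the same way as `sqrt`. In statistical NLP they are commonly used to work with log-probabilities, since multiplying many small probabilities underflows to zero while adding their logs does not. A minimal sketch with made-up probabilities:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])
logp = np.log(p)            # elementwise natural log
print(np.exp(logp).sum())   # 1.0, up to rounding

# Multiplying many probabilities underflows; summing their logs does not
tiny = np.full(1000, 0.01)
print(np.prod(tiny))        # 0.0
print(np.log(tiny).sum())   # about -4605.17
```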
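Division
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a / 3     # Python 2: integer division
array([[0, 0, 1],
       [1, 1, 2]])
>>> a / 3.0
array([[ 0.33333333,  0.66666667,  1.        ],
       [ 1.33333333,  1.66666667,  2.        ]])
• In Python 3, a / 3 already returns floats (the same values as a / 3.0); use a // 3 for integer (floor) division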
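Under Python 3 the two division operators can be compared directly: `/` always gives floats, while `//` reproduces the integer result that `a / 3` gave on the slide. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Python 3: / is always true division and returns floats
print((a / 3).dtype)   # float64
# // is floor division and keeps integers, like the Python 2 a / 3 output
print(a // 3)
# [[0 0 1]
#  [1 1 2]]
```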
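Try to call max
>>> max(a)   # calls Python's built-in max, not NumPy's!
Traceback (most recent call last):
  File "<pyshell#323>", line 1, in <module>
    max(a)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
• The built-in max compares the rows of a with each other; comparing two rows gives an array of booleans, not a single True/False, so the comparison fails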
max and min
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> # below we call max, min in NumPy
>>> a.max(axis=0)   # max for each column
array([4, 5, 6])
>>> a.max(axis=1)   # max for each row
array([3, 6])
>>> a.min(1)        # min for each row
array([1, 4])
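The method form is not the only way to reach NumPy's version: with the module imported as `np`, `np.max` and `np.min` never collide with the built-ins. The built-in `max` does work on a one-dimensional array, because its items are scalars. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# NumPy's own max/min, called as a function or as a method
assert np.max(a) == a.max() == 6                 # over all elements
assert np.max(a, axis=0).tolist() == [4, 5, 6]   # per column
assert np.min(a, axis=1).tolist() == [1, 4]      # per row

# The built-in max only works on 1-D arrays, whose items are scalars
print(max(np.array([3, 9, 2])))   # 9
```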
argmax and argmin
>>> r = random.randint(0,20, (3,4))
>>> r
array([[18,  3, 12,  7],
       [ 2, 12,  5,  4],
       [ 5,  8, 19, 15]])
>>> # find the index with the max value
>>> argmax(r)   # index into the flattened (1-d) array
10
>>> ravel(r)
array([18,  3, 12,  7,  2, 12,  5,  4,  5,  8, 19, 15])
>>> ravel(r)[argmax(r)]
19
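To turn the flat index from `argmax` back into a (row, column) pair, NumPy provides `unravel_index`. With an `axis`, `argmax` instead returns one index per row or column. A minimal sketch using the slide's matrix:

```python
import numpy as np

r = np.array([[18, 3, 12, 7],
              [2, 12, 5, 4],
              [5, 8, 19, 15]])

flat = np.argmax(r)                     # 10: position in ravel(r)
row, col = np.unravel_index(flat, r.shape)
print(row, col)                         # 2 2
print(r[row, col])                      # 19

# With an axis, argmax returns one index per row (or column)
print(np.argmax(r, axis=1))             # [0 1 2]
```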
Sorting
>>> r = random.randint(0, 10, (3, 4))
>>> r
array([[3, 8, 8, 1],
       [4, 5, 7, 7],
       [1, 1, 2, 8]])
>>> sort(r, axis=0)   # sort each column
array([[1, 1, 2, 1],
       [3, 5, 7, 7],
       [4, 8, 8, 8]])
>>> sort(r, 1)        # sort each row
array([[1, 3, 8, 8],
       [4, 5, 7, 7],
       [1, 1, 2, 8]])
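Three related variations: `axis=None` sorts all elements as one flattened vector, the method `r.sort()` sorts in place (and returns `None`), and descending order can be obtained by reversing the sorted result. A minimal sketch with the slide's matrix:

```python
import numpy as np

r = np.array([[3, 8, 8, 1],
              [4, 5, 7, 7],
              [1, 1, 2, 8]])

# axis=None flattens first and returns one sorted 1-D array
print(np.sort(r, axis=None))   # [1 1 1 2 3 4 5 7 7 8 8 8]

# The method form sorts in place and returns None
r.sort(axis=1)
print(r[0])                    # [1 3 8 8]

# Descending order: sort, then reverse along the axis
print(np.sort(r, axis=1)[:, ::-1][0])  # [8 8 3 1]
```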
argsort
>>> q = array(['C','B','E','A','D'])
>>> argsort(q)
array([3, 1, 0, 4, 2])
>>> q[argsort(q)]
array(['A', 'B', 'C', 'D', 'E'], dtype='|S1')
• dtype='|S1' is Python 2 output; in Python 3 the dtype is shown as '<U1'
argsort
Position                           0  1  2  3  4
Original array                     C  B  E  A  D
Sorted array                       A  B  C  D  E
Index of item in original array    3  1  0  4  2
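Because `argsort` returns positions rather than values, it can order one array by the values of another, for example listing words from most to least frequent. In this sketch the `words` and `counts` arrays are hypothetical; negating the counts makes the ascending `argsort` produce a descending order.

```python
import numpy as np

words = np.array(["cat", "the", "dog", "a"])
counts = np.array([2, 5, 1, 3])

# argsort on the negated counts gives indices from most to least frequent
order = np.argsort(-counts)
print(order)          # [1 3 0 2]
print(words[order])   # ['the' 'a' 'cat' 'dog']
print(counts[order])  # [5 3 2 1]
```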