[Study Exchange] [Shanghai Campus] Magic Commands in Jupyter Notebook

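Using %run
%run loads and executes a script that is not part of the project's packages.
For modules inside the project's packages, do not use %run; use Python's import syntax instead.
1. %run  myscript/hello.py

2. hello("balabala")    # functions defined in hello.py can now be called directly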
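Using %timeit
%timeit automatically repeats the statement a number of times; the shorter the statement's run time, the more loops it runs.
It takes the best of 3 repeats as the reported execution time.
1. %timeit  L = [i**2 for i in range(1000)]

Result:  1000 loops, best of 3: 515 µs per  loop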
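Outside a notebook, a comparable measurement can be sketched with the standard-library timeit module. The repeat and number values below are arbitrary choices for illustration; %timeit picks its loop count automatically, which this sketch does not reproduce.

```python
import timeit

# Run the statement 200 times per trial, for 3 trials (both counts are arbitrary here).
number = 200
trials = timeit.repeat(
    stmt="L = [i**2 for i in range(1000)]",
    repeat=3,
    number=number,
)

# Report the best trial as time per loop, similar in spirit to %timeit.
best_per_loop = min(trials) / number
print(f"best of 3: {best_per_loop * 1e6:.1f} µs per loop")
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to be running during a trial.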
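Using %time
%time does not repeat the statement; it runs it once and reports how long that run took.
CPU time (total) is the sum of the user and sys CPU times.
Wall time is the real-world time that elapsed.
The measured time is unstable and varies from run to run.
It is suitable when only a rough time is needed or when the program takes too long to run repeatedly.
1. %time L = [i**2 for i in range(1000)]

Result:  CPU times: user 452 µs, sys: 17 µs, total: 469 µs
   Wall time: 536 µs
To time a whole block of code, use the cell-magic form with %%.
Example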
%%time
for e in [1,2,3,4,6]:
  print(e)

II. NumPy Data Basics
Using np.array
1. import numpy as np
2. nparr = np.array([i for i in range(10)])
3. nparr

out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

1. nparr[5]
out: 5

1. nparr[5]=100.0
2. nparr[5]
out: 100

1. nparr.dtype
out: dtype('int64')    # the earlier 100.0 was implicitly converted to int
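The implicit conversion is easier to notice when the assigned value has a fractional part. The following is a minimal sketch; the value 3.99 and the name farr are illustrative and not from the original post.

```python
import numpy as np

nparr = np.arange(10)          # integer dtype
nparr[5] = 3.99                # assigning a float to an int array truncates toward zero
print(nparr[5])                # 3

farr = nparr.astype(float)     # make a float copy when fractions must be kept
farr[5] = 3.99
print(farr[5])                 # 3.99
```

Because an array has one fixed dtype, converting the whole array with astype is the usual way to keep fractional values.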
III. Creating NumPy Arrays and Matrices
Creating an all-zeros matrix
1. np.zeros(10) # float type by default
out: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

2. np.zeros(10, dtype=int) # the dtype can be set to int
out: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

3. np.zeros((3,5))    # pass a tuple to give the shape
out:array([[0., 0., 0., 0., 0.],
   [0., 0., 0., 0., 0.],
   [0., 0., 0., 0., 0.]])

4. np.zeros(shape=(3,5), dtype=int) # example 3 simply omitted the shape keyword
out:array([[0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0]])
Creating an all-ones matrix
1. np.ones((3,5)) # same usage as zeros
out: array([[1., 1., 1., 1., 1.],
   [1., 1., 1., 1., 1.],
   [1., 1., 1., 1., 1.]])
Creating a matrix filled with a given value
1. np.full((3,5), 666) # parameter names omitted; same as np.full(shape=(3,5), fill_value=666)
out:array([[666, 666, 666, 666, 666],
   [666, 666, 666, 666, 666],
   [666, 666, 666, 666, 666]])

# Here fill_value is 666, so the output is int; with 666.0 the output would be float
np.arange
1. np.arange(0,1,0.2)
out: array([0. , 0.2, 0.4, 0.6, 0.8])

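# np.arange is used like Python's built-in range and yields a half-open interval (start included, stop excluded)
# but the np.arange step may be a float, whereas range accepts only integers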
np.linspace
1. np.linspace(0,20,10)
out:array([ 0.       ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
   11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.       ])

# the third argument of linspace is the number of samples, not the step; the endpoint 20 is included
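To make the difference between the two functions concrete, here is a small sketch that builds the same grid both ways. The endpoint keyword is a real np.linspace parameter; the variable names are illustrative.

```python
import numpy as np

a = np.arange(0, 1, 0.2)                  # step given, stop excluded
b = np.linspace(0, 1, 5, endpoint=False)  # count given, stop excluded on request
c = np.linspace(0, 1, 6)                  # count given, stop included by default

print(np.allclose(a, b))   # True
print(len(c), c[-1])       # 6 1.0
```

With a float step, np.arange can gain or lose an element through rounding, so np.linspace is often the safer choice when the number of points matters.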
np.random.randint
1. np.random.randint(0, 10, 10) # a vector of 10 random integers in [0, 10); 10 itself is excluded
out:array([1, 8, 1, 4, 6, 8, 7, 1, 7, 1])

2. np.random.randint(4,8,size=(3,5)) # a 3x5 random matrix
out:array([[7, 5, 4, 5, 5],
   [4, 6, 7, 6, 5],
   [5, 6, 6, 6, 7]])

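3. np.random.seed(666)  # set the random seed
4. np.random.randint(4,8, size=(3,5))
out:array([[4, 6, 5, 6, 6],
   [6, 5, 6, 4, 5],
   [7, 6, 7, 4, 7]])

5. np.random.seed(666)  # reset the same seed before drawing again
6. np.random.randint(4,8, size=(3,5)) # with the same seed set beforehand, the same random matrix is generated
out:array([[4, 6, 5, 6, 6],
   [6, 5, 6, 4, 5],
   [7, 6, 7, 4, 7]])

# When debugging, np.random.seed makes it possible to reproduce the same random matrix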
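The point about seeding can be checked directly. The sketch below assumes nothing beyond np.random.seed and np.random.randint; the names first, second and third are illustrative.

```python
import numpy as np

np.random.seed(666)
first = np.random.randint(4, 8, size=(3, 5))

second = np.random.randint(4, 8, size=(3, 5))  # drawn from the advanced generator state

np.random.seed(666)                             # reset to the same starting state
third = np.random.randint(4, 8, size=(3, 5))

print(np.array_equal(first, third))   # True
```

Setting the seed fixes the starting state, not each call: a second draw without re-seeding continues the sequence instead of repeating it.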

np.random.random
1. np.random.random() # a random float in [0, 1)
out:0.2811684913927954

2. np.random.random(10)
out:array([0.46284169, 0.23340091, 0.76706421, 0.81995656, 0.39747625,
   0.31644109, 0.15551206, 0.73460987, 0.73159555, 0.8578588 ])

3. np.random.random((3,5))
out:array([[0.76741234, 0.95323137, 0.29097383, 0.84778197, 0.3497619 ],
   [0.92389692, 0.29489453, 0.52438061, 0.94253896, 0.07473949],
   [0.27646251, 0.4675855 , 0.31581532, 0.39016259, 0.26832981]])

4. np.random.normal()    # a random float from a normal distribution with mean 0 and standard deviation 1
out:0.7760516793129695

5. np.random.normal(10,100) # mean 10, standard deviation 100 (the second argument is the standard deviation, not the variance)
out:128.06359754812632

6. np.random.normal(0,1, (3,5))    # mean 0, standard deviation 1, size 3x5
out:array([[ 0.06102404,  1.07856138, -0.79783572,  1.1701326 ,  0.1121217 ],
   [ 0.03185388, -0.19206285,  0.78611284, -1.69046314, -0.98873907],
   [ 0.31398563,  0.39638567,  0.57656584, -0.07019407,  0.91250436]])
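Since the second argument of np.random.normal is easy to mistake for the variance, a quick empirical check may help. This is a sketch; the seed and sample size are arbitrary choices.

```python
import numpy as np

np.random.seed(0)                                   # arbitrary seed, for repeatability only
samples = np.random.normal(10, 100, size=100_000)   # loc=10, scale=100

print(samples.mean())   # close to 10
print(samples.std())    # close to 100, so the variance is about 10000
```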
When unsure about a function's parameters, use ?
1. np.random.normal? # appending ? brings up the corresponding documentation

Docstring:
normal(loc=0.0, scale=1.0, size=None) # loc is the mean, scale is the standard deviation

Draw random samples from a normal (Gaussian) distribution.

The probability density function of the normal distribution, first
derived by De Moivre and 200 years later by both Gauss and Laplace
independently [2]_, is often called the bell curve because of
its characteristic shape (see the example below)............

2. help(np.random.normal)
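# similar in effect to using ?,
# but ? shows the documentation in a small pop-up pane
# while help puts the documentation in the cell output
IV. Basic Operations on NumPy Arrays and Matrices
Attributes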
In:x1 = np.arange(10)
In:x2 = np.arange(15).reshape(3,5)

In:x1.ndim
out: 1    # ndim is the number of dimensions of the array

In:x2.ndim
out: 2

In:x1.shape
out: (10,)    # shape gives the size along each dimension

In:x2.shape
out: (3,5)

In:x1.size
out: 10    # size is the total number of elements

In:x2.size
out: 15
Accessing data in numpy.array
In:x2
out:array([[ 0,  1,  2,  3,  4],
   [ 5,  6,  7,  8,  9],
   [10, 11, 12, 13, 14]])

In:x2[0][0]
out:0    # [][] access works but is not recommended

In:x2[2,2]
out: 12 # [,] access is recommended

# Python slice syntax also works on np.array
In:x1
out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In:x1[:4]
out: array([0, 1, 2, 3])

# For multi-dimensional slicing, use [,] rather than [][]

In: x2[:2][:3]
out: array([[0, 1, 2, 3, 4],    # not the first 3 columns of the first two rows: the second [:3] slices rows again
   [5, 6, 7, 8, 9]])

In: x2[:2,:3]
out: array([[0, 1, 2],          # the intended result
   [5, 6, 7]])

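# Modifying a sub-matrix directly affects the parent matrix
# For efficiency, slicing an np.array returns a view of the parent's data rather than a copy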

In: subX = x2[:2,:3]
In: subX
out: array([[0, 1, 2],
   [5, 6, 7]])

In: subX[0,0] = 100000
In: subX
out: array([[100000,    1,    2],
   [    5,    6,    7]])

In: x2
out:array([[100000,    1,    2,    3,    4],
   [    5,    6,    7,    8,    9],
   [ 10,    11,    12,    13,    14]])       # element [0,0] of the parent matrix changed as well

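# Likewise, changing an element of the parent matrix changes the sub-matrix

# To keep the sub-matrix and the parent independent, use copy

In: subX = x2[:2, :3].copy()    # now the sub-matrix and the parent no longer affect each other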
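Whether two arrays share data can be checked with np.shares_memory. The sketch below contrasts a slice with an explicit copy; the names view and copied are illustrative.

```python
import numpy as np

x2 = np.arange(15).reshape(3, 5)
view = x2[:2, :3]             # basic slicing returns a view of x2's data
copied = x2[:2, :3].copy()    # an explicit copy owns its own data

print(np.shares_memory(x2, view))     # True
print(np.shares_memory(x2, copied))   # False

view[0, 0] = -1
print(x2[0, 0])    # -1, the parent changed through the view
copied[0, 1] = -2
print(x2[0, 1])    # 1, the parent is untouched by the copy
```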
reshape
In: x1.reshape(2,5)
out: array([[0, 1, 2, 3, 4],
   [5, 6, 7, 8, 9]])

# To turn x1 into 10 rows without working out the number of columns, pass -1 for it
In: x1.reshape(10, -1)
out:array([[0],
   [1],
   [2],
   [3],
   [4],
   [5],
   [6],
   [7],
   [8],
   [9]])



In: x1.reshape(2, -1) # 2 rows; the number of columns is inferred
out:array([[0, 1, 2, 3, 4],
   [5, 6, 7, 8, 9]])
V. Merging and Splitting NumPy Arrays (and Matrices)
Merging
# Merging 1-D arrays
In:x = np.array([1,2,3])
In:y = np.array([3,2,1])

In:np.concatenate([x,y])
out:array([1, 2, 3, 3, 2, 1])

In:z = np.array([666,666,666])
In:np.concatenate([x,y,z])
out: array([  1, 2, 3, 3, 2, 1, 666, 666, 666])

# Merging 2-D matrices
In: A = np.array([[1,2,3],
                  [4,5,6]])
In: np.concatenate([A,A])    # the default is axis=0, i.e. more rows are appended
out: array([[1, 2, 3],
   [4, 5, 6],
   [1, 2, 3],
   [4, 5, 6]])


In: np.concatenate([A,A], axis=1) # append columns
out: array([[1, 2, 3, 1, 2, 3],
   [4, 5, 6, 4, 5, 6]])



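# Merging arrays of different dimensions, method 1
In: np.concatenate([A, z.reshape(1, -1)]) # z is a 1-D vector of shape (3,); reshape it to a 1x3 matrix so it has the same number of dimensions as A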
out: array([[  1, 2, 3],
   [  4, 5, 6],
   [666, 666, 666]])

# Additional notes
[1,2,3]  vector
[[1,2,3]]  1x3 matrix (1 row, 3 columns)
[[1],[2],[3]]  3x1 matrix (3 rows, 1 column)

# Merging arrays of different dimensions, method 2: vstack and hstack
In:np.vstack([A,z])    # A's column count equals the number of elements in z, so vstack stacks them vertically without reshaping z
out: array([[  1, 2, 3],
   [  4, 5, 6],
   [666, 666, 666]])

In:B = np.array([[777],[777]])    # [777,777] would be a 1-D vector, and hstack would raise an error
In:B
out:array([[777],
   [777]])

In: np.hstack([A,B])
out: array([[  1, 2, 3, 777],
   [  4, 5, 6, 777]])
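The remark that hstack fails on a plain vector can be reproduced as follows. This is a sketch; the names b, failed and wide are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([777, 777])          # 1-D vector, shape (2,)

try:
    np.hstack([A, b])             # 2-D and 1-D inputs cannot be joined along axis 1
    failed = False
except ValueError:
    failed = True

wide = np.hstack([A, b.reshape(-1, 1)])   # reshape b to a 2x1 column first
print(failed, wide.shape)                 # True (2, 4)
```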
Splitting
Splitting 1-D arrays
In:x = np.arange(10)
In:x1, x2, x3 = np.split(x, [3, 7])

In:x1
out:array([0, 1, 2])

In:x2
out:array([3, 4, 5, 6])

In:x3
out:array([7, 8, 9])
Splitting multi-dimensional arrays
# Splitting along rows
In:A = np.arange(16).reshape((4,4))
In:A
out:array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15]])

In:A1, A2 = np.split(A, [2])    # or  upper, lower = np.vsplit(A, [2])
In:A1
out:array([[0, 1, 2, 3],
   [4, 5, 6, 7]])

In:A2
out:array([[ 8,  9, 10, 11],
   [12, 13, 14, 15]])


# Splitting along columns
In:A1, A2 = np.split(A, [2], axis=1) # axis=1 splits by column;  or left,right = np.hsplit(A, [2])
In:A1
out:array([[ 0,  1],
   [ 4,  5],
   [ 8,  9],
   [12, 13]])

In:A2
out:array([[ 2,  3],
   [ 6,  7],
   [10, 11],
   [14, 15]])
VI. Matrix Operations in NumPy
Element-wise operations on a matrix
# Doubling every element with a native Python list
n = 10
L = [i for i in range(n)]
A = [2*e for e in L]

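# Doubling every element with numpy.array
import numpy as np
L = np.arange(n)
A = np.array([2*e for e in L])

# Timed with %%time for large n, the vectorized np.array form L*2 is much faster than the Python list version
# np.array can be multiplied by 2 directly, giving the element-wise result; a Python list*2 instead concatenates list+list

A = L*2 # directly yields a new array with every element doubled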

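# np.array supports almost all the other arithmetic operators as well, e.g.
L+1    add 1 to every element
L-1
L*2
L/3
1/L    reciprocal of every element (here L[0] is 0, so NumPy warns and returns inf for it)
L//5    floor-divide every element by 5
L**3    raise every element to a power
L%7       remainder of every element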
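The difference between looping over elements and using an array operator can be summarized in a short sketch. The names doubled_loop and doubled_vec are illustrative.

```python
import numpy as np

n = 10
L = np.arange(n)

doubled_loop = np.array([2 * e for e in L])  # Python-level loop over the elements
doubled_vec = 2 * L                          # one vectorized operation

print(np.array_equal(doubled_loop, doubled_vec))  # True
print([1, 2, 3] * 2)                              # [1, 2, 3, 1, 2, 3], list repetition
```

Both forms give the same values; the vectorized form avoids the per-element Python overhead, which is where the speed difference comes from.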
Operations between matrices
The arithmetic operators work element by element
In:A=np.arange(4).reshape(2,2)
In:A
out:array([[0, 1],
   [2, 3]])

In:B=np.full((2,2), 10)
In:B
out:array([[10, 10],
   [10, 10]])

In:A+B
out:array([[10, 11],
   [12, 13]])

In:A-B
out:array([[-10,  -9],
   [ -8,  -7]])

In:A*B
out:array([[ 0, 10],
   [20, 30]])

In:A/B
out:array([[0. , 0.1],
   [0.2, 0.3]])


#  As shown, these are all element-wise operations, not matrix operations
Matrix operations
# Matrix multiplication
In: A.dot(B)
out:array([[10, 10],
   [50, 50]])


# Matrix transpose
In: A.T
out: array([[0, 2],
   [1, 3]])

Operations between a vector and a matrix
In:v=np.array([1,2])
In:A
out:array([[0, 1],
   [2, 3]])

In:v+A    # adding a vector to a matrix adds the vector to every row of the matrix
out:array([[1, 3],
   [3, 5]])



In: np.vstack([v]*A.shape[0]) # alternatively, stack v into a matrix of the same shape as A before adding
out:array([[1, 2],
   [1, 2]])

In:np.tile(v, (2,1))       # tile also works: 2 repeats v along rows, 1 means no repetition along columns
out:array([[1, 2],
   [1, 2]])

In: v*A    # this multiplies v with every row of A element-wise; it is not matrix multiplication
out:array([[0, 2],
   [2, 6]])

In: v.dot(A)    # for matrix multiplication, still use dot
out:array([4, 7])

In: A.dot(v)    # v is a 1-D vector; here NumPy treats it as a 2x1 column, and the result is again 1-D
out:array([2, 8])
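The same products can also be written with the @ operator, and broadcasting can be steered by reshaping the vector. This is a sketch; the column form col is an illustration that does not appear in the original post.

```python
import numpy as np

A = np.arange(4).reshape(2, 2)
v = np.array([1, 2])

print(v @ A)   # [4 7], same as v.dot(A)
print(A @ v)   # [2 8], same as A.dot(v)

col = v.reshape(-1, 1)    # explicit 2x1 column
print(A + col)            # adds v[i] to every element of row i: [[1 2] [4 5]]
```

Adding v itself works row by row, while adding the reshaped column works column by column, so the shape of the vector decides the direction of broadcasting.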
Matrix inverse
In:A
out:array([[0, 1],
   [2, 3]])

In:invA = np.linalg.inv(A) # invA is the inverse of A
In:invA
out:array([[-1.5,  0.5],
   [ 1. ,  0. ]])

In: A.dot(invA) # a matrix times its inverse gives the identity matrix
out:array([[1., 0.],
   [0., 1.]])
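Not every square matrix has an inverse. The sketch below checks the identity numerically and shows what happens for a singular matrix; the matrix S is a made-up example.

```python
import numpy as np

A = np.array([[0., 1.], [2., 3.]])
invA = np.linalg.inv(A)
print(np.allclose(A @ invA, np.eye(2)))   # True

S = np.array([[1., 2.], [2., 4.]])        # second row is twice the first, so S is singular
try:
    np.linalg.inv(S)
    singular = False
except np.linalg.LinAlgError:
    singular = True
print(singular)                           # True
```

Comparing with np.allclose rather than == allows for floating-point rounding in the product.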

VII. Aggregation in NumPy
# Aggregation

import numpy as np
L = np.random.random(100)    # an array of 100 random numbers
L.dtype
out:dtype('float64')

np.sum(L) # sum
out:48.17857390434729

sum(L)    # Python's built-in sum also works, but is far slower than np.sum
out:48.178573904347246

np.min(L) # minimum
out:0.007983418226231609

np.max(L)    # maximum
out:0.9875521184825461

X= np.arange(16).reshape(4,-1)
X
out:array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15]])

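np.sum(X) # by default, sums all elements
out:120

np.sum(X, axis=0) # sum of each column; axis=0 can be thought of as collapsing the rows into one row
out:array([24, 28, 32, 36])

np.sum(X, axis=1) # sum of each row; axis=1 can be thought of as collapsing the columns into one column
out:array([ 6, 22, 38, 54])

np.prod(X) # product of all elements (0 here because X contains 0)
out:0

np.mean(X) # mean of all elements
out:7.5

np.median(X) # median of all elements
out:7.5

np.percentile(X, q=50) # the 50th percentile, equivalent to np.median(X)
out:7.5

np.percentile(X, q=90)    # the 90th percentile
out:13.5

np.var(X) # variance
out:21.25

np.std(X) # standard deviation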
out:4.6097722286464435
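The effect of the axis argument on the result's shape can be seen in a short sketch. The keepdims keyword is a real np.sum parameter; the variable names are illustrative.

```python
import numpy as np

X = np.arange(16).reshape(4, -1)

col_sums = np.sum(X, axis=0)   # collapses the row axis: one value per column
row_sums = X.sum(axis=1)       # method form; collapses the column axis

print(col_sums.shape, row_sums.shape)          # (4,) (4,)
print(np.sum(X, axis=1, keepdims=True).shape)  # (4, 1)
print(np.isclose(np.std(X) ** 2, np.var(X)))   # True
```

keepdims=True keeps the collapsed axis with length 1, which is convenient when the result is later broadcast against X.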
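VIII. Index Operations in NumPy (arg operations)
Finding the index of an extreme value
x = np.random.normal(0,1,size=1000000) # points from a normal distribution with mean 0 and standard deviation 1
np.min(x) # gives the minimum, but not where it is located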
out:-5.024177664592925

np.argmin(x) # index of the minimum
out:628924

# argmax works the same way
Sorting and using indices
# Sorting
x = np.arange(16)
x
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

np.random.shuffle(x) # shuffle randomly reorders the elements of x in place
x
out:array([ 3,  4,  0,  7, 11,  8, 12, 15,  5, 14, 13,  1,  9,  6, 10,  2])

np.sort(x)    # returns a new sorted array
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

x.sort()       # sorts x in place
x
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

# arg sorting
np.random.shuffle(x) # shuffle the order
x
out:array([13, 10,  7,  0, 12,  5, 14, 15,  9,  2,  3,  6, 11,  4,  8,  1])

np.argsort(x)    # returns indices; taking elements in this index order gives the sorted elements
out:array([ 3, 15,  9, 10, 13,  5, 11,  2, 14,  8,  1, 12,  4,  0,  6,  7])

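# partition
np.partition(x,3) # the second argument is an index k: the element at sorted position 3 is put in place, smaller values go before it and larger ones after; similar to the partition step of quicksort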
out:array([ 1,  0,  2,  3,  4,  6,  7,  5,  8,  9, 10, 12, 11, 13, 14, 15])

np.argpartition(x,3) # there is an arg version as well
out:array([15,  3,  9, 10, 13, 11,  2,  5, 14,  8,  1,  4, 12,  0,  6,  7])
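A common use of argpartition is picking the k smallest values without fully sorting. The sketch below uses the shuffled x shown above; k and the other names are illustrative.

```python
import numpy as np

x = np.array([13, 10, 7, 0, 12, 5, 14, 15, 9, 2, 3, 6, 11, 4, 8, 1])
k = 3

idx = np.argpartition(x, k)[:k]     # indices of the k smallest values, in no particular order
smallest = np.sort(x[idx])          # sort only those k values
print(smallest)                     # [0 1 2]

order = np.argsort(x)
print(np.array_equal(x[order], np.sort(x)))  # True
```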

# Sorting multi-dimensional arrays
X = np.random.randint(10, size=(4,4)) # a random 4x4 matrix
X
out:array([[2, 1, 7, 8],
   [0, 8, 7, 3],
   [7, 6, 1, 9],
   [0, 2, 9, 8]])

np.sort(X) # the default is axis=-1 (the last axis), which is axis=1 for a 2-D array
out:array([[1, 2, 7, 8],
   [0, 3, 7, 8],
   [1, 6, 7, 9],
   [0, 2, 8, 9]])

np.sort(X, axis=1) # sorts each row, moving along the column direction
out:array([[1, 2, 7, 8],
   [0, 3, 7, 8],
   [1, 6, 7, 9],
   [0, 2, 8, 9]])

np.sort(X, axis=0) # sorts each column, moving along the row direction
out:array([[0, 1, 1, 3],
   [0, 2, 7, 8],
   [2, 6, 7, 8],
   [7, 8, 9, 9]])
IX. Comparisons and Fancy Indexing in NumPy
Fancy indexing
import numpy as np

x = np.arange(16)
x
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

[x[3], x[5],x[8]]    # builds a new list, but NumPy has a more convenient way
out: [3, 5, 8]

ind = [3,5,8]    # NumPy's way, i.e. fancy indexing
x[ind]
out:array([3, 5, 8])

# Multi-dimensional fancy indexing
X = x.reshape(4,-1)
X
out:array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15]])

row = np.array([0,1,2]) # to pick 3 points, row holds the row index of each point
col = np.array([1,2,3]) # col holds the column index of each point
X[row, col]
out:array([ 1,  6, 11])

X[0, col]
out:array([1, 2, 3])

X[:2, col] # the values in the col columns of the first two rows
out:array([[1, 2, 3],
   [5, 6, 7]])

X[1:3,col]
out:array([[ 5,  6,  7],
   [ 9, 10, 11]])

Comparisons on numpy.array
x
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

x > 3
out:array([False, False, False, False,  True,  True,  True,  True,  True,
      True,  True,  True,  True,  True,  True,  True])

x < 3
out:array([ True,  True,  True, False, False, False, False, False, False,
   False, False, False, False, False, False, False])

x >= 3
out:array([False, False, False,  True,  True,  True,  True,  True,  True,
      True,  True,  True,  True,  True,  True,  True])

x <= 3
out:array([ True,  True,  True,  True, False, False, False, False, False,
   False, False, False, False, False, False, False])

x == 3
out:array([False, False, False,  True, False, False, False, False, False,
   False, False, False, False, False, False, False])



# More complex expressions
2*x == 24 - 4*x
out:array([False, False, False, False,  True, False, False, False, False,
   False, False, False, False, False, False, False])


# How many elements of x are <= 3
np.sum(x<=3)
out:4

# Another way to count the elements of x that are <= 3
np.count_nonzero(x<=3)
out:4

# Whether x contains 0
np.any(x == 0)
out: True

# Number of elements that are even or greater than 10 (| is element-wise OR; use & for "and")
np.sum((x % 2==0) | (x>10))
out:11
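Boolean arrays produced by comparisons can also be used as an index, which ties the comparison and fancy-indexing topics together. This is a sketch; the name mask is illustrative.

```python
import numpy as np

x = np.arange(16)

mask = (x % 2 == 0) & (x > 10)   # element-wise AND; the parentheses are required
print(x[mask])                   # [12 14]
print(np.sum(mask))              # 2
print(np.sum(~(x <= 3)))         # 12, ~ negates a boolean array
```

The operators &, | and ~ work element by element, whereas Python's and, or and not raise an error on arrays with more than one element.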

不二晨 · 不二晨

Nice, keep it up!
