pandas and numpy notes (continuously updated)

pandas notes

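pandas.read_csv: keep the first row as data

By default, read_csv treats the first row as the header. To read it as data instead, pass header=None; the columns are then labeled 0, 1, 2, and so on:

import pandas as pd
train_data = pd.read_csv("train.csv", header=None)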
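The effect of header=None is easiest to see on a tiny in-memory file. The sketch below uses io.StringIO in place of train.csv; the two-row sample data and the variable names with_header and no_header are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row CSV without a header line (stands in for train.csv).
csv_text = "1,2,3\n4,5,6\n"

# Default behavior: the first row is consumed as column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is data, and columns get integer labels.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)        # (1, 3)
print(no_header.shape)          # (2, 3)
print(list(no_header.columns))  # [0, 1, 2]
```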

pandas.to_csv: omit the index

Pass index=False to to_csv so that the row index is not written as an extra first column:

# omit the index
df.to_csv("result.csv", index=False)
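To compare the two outputs without touching the disk, the sketch below calls to_csv with no path, in which case pandas returns the CSV text as a string. The small frame is made-up sample data standing in for df.

```python
import pandas as pd

# Made-up frame standing in for df.
df = pd.DataFrame({"ID": [1, 2], "Pred": [0.5, 0.7]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# The header line shows the difference: an empty leading field for the index.
print(with_index.splitlines()[0])     # ,ID,Pred
print(without_index.splitlines()[0])  # ID,Pred
```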

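pandas: drop non-numeric columns

# keep only the numeric columns (select_dtypes is the public counterpart of the private _get_numeric_data)
train_data = train_data.select_dtypes(include="number")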
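As a quick check of which columns survive the filtering, the sketch below runs select_dtypes(include="number"), the public pandas API for selecting columns by dtype, on a made-up frame. The column names are hypothetical.

```python
import pandas as pd

# Made-up frame with one text column between two numeric ones.
train_data = pd.DataFrame(
    {"age": [22, 35], "name": ["a", "b"], "fare": [7.25, 53.1]}
)

# Keep only integer and float columns; the text column is dropped.
numeric_only = train_data.select_dtypes(include="number")

print(list(numeric_only.columns))  # ['age', 'fare']
```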

pandas to numpy

# create a numpy array with the numeric values for input into scikit-learn
# (as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
to_nparray = train_data.to_numpy()
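A minimal sketch of the conversion, assuming a pandas version that provides DataFrame.to_numpy() (0.24 or later). The frame contents are made up.

```python
import numpy as np
import pandas as pd

# Made-up all-numeric frame standing in for train_data.
train_data = pd.DataFrame(
    {"x1": [1.0, 2.0], "x2": [3.0, 4.0], "label": [0.0, 1.0]}
)

# to_numpy() returns a plain ndarray with one row per DataFrame row.
to_nparray = train_data.to_numpy()

print(isinstance(to_nparray, np.ndarray))  # True
print(to_nparray.shape)                    # (2, 3)
```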

Get the first column of a pandas.DataFrame

# first column of the DataFrame; the list [0] keeps the result a one-column DataFrame
ID = test_data.iloc[:, [0]]
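Whether the column index is written as [0] or 0 changes the type of the result, which matters when the column is later stacked next to other 2-D data. The sketch below shows both forms on made-up data; the names as_frame and as_series are illustrative.

```python
import pandas as pd

# Made-up frame standing in for test_data.
test_data = pd.DataFrame({"ID": [101, 102, 103], "feature": [0.1, 0.2, 0.3]})

as_frame = test_data.iloc[:, [0]]  # list of positions: one-column DataFrame
as_series = test_data.iloc[:, 0]   # single position: Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
print(type(as_series).__name__, as_series.shape)  # Series (3,)
```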

numpy notes

Get the last column of a numpy.array

y = to_nparray[:, -1]   # last column
X = to_nparray[:, :-1]  # first column through the second-to-last
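The split into features and labels can be checked on a small array. This sketch assumes, as the snippet above does, that the label sits in the last column; the numbers are made up.

```python
import numpy as np

# Made-up data: two feature columns followed by a label column.
to_nparray = np.array([[1, 2, 0], [3, 4, 1], [5, 6, 0]])

y = to_nparray[:, -1]   # 1-D array holding the last column
X = to_nparray[:, :-1]  # 2-D array holding every column except the last

print(y.tolist())  # [0, 1, 0]
print(X.shape)     # (3, 2)
```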

numpy.dtype: convert int to string

import numpy as np
# np.dtype: convert int to string
res = np.char.mod('%d', res)
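A small sketch of the conversion on made-up integers. It also shows astype(str) as an alternative that gives the same strings here; the name as_text is illustrative.

```python
import numpy as np

# Made-up integer predictions standing in for res.
res = np.array([1, 20, 300])

# Apply the printf-style format '%d' to each element.
as_text = np.char.mod("%d", res)

print(as_text.tolist())          # ['1', '20', '300']
print(res.astype(str).tolist())  # ['1', '20', '300']
```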

numpy: concatenate string arrays

# element-wise string concatenation (np.char.add is the public name for np.core.defchararray.add)
res = np.char.add(prefix, res)
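The concatenation is element-wise, as the sketch below shows. The contents of prefix are an assumption, since the notes do not say what it holds; here it is an array of the same length as res.

```python
import numpy as np

# Hypothetical prefix array; the original notes do not show its contents.
prefix = np.array(["id_", "id_", "id_"])
res = np.array(["1", "2", "3"])

# Each element of prefix is joined with the matching element of res.
joined = np.char.add(prefix, res)

print(joined.tolist())  # ['id_1', 'id_2', 'id_3']
```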

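numpy: join two numpy.array objects side by side

# ID and res are np.arrays; hstack joins them horizontally (column-wise),
# so for 2-D inputs the row counts must match
data = np.hstack((ID, res))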
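For np.hstack to produce a two-column result, both inputs need to be column-shaped. The sketch below assumes res starts out 1-D and reshapes it first; the values and the name res_1d are made up.

```python
import numpy as np

# Made-up inputs: ID is already a column, res_1d is a flat vector.
ID = np.array([[101], [102], [103]])
res_1d = np.array([0.2, 0.9, 0.4])

# reshape(-1, 1) turns the flat vector into a (3, 1) column.
res = res_1d.reshape(-1, 1)

# Horizontal stacking places the columns of res to the right of ID.
data = np.hstack((ID, res))

print(data.shape)        # (3, 2)
print(data[0].tolist())  # [101.0, 0.2]
```

Because the two arrays are merged into one, the integers in ID are upcast to float.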

numpy.array to DataFrame

# convert a numpy.array to a DataFrame
df = pd.DataFrame(data, columns=['ID', 'Pred'])
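A short sketch of the conversion on made-up values. Since a single array has one dtype, the ID column arrives as float; casting it back to int is an optional extra step, not something the notes above prescribe.

```python
import numpy as np
import pandas as pd

# Made-up (n, 2) array standing in for data.
data = np.array([[101, 0.2], [102, 0.9]])

df = pd.DataFrame(data, columns=["ID", "Pred"])

# Optional: restore an integer dtype for the ID column.
df["ID"] = df["ID"].astype(int)

print(list(df.columns))  # ['ID', 'Pred']
print(df["ID"].tolist())  # [101, 102]
```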

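numpy: generate random matrices

# matrix of floats drawn uniformly from [0, 1)
np.random.random((shape0, shape1))
# matrix of samples from the standard normal distribution
np.random.randn(shape0, shape1)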
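When the random matrices need to be reproducible, NumPy's Generator API is an alternative to the module-level functions. The sketch below is one way to do it, with an arbitrary seed and made-up shape values.

```python
import numpy as np

shape0, shape1 = 2, 3  # made-up dimensions

# A seeded Generator yields the same numbers on every run.
rng = np.random.default_rng(seed=0)

uniform = rng.random((shape0, shape1))          # uniform on [0, 1)
normal = rng.standard_normal((shape0, shape1))  # standard normal

print(uniform.shape, normal.shape)  # (2, 3) (2, 3)
```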

numpy: round to one decimal place

# out is optional: an existing array to write the result into
np.around(matrix, decimals=1, out=None)
# for two decimal places, set decimals=2
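A small sketch of the rounding on made-up values. It also shows that values exactly halfway between two candidates are rounded to the nearest even value, which can differ from the schoolbook rule.

```python
import numpy as np

# Made-up matrix.
matrix = np.array([[1.234, 5.678], [9.871, 2.449]])

one_decimal = np.around(matrix, decimals=1)
two_decimals = np.around(matrix, decimals=2)

# Exact halves go to the nearest even value.
halves = np.around(np.array([2.5, 3.5]))

print(one_decimal.tolist())  # [[1.2, 5.7], [9.9, 2.4]]
print(halves.tolist())       # [2.0, 4.0]
```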

SciPy notes

Evaluating the normal distribution

scipy.stats.norm can be used to evaluate the normal distribution, for example:

import numpy as np
from scipy.stats import norm

x = np.array([[1,2,3], [4,5,6]])
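loc = np.mean(x)  # mean
# variance; note that norm.pdf treats scale as the standard deviation,
# so np.std(x) is the value that matches the spread of x
scale = np.var(x)
print(norm.pdf(x, loc, scale))  # probability density function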

The output is:

[[0.09472978 0.11983675 0.13478507]
 [0.13478507 0.11983675 0.09472978]]
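In scipy.stats.norm, loc is the mean and scale is the standard deviation rather than the variance. The sketch below is a variant of the example that passes np.std(x) as scale and checks the result against the density formula written out by hand; the names mu, sigma, pdf_scipy and pdf_manual are illustrative.

```python
import numpy as np
from scipy.stats import norm

x = np.array([[1, 2, 3], [4, 5, 6]])
mu = np.mean(x)    # mean
sigma = np.std(x)  # standard deviation (square root of the variance)

# Density from SciPy, with scale set to the standard deviation.
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)

# The same density from the closed-form expression of the normal pdf.
pdf_manual = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(np.allclose(pdf_scipy, pdf_manual))  # True
```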