Python Data Analysis: Importing and Saving CSV/TXT Data
Conventions:
import numpy as np
import pandas as pd
I. Importing and Saving CSV Data
CSV data is normally comma-separated and can be opened and viewed in Excel.
Example data1.csv:
A,B,C,D
1,2,3,a
4,5,6,b
7,8,9,c
Code example:
# When the file has a header row (column labels)
x = pd.read_csv("data1.csv")
print(x)
"""
   A  B  C  D
0  1  2  3  a
1  4  5  6  b
2  7  8  9  c
"""
Example data2.csv:
1,2,3,a
4,5,6,b
7,8,9,c
Code example:
# When there is no header row, columns are labeled with integers starting at 0
x = pd.read_csv("data2.csv", header=None)
print(x)
"""
   0  1  2  3
0  1  2  3  a
1  4  5  6  b
2  7  8  9  c
"""
# Set the column labels explicitly
x = pd.read_csv("data2.csv", names=["A", "B", "C", "D"])
print(x)
"""
   A  B  C  D
0  1  2  3  a
1  4  5  6  b
2  7  8  9  c
"""
# Use the values of one column (or several) as the row index (or a hierarchical index)
x = pd.read_csv("data2.csv", names=["A", "B", "C", "D"], index_col="D")
print(x)
"""
   A  B  C
D
a  1  2  3
b  4  5  6
c  7  8  9
"""
x = pd.read_csv("data2.csv", names=["A", "B", "C", "D"], index_col=["D", "C"])
print(x)
"""
     A  B
D C
a 3  1  2
b 6  4  5
c 9  7  8
"""
Example data3.csv:
A,B,C,D
1,2,3,
NULL,5,6,b
7,nan,Nan,c
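Code example:
# Values such as NULL, nan and empty fields are converted to NaN automatically.
# "Nan" is not one of the default markers, so column C stays as strings.
x = pd.read_csv("data3.csv", na_values=[])
print(x)
"""
     A    B    C    D
0  1.0  2.0    3  NaN
1  NaN  5.0    6    b
2  7.0  NaN  Nan    c
"""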
# Treat an additional value as NaN
x = pd.read_csv("data3.csv", na_values=["Nan"])
print(x)
"""
     A    B    C    D
0  1.0  2.0  3.0  NaN
1  NaN  5.0  6.0    b
2  7.0  NaN  NaN    c
"""
# Set the values to treat as NaN per column
setNaN = {"C": ["Nan"], "D": ["b", "c"]}
x = pd.read_csv("data3.csv", na_values=setNaN)
print(x)
"""
     A    B    C   D
0  1.0  2.0  3.0 NaN
1  NaN  5.0  6.0 NaN
2  7.0  NaN  NaN NaN
"""
# Save the data to a CSV file
x.to_csv("data3out.csv")
"""
data3out.csv:
,A,B,C,D
0,1.0,2.0,3.0,
1,,5.0,6.0,
2,7.0,,,
"""
# Save to a CSV file, setting the NaN representation and dropping the row index and the column header
x.to_csv("data3out.csv", index=False, na_rep="NaN", header=False)
"""
data3out.csv:
1.0,2.0,3.0,NaN
NaN,5.0,6.0,NaN
7.0,NaN,NaN,NaN
"""
# Read the saved file back, supplying new column labels
x = pd.read_csv("data3out.csv", names=["W", "X", "Y", "Z"])
print(x)
"""
     W    X    Y   Z
0  1.0  2.0  3.0 NaN
1  NaN  5.0  6.0 NaN
2  7.0  NaN  NaN NaN
"""
II. Importing TXT Data
Data in txt files is usually separated by one or more spaces, by commas, or by a similar delimiter.
Example data4.txt:
A B C
a 1 2 3
b 4 5 6
Code example:
# Read the data
x = pd.read_table("data4.txt", sep=r"\s+")  # sep: regular expression used as the delimiter
print(x)
"""
   A  B  C
a  1  2  3
b  4  5  6
"""
Example data5.txt:
1.176813 3.167020
-0.566606 5.749003
0.931635 1.589505
-0.036453 2.690988
Code example:
# Read a txt file with numpy
x = np.loadtxt("data5.txt", delimiter=" ")  # delimiter: the field separator
print(x)
"""
[[ 1.176813  3.16702 ]
 [-0.566606  5.749003]
 [ 0.931635  1.589505]
 [-0.036453  2.690988]]
"""