Python data analysis: importing and saving CSV/TXT data

Created: 2017-01-25

Conventions:

import numpy as np
import pandas as pd

1. Importing and saving CSV data

CSV data is usually comma-separated and can be opened and viewed in Excel.

Example data1.csv:

A,B,C,D
1,2,3,a
4,5,6,b
7,8,9,c

Code example:

# When the file has a header row (column labels)
x = pd.read_csv("data1.csv")
print(x)
"""
   A  B  C  D
0  1  2  3  a
1  4  5  6  b
2  7  8  9  c
"""

Example data2.csv:

1,2,3,a
4,5,6,b
7,8,9,c

Code example:

# When there is no header row, pass header=None; columns are then labeled 0, 1, 2, ...
x = pd.read_csv("data2.csv", header=None)
print(x)
"""
   0  1  2  3
0  1  2  3  a
1  4  5  6  b
2  7  8  9  c
"""

# Supply the column labels yourself
x = pd.read_csv("data2.csv", names=["A", "B", "C", "D"])
print(x)
"""
   A  B  C  D
0  1  2  3  a
1  4  5  6  b
2  7  8  9  c
"""

# Use the values of one column as the row index, or of several columns as a hierarchical (multi-level) index
x = pd.read_csv("data2.csv", names=["A", "B", "C", "D"], index_col="D")
print(x)
"""
   A  B  C
D         
a  1  2  3
b  4  5  6
c  7  8  9
"""
x = pd.read_csv("data2.csv",names=["A","B","C","D"],index_col=["D","C"]) 
print x
"""
     A  B
D C      
a 3  1  2
b 6  4  5
c 9  7  8
"""

Example data3.csv:

A,B,C,D
1,2,3,
NULL,5,6,b
7,nan,Nan,c

Code example:

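# By default, strings such as NULL and nan, as well as empty fields, are converted to NaN
# (an empty na_values list leaves those defaults in place; "Nan" is not one of them)
x = pd.read_csv("data3.csv", na_values=[])
print(x)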
"""
     A    B  C    D
0  1.0  2.0  3  NaN
1  NaN  5.0  6    b
2  7.0  NaN  Nan  c
"""

# Treat an additional string ("Nan") as NaN
x = pd.read_csv("data3.csv", na_values=["Nan"])
print(x)
"""
     A    B    C    D
0  1.0  2.0  3.0  NaN
1  NaN  5.0  6.0    b
2  7.0  NaN  NaN    c
"""

# Set NaN markers per column with a dict of {column: [values]}
setNaN = {"C": ["Nan"], "D": ["b", "c"]}
x = pd.read_csv("data3.csv", na_values=setNaN)
print(x)
"""
     A    B    C   D
0  1.0  2.0  3.0 NaN
1  NaN  5.0  6.0 NaN
2  7.0  NaN  NaN NaN
"""

# Save the data to a CSV file
x.to_csv("data3out.csv")
"""
Contents of data3out.csv:
,A,B,C,D
0,1.0,2.0,3.0,
1,,5.0,6.0,
2,7.0,,,
"""
# Save to a CSV file, writing NaN as "NaN" and omitting the row index and the header row
x.to_csv("data3out.csv", index=False, na_rep="NaN", header=False)
"""
Contents of data3out.csv:
1.0,2.0,3.0,NaN
NaN,5.0,6.0,NaN
7.0,NaN,NaN,NaN
"""
x = pd.read_csv("data3out.csv",names=["W","X","Y","Z"])
print x
"""
     W    X    Y   Z
0  1.0  2.0  3.0 NaN
1  NaN  5.0  6.0 NaN
2  7.0  NaN  NaN NaN
"""

2. Importing TXT data

Data in TXT files is usually separated by runs of spaces, by commas, or by similar delimiters.

Example data4.txt:

    A    B    C
a   1    2    3
b   4    5    6

Code example:

# Read the data
x = pd.read_table("data4.txt", sep=r"\s+")  # sep: a regular expression for the delimiter (one or more whitespace characters)
print(x)
"""
   A  B  C
a  1  2  3
b  4  5  6
"""

Example data5.txt (tab-separated):

1.176813    3.167020
-0.566606   5.749003
0.931635    1.589505
-0.036453   2.690988

Code example:

# Read the txt file with numpy
x = np.loadtxt("data5.txt", delimiter="\t")  # delimiter: tab
print(x)
"""
[[ 1.176813  3.16702 ]
 [-0.566606  5.749003]
 [ 0.931635  1.589505]
 [-0.036453  2.690988]]
"""
