Two Ways to Extract Strings with Python
1. Extracting the text between two markers (multi-line)
Suppose a file contains the following text:

    fdsjhgjhg
    fdshkjhk
    Start
    Good Morning
    Hello World
    End
    dashjkhjk
    dsfjkhk

The goal is to use Python to grab the content between "Start" and "End" and write it to an output file.
Solution 1:
    with open("/path/to/input") as infile, open("/path/to/output", "w") as outfile:
        copy = False
        for line in infile:
            if line.strip() == "Start":
                copy = True
            elif line.strip() == "End":
                copy = False
            elif copy:
                outfile.write(line)
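
The flag logic of Solution 1 can also be tried without touching the file system. The following is only a minimal sketch, not part of the original answer: `lines_between` is a hypothetical helper name, and the sample lines are typed in by hand to mirror the text shown earlier.

```python
def lines_between(lines, start="Start", end="End"):
    """Yield the lines found between a start-marker line and an end-marker line."""
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True       # start copying from the next line
        elif stripped == end:
            copy = False      # stop copying; the marker itself is skipped
        elif copy:
            yield line


# Hand-typed stand-in for the input file (assumption: one marker per line).
sample = [
    "fdsjhgjhg\n", "fdshkjhk\n", "Start\n", "Good Morning\n",
    "Hello World\n", "End\n", "dashjkhjk\n", "dsfjkhk\n",
]
print(list(lines_between(sample)))  # ['Good Morning\n', 'Hello World\n']
```

Because the helper accepts any iterable of lines, an open file object could be passed in place of the list.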
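Solution 2:

    import re

    with open("input.txt") as myfile:
        content = myfile.read()

    # re.DOTALL lets "." match newlines; the non-greedy group captures
    # only the text between the "Start" line and the next "End".
    match = re.search(r"Start\n(.*?)End", content, re.DOTALL)
    if match:
        with open("output.txt", "w") as myfile2:
            myfile2.write(match.group(1))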
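
Note that `re.search` stops at the first match. If the input may contain several Start/End blocks, one possible variation is to collect all of them with `findall`. This is a sketch under assumptions: `blocks_between` is a hypothetical name, and each marker is assumed to sit alone on its own line (optionally followed by spaces or tabs).

```python
import re


def blocks_between(content, start="Start", end="End"):
    """Return the text of every block enclosed by a start line and an end line."""
    pattern = re.compile(
        # MULTILINE anchors ^ and $ at line boundaries; DOTALL lets "." span lines.
        rf"^{re.escape(start)}[ \t]*\n(.*?)^{re.escape(end)}[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    return pattern.findall(content)


content = "junk\nStart\nGood Morning\nHello World\nEnd\njunk\nStart\nBye\nEnd\n"
print(blocks_between(content))  # ['Good Morning\nHello World\n', 'Bye\n']
```

Anchoring the markers to whole lines keeps a line such as "Ending soon" from being mistaken for the end marker.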
Solution 3:

    import itertools

    with open("input.txt", "r") as f, open("output.txt", "w") as fout:
        while True:
            # Skip everything up to the next "Start" line.
            it = itertools.dropwhile(lambda line: line.strip() != "Start", f)
            if next(it, None) is None:
                break
            # Copy lines until the matching "End" line.
            fout.writelines(itertools.takewhile(lambda line: line.strip() != "End", it))
Reference:
http://stackoverflow.com/questions/18865058/extract-values-between-two-strings-in-a-text-file-using-python
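2. Extracting the content between two strings (single line)
Solution (string slicing):

    def getBetween(text, str1, str2):
        """Return the content between str1 and str2 in text, or "" if either is missing."""
        start = text.find(str1)
        if start == -1:
            return ""
        start += len(str1)
        end = text.find(str2, start)  # look for str2 only after str1
        return text[start:end] if end != -1 else ""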
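
The same single-line extraction can also be written with `str.partition`, which splits on the first occurrence of a separator and reports whether it was found. This is an alternative sketch, not the code from the referenced project; `between_partition` is a hypothetical name, and both delimiters are assumed to be non-empty strings.

```python
def between_partition(text, str1, str2):
    """Return the text between the first str1 and the next str2, or "" if either is missing."""
    _, found1, rest = text.partition(str1)
    if not found1:
        return ""            # str1 does not occur in text
    middle, found2, _ = rest.partition(str2)
    return middle if found2 else ""


print(between_partition("abc[hello]def", "[", "]"))  # hello
```

Since the second `partition` runs only on the text after `str1`, an earlier occurrence of `str2` cannot cut the result short.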
Reference:
https://github.com/bfishadow/SBB
3. Other implementations

    sed -n '/Start/,/End/p' input.txt | grep -Ev '(Start|End)'

    sed -e '1,/Start/d' -e '/End/,$d' input.txt

    awk '/Start/,/End/' input.txt | grep -Ev '(Start|End)'

    awk '/Start/{flag=1;next} /End/{flag=0} flag{ print }' input.txt

    awk '/End/{flag=0} flag; /Start/{flag=1}' input.txt

    perl -lne 'print if((/Start/../End/) && !(/Start/||/End/))' input.txt
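
One difference from the Python solutions in section 1 is worth keeping in mind: the sed/awk/perl patterns match any line that contains the marker, while the Python code compares the whole stripped line. A rough Python counterpart of the awk flag idiom, as a sketch only (`awk_style_between` is a hypothetical name), could look like this:

```python
import re


def awk_style_between(lines, start=r"Start", end=r"End"):
    """Mimic: awk '/Start/{flag=1;next} /End/{flag=0} flag{print}'."""
    flag = False
    out = []
    for line in lines:
        if re.search(start, line):
            flag = True
            continue          # like awk's "next": skip the start line itself
        if re.search(end, line):
            flag = False      # the end line is not printed
        if flag:
            out.append(line)
    return out


lines = ["a", "== Start ==", "x", "y", "End of block", "z"]
print(awk_style_between(lines))  # ['x', 'y']
```

Here "== Start ==" and "End of block" both count as markers because the match is a regular-expression search, not an equality test.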
Search keywords:
- awk print line between
References:
- http://www.unix.com/shell-programming-and-scripting/48676-how-print-only-lines-between-two-strings-using-awk.html
- https://nixtip.wordpress.com/2010/10/12/print-lines-between-two-patterns-the-awk-way/
- http://www.shellhacks.com/en/Using-SED-and-AWK-to-Print-Lines-Between-Two-Patterns
- http://stackoverflow.com/questions/17988756/how-to-select-lines-between-two-marker-patterns-which-may-occur-multiple-times-w
Notice: Unless otherwise noted, articles on CrazyOf.me are original work. When reposting, please cite this article's address as a link. Thanks!
https://crazyof.me/blog/archives/2406.html