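A Python crash course: writing a crawler in one line of code
When I was first told to learn Python, I refused. I like Java, and I wasn't going to learn something just because someone told me to. Then my teammate bailed, I inherited the mess, and I had to pick it up fast.
Java can be used to write crawlers too, though I have never tried it. This time the crawler had to be deployed on a Django backend, but the approach to the code is the same either way.
If you want to go deeper, I recommend reading "Python for Informatics" and getting comfortable with the requests, urllib, urllib2, and re modules.
Enough preamble, let's get straight to the point and talk crash course. (By the way, I have only been learning Python for a few days, so veterans can skip this.)
Normally when we browse the web, we click and type in the browser, and the browser sends requests and data to the server on our behalf.
A crawler written in code works on the same principle: it sends requests and data to the website's server directly. Websites identify a user by IP address and cookie, and the cookie is what you need in order to stay logged in.
The most basic crawler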
import urllib2
print urllib2.urlopen("https://msdn.microsoft.com/magazine/default.aspx").read()
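# That's right: not counting the import, it is a single line of code.
# If Chinese text comes out garbled (on Windows, for example), transcode it with decode().encode().
# decode() decodes the raw bytes; pass the encoding that the crawled site actually uses.
# encode() encodes to the target encoding, e.g. read().decode("gbk").encode("utf-8")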
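The decode/encode round trip from the comments above can be tried offline. This is only a minimal sketch: the sample bytes are made up and stand in for what read() would return from a GBK page, and the helper name gbk_to_utf8 is mine, not part of any library. It should run under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the bytes a GBK-encoded page might hand back from read().
# u"\u4f60\u597d" is the two-character greeting "ni hao".
raw_gbk = u"\u4f60\u597d".encode("gbk")


def gbk_to_utf8(raw):
    """Decode GBK bytes to text, then re-encode that text as UTF-8 bytes."""
    text = raw.decode("gbk")       # bytes -> unicode text
    return text.encode("utf-8")    # unicode text -> UTF-8 bytes


utf8_bytes = gbk_to_utf8(raw_gbk)
print(repr(raw_gbk))
print(repr(utf8_bytes))
```

The same two characters take four bytes in GBK and six in UTF-8, which is why printing the raw bytes in a console that expects the other encoding shows garbage.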
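What you get back from a plain fetch is usually the page source. Most pages are built on an HTML skeleton, and tag-based markup lends itself to extracting the information you need with regular expressions (the re module). If you don't know regular expressions, go read up on them.
In practice, re.findall(pattern, string) covers most needs: pattern describes the rough shape of the target information, and string is the text being searched.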
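Here is a small offline sketch of re.findall on a table row. The HTML fragment is invented for illustration; only the pattern style (a non-greedy capture group between an opening tag and its closing tag) is the point.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
html = ('<table><tr><td align="Center" width="7%">Math</td>'
        '<td align="Center">Physics</td></tr></table>')

# ".*?" is non-greedy, so each match stops at the first ">" and the first "</td>".
# The parentheses mark the part that findall returns.
pattern = r'<td align="Center".*?>(.*?)</td>'

cells = re.findall(pattern, html)
print(cells)
```

If a cell can span several lines, passing re.S as a third argument lets "." match newlines as well.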
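Next up is simulated login.
Simulated logins, and crawlers for things like grabbing orders or tickets online, are not hard to implement. It comes down to two things: capturing and analyzing the data being sent, and using machine learning to crack CAPTCHAs (I haven't finished that part yet and will update once I have).
A simulated login is nothing more than a POST. Use a packet capture tool to look at the URL the POST goes to and the contents of the form. On Windows, Fiddler works very well. (Link: http://www.telerik.com/fiddler)
Fiddler tutorial (http://kb.cnblogs.com/page/130367/), a ten-minute read.
Crawler walkthrough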
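To make "a login is just a POST" concrete, here is a sketch that builds the request without sending it. The URL, field values, and User-Agent string are placeholders I made up; the field names mirror what a capture of a form might show. The try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:                                    # Python 2
    from urllib import urlencode
    from urllib2 import Request
except ImportError:                     # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request

# Form fields as seen in the capture tool (values here are placeholders).
form = [("TextBox1", "my_account"), ("TextBox2", "my_password"), ("Button1", "")]
body = urlencode(form)                  # -> "key=value&key=value" text

headers = {"User-Agent": "Mozilla/5.0 (placeholder)"}

# Passing a data argument is what turns the request into a POST.
request = Request("http://example.com/login.aspx", body.encode("ascii"), headers)

print(body)
print(request.get_method())
```

Nothing goes over the network here; handing the request to an opener's open() is the step that would actually submit the form.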
# -*- coding:utf-8 -*-
# author: Max
import urllib2
import urllib
import cookielib
import re
# URL used for logging in
url = "http://202.195.144.163/jndx/default5.aspx"
filename = "cookie.txt"
# Create a cookie jar to keep the login state
cookie = cookielib.MozillaCookieJar(filename)
# Build an opener, which acts roughly like a browser;
# urllib2.HTTPCookieProcessor() takes care of the cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
account = "1070414532"
password = "511623191111111111"
# After the Fiddler tutorial the roles of headers and data are clear: headers let the server identify the client, data is the form sent by POST
postdata = {"__VIEWSTATE":"dDwtNTgxODgzNDk1O3Q8O2w8aTwxPjs+O2w8dDw7bDxpPDQ+Oz47bDx0PHA8O3A8bDxvbmNsaWNrOz47bDx3aW5kb3cuY2xvc2UoKVw7Oz4+Pjs7Pjs+Pjs+Pjs+0L9OGiPtTSMlqZUfLGSIwTyi9hc=",
# RadioButtonList1 carries the role "学生" (student), sent as GB2312 bytes to match the page encoding
"TextBox1":account,"TextBox2":password,"RadioButtonList1":u"学生".encode("gb2312"),"Button1":""}
headers = {"Connection": "keep-alive","User-Agent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)"}
data = urllib.urlencode(postdata)
request = urllib2.Request(url,data,headers)
response = opener.open(request).read().decode("gb2312").encode("utf-8")
print response
# The page greets the user as "<name>同学", so capture the text in front of that suffix
pattern = '<span id="xhxm">.*? (.*?)同学</span></em>'
name = re.findall(pattern,response)
print name[0].decode("utf-8").encode("gb2312")
cookie.save(ignore_discard=True,ignore_expires=True)
for item in cookie:
    print "Cookie.name="+item.name
    print "Cookie.value="+item.value
postdata2 = {"__EVENTTARGET":"xqd","__EVENTARGUMENT":"","__VIEWSTATE":"dDwtMTY3ODA2Njg2OTt0PDtsPGk8MT47PjtsPHQ8O2w8aTwxPjtpPDI+O2k8ND47aTw3PjtpPDk+O2k8MTE+O2k8MTM+O2k8MTU+O2k8MjE+O2k8MjM+O2k8MjU+O2k8Mjc+O2k8Mjk+O2k8MzE+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDEyMDEyLTIwMTMwOz4+Oz47Oz47dDx0PHA8cDxsPERhdGFUZXh0RmllbGQ7RGF0YVZhbHVlRmllbGQ7PjtsPHhuO3huOz4+Oz47dDxpPDM+O0A8MjAxNi0yMDE3OzIwMTUtMjAxNjsyMDE0LTIwMTU7PjtAPDIwMTYtMjAxNzsyMDE1LTIwMTY7MjAxNC0yMDE1Oz4+O2w8aTwwPjs+Pjs7Pjt0PHQ8OztsPGk8MT47Pj47Oz47dDxwPHA8bDxUZXh0Oz47bDzlrablj7fvvJoxMDcwNDE0NTMyOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzlp5PlkI3vvJrlkajnp5Hnvr07Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOWtpumZou+8mueJqeiBlOe9keW3peeoi+WtpumZojs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85LiT5Lia77ya6Ieq5Yqo5YyWOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzooYzmlL/nj63vvJroh6rliqjljJYxNDA1Oz4+Oz47Oz47dDw7bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8cDxsPFZpc2libGU7PjtsPG88Zj47Pj47bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzljZXniYfmnLrljp/nkIblj4rlupTnlKjor77nqIvorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaWueebiuawkTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMS0yMTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85peg5pa55ZCROz4+Oz47Oz47Pj47dDw7bDxpPDA+O2k8MT47aTwyPjtpPDM+O2k8ND47aTw1PjtpPDY+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPOi/kOWKqOaOp+WItuezu+e7n+e7vOWQiOiuvuiuoTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85r2Y5bqt6b6ZOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwwLjU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDIwLTIwOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwPHA8bDxUZXh0O
z47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzml6DmlrnlkJE7Pj47Pjs7Pjs+Pjt0PDtsPGk8MD47aTwxPjtpPDI+O2k8Mz47aTw0PjtpPDU+O2k8Nj47PjtsPHQ8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MDEtMTY7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaXoOaWueWQkTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzov4fnqIvmjqfliLbns7vnu5/nu7zlkIjorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOmprOS5heelpTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxOS0xOTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+Oz4+Oz4+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+Q5Yqo5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzmvZjluq3pvpk7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+H56iL5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzpqazkuYXnpaU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85Y2V54mH5py65Y6f55CG5Y+K5bqU55So6K++56iL6K6+6K6hOz4+O
z47Oz47dDxwPHA8bDxUZXh0Oz47bDzmlrnnm4rmsJE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+Oz4+Oz4+Oz4+Oz4+Oz5ZZFyEFVR8MH9GkWWTFr2SyUKuGg==",
"xnd":"2016-2017","xqd":"1"}
data2 = urllib.urlencode(postdata2)
url2 = "http://202.195.144.163/jndx/xskbcx.aspx?xh="+account+"&xm=%D6%DC%BF%C6%D3%F0&gnmkdm=N121603"
headers2 = {"Connection": "keep-alive","User-Agent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)","Referer":"http://202.195.144.163/jndx/xs_main.aspx?xh=1070414532"}
request2 = urllib2.Request(url2,data2,headers2)
response2 = opener.open(request2)
page = response2.read().decode("gb2312","ignore").encode("utf-8")
# Every timetable cell is a centered <td>; capture its contents
pattern = '<td align="Center".*?>(.*?)</td>'
lessons = re.findall(pattern,page)
for item in lessons:
    print item
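The script above writes its cookies to cookie.txt with cookie.save(...) but never reads them back. A later run could reload that file instead of logging in again. The sketch below shows the save/load round trip offline, under some assumptions: the cookie is built by hand, its name and value are made up, and a temporary directory stands in for the working directory. The import fallback is only so it runs on both Python 2 and Python 3.

```python
import os
import tempfile

try:
    import cookielib                      # Python 2
except ImportError:
    import http.cookiejar as cookielib    # Python 3

# Hypothetical session cookie, built by hand so the example needs no network.
session = cookielib.Cookie(
    version=0, name="ASP.NET_SessionId", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

# First run: store the cookie. ignore_discard=True keeps session cookies,
# which would otherwise be skipped when saving.
jar = cookielib.MozillaCookieJar(path)
jar.set_cookie(session)
jar.save(ignore_discard=True, ignore_expires=True)

# Later run: a fresh jar reads the same file back.
restored = cookielib.MozillaCookieJar(path)
restored.load(ignore_discard=True, ignore_expires=True)

names = [c.name for c in restored]
values = [c.value for c in restored]
print(names, values)
```

Whether a reloaded cookie is still accepted depends on how long the server keeps the session alive, so a fallback to a fresh login is still needed.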
The cleaned-up code
# -*- coding:utf-8 -*-
# author :Max
import re
import urllib
import urllib2
import cookielib
class Spider:
    # The cookie jar and opener are shared by all requests so the login session carries over
    filename = "cookie.txt"
    cookie = cookielib.MozillaCookieJar(filename)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    def __init__(self,url,account,password,xnd,xqd):
        self.url = url
        self.account = account
        self.password = password
        self.xnd = xnd
        self.xqd = xqd
    def loginWeb(self):
        headers = {"Connection": "keep-alive","User-Agent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)"}
        data = urllib.urlencode({"__VIEWSTATE":"dDwtNTgxODgzNDk1O3Q8O2w8aTwxPjs+O2w8dDw7bDxpPDQ+Oz47bDx0PHA8O3A8bDxvbmNsaWNrOz47bDx3aW5kb3cuY2xvc2UoKVw7Oz4+Pjs7Pjs+Pjs+Pjs+0L9OGiPtTSMlqZUfLGSIwTyi9hc=",
        "TextBox1":self.account,"TextBox2":self.password,"RadioButtonList1":u"学生".encode("gb2312"),"Button1":""})
        request = urllib2.Request(self.url,data,headers)
        try:
            response = Spider.opener.open(request).read().decode("gb2312").encode("utf-8")
            pattern = '<span id="xhxm">.*? (.*?)同学</span></em>'
            name = re.findall(pattern, response)
            # No match means the greeting is missing, i.e. the login did not go through
            return name[0] if name else "Error"
        except urllib2.HTTPError,e:
            # e.code is an int, so convert it before concatenating
            print "HTTPError = "+str(e.code)
            return "Error"
    def timeTable(self):
        headers = {"Connection": "keep-alive","User-Agent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;Miser Report)","Referer":"http://202.195.144.163/jndx/xs_main.aspx?xh=1070414532"}
data = urllib.urlencode({"__EVENTTARGET":"xqd","__EVENTARGUMENT":"","__VIEWSTATE":"dDwtMTY3ODA2Njg2OTt0PDtsPGk8MT47PjtsPHQ8O2w8aTwxPjtpPDI+O2k8ND47aTw3PjtpPDk+O2k8MTE+O2k8MTM+O2k8MTU+O2k8MjE+O2k8MjM+O2k8MjU+O2k8Mjc+O2k8Mjk+O2k8MzE+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDEyMDEyLTIwMTMwOz4+Oz47Oz47dDx0PHA8cDxsPERhdGFUZXh0RmllbGQ7RGF0YVZhbHVlRmllbGQ7PjtsPHhuO3huOz4+Oz47dDxpPDM+O0A8MjAxNi0yMDE3OzIwMTUtMjAxNjsyMDE0LTIwMTU7PjtAPDIwMTYtMjAxNzsyMDE1LTIwMTY7MjAxNC0yMDE1Oz4+O2w8aTwwPjs+Pjs7Pjt0PHQ8OztsPGk8MT47Pj47Oz47dDxwPHA8bDxUZXh0Oz47bDzlrablj7fvvJoxMDcwNDE0NTMyOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzlp5PlkI3vvJrlkajnp5Hnvr07Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOWtpumZou+8mueJqeiBlOe9keW3peeoi+WtpumZojs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85LiT5Lia77ya6Ieq5Yqo5YyWOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzooYzmlL/nj63vvJroh6rliqjljJYxNDA1Oz4+Oz47Oz47dDw7bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8cDxsPFZpc2libGU7PjtsPG88Zj47Pj47bDxpPDE+Oz47bDx0PEAwPDs7Ozs7Ozs7Ozs+Ozs+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzljZXniYfmnLrljp/nkIblj4rlupTnlKjor77nqIvorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaWueebiuawkTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMS0yMTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85peg5pa55ZCROz4+Oz47Oz47Pj47dDw7bDxpPDA+O2k8MT47aTwyPjtpPDM+O2k8ND47aTw1PjtpPDY+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPOi/kOWKqOaOp+WItuezu+e7n+e7vOWQiOiuvuiuoTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85r2Y5bqt6b6ZOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwwLjU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDIwLTIwOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwP
HA8bDxUZXh0Oz47bDwmbmJzcFw7Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzml6DmlrnlkJE7Pj47Pjs7Pjs+Pjt0PDtsPGk8MD47aTwxPjtpPDI+O2k8Mz47aTw0PjtpPDU+O2k8Nj47PjtsPHQ8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MDEtMTY7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPCZuYnNwXDs7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOaXoOaWueWQkTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+O2k8NT47aTw2Pjs+O2w8dDxwPHA8bDxUZXh0Oz47bDzov4fnqIvmjqfliLbns7vnu5/nu7zlkIjorr7orqE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPOmprOS5heelpTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8MC41Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxOS0xOTs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Jm5ic3BcOzs+Pjs+Ozs+Oz4+Oz4+Oz4+O3Q8QDA8cDxwPGw8UGFnZUNvdW50O18hSXRlbUNvdW50O18hRGF0YVNvdXJjZUl0ZW1Db3VudDtEYXRhS2V5czs+O2w8aTwxPjtpPDA+O2k8MD47bDw+Oz4+Oz47Ozs7Ozs7Ozs7Pjs7Pjt0PEAwPHA8cDxsPFBhZ2VDb3VudDtfIUl0ZW1Db3VudDtfIURhdGFTb3VyY2VJdGVtQ291bnQ7RGF0YUtleXM7PjtsPGk8MT47aTw0PjtpPDQ+O2w8Pjs+Pjs+Ozs7Ozs7Ozs7Oz47bDxpPDA+Oz47bDx0PDtsPGk8MT47aTwyPjtpPDM+O2k8ND47PjtsPHQ8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+Q5Yqo5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzmvZjluq3pvpk7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w86L+H56iL5o6n5Yi257O757uf57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzpqazkuYXnpaU7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w85Y2V54mH5py65Y6f55CG5Y+K5bqU55So6K++56iL6
K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzmlrnnm4rmsJE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+O3Q8O2w8aTwwPjtpPDE+O2k8Mj47aTwzPjtpPDQ+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPDIwMTYtMjAxNzs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w8Mjs+Pjs+Ozs+O3Q8cDxwPGw8VGV4dDs+O2w855S15rCU5o6n5Yi25Y+KUExD57u85ZCI6K6+6K6hOz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDzotbXlv6Dnm5Y7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDAuNTs+Pjs+Ozs+Oz4+Oz4+Oz4+Oz4+Oz4+Oz5ZZFyEFVR8MH9GkWWTFr2SyUKuGg==",
"xnd":self.xnd,"xqd":self.xqd})
        url = "http://202.195.144.163/jndx/xskbcx.aspx?xh="+self.account+"&xm=%D6%DC%BF%C6%D3%F0&gnmkdm=N121603"
        request = urllib2.Request(url,data,headers)
        response = Spider.opener.open(request).read().decode("gb2312","ignore").encode("utf-8")
        # Every timetable cell is a centered <td>; capture its contents
        pattern = '<td align="Center".*?>(.*?)</td>'
        lessons = re.findall(pattern, response)
        for item in lessons:
            print item
if __name__ == "__main__":
    print "try spider"
    url = "http://202.195.144.163/jndx/default5.aspx"
    account = ""  # student ID
    password = ""  # password
    xnd = "2016-2017"  # academic year
    xqd = "1"  # term
    spider = Spider(url,account,password,xnd,xqd)
    name = spider.loginWeb()
    if name == "Error":
        print "Web Login Failed"
    else:
        spider.timeTable()
    print "run over"
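Both versions of the script hard-code __VIEWSTATE strings copied from a captured request. ASP.NET pages also emit that value in a hidden form field, so one option is to fetch the page first and read the value out of it. The sketch below shows only the extraction step on an invented HTML fragment; the helper name extract_viewstate is mine, and the pattern assumes the name attribute comes before value inside the tag.

```python
import re


def extract_viewstate(html):
    """Return the value of the hidden __VIEWSTATE input, or None if absent."""
    # Assumption: name="__VIEWSTATE" appears before value="..." in the same tag.
    match = re.search(r'name="__VIEWSTATE"[^>]*?value="([^"]*)"', html)
    return match.group(1) if match else None


# Hypothetical fragment standing in for a downloaded login page.
sample = ('<form><input type="hidden" name="__VIEWSTATE" '
          'id="__VIEWSTATE" value="dDwtNTgx=" /></form>')

viewstate = extract_viewstate(sample)
missing = extract_viewstate("<form></form>")
print(viewstate)
print(missing)
```

The returned string could then be placed into the POST dictionary under the "__VIEWSTATE" key in place of the pasted literal.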