Crawler: scraping Neihan Duanzi (內涵段子) jokes

Posted by wanghandou on 2017-10-20

# -*- coding: utf-8 -*-
# Python 2 script: urllib2 and raw_input do not exist in Python 3.
import urllib2
import re

class spilder:
	def __init__(self):
		self.page = 1  # start from page 1
		self.switch = True  # keep crawling while this is True

	def loadpage(self):
		"""Download one list page and extract its jokes."""
		print "Downloading page...."
		url = "http://www.isocialkey.com/article/list_5_" + str(self.page) + ".html"
		print url
		headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"}
		request = urllib2.Request(url, headers=headers)
		response = urllib2.urlopen(request)
		html = response.read()

		# Apply the compiled regex to the HTML source; findall() returns a list of every joke on this page.
		# re.S lets "." match newlines, because a joke can span several lines.
		pattern = re.compile(r'<div\sclass="f18 mb20">(.*?)</div>', re.S)
		content_list = pattern.findall(html)

		# Call dealpage() to strip the leftover markup from the jokes.
		self.dealpage(content_list)

	def dealpage(self, content_list):
		"""Clean up the jokes scraped from one page.
		content_list: list of the jokes found on the page"""
		print "Processing page......"

		for item in content_list:
			# Strip the leftover HTML tags from each joke.
			item = item.replace("<p>", "").replace("</p>", "").replace("<br>", "").replace("<br />", "")
			self.writepage(item)

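	def writepage(self, item):
		"""Append one joke to the output file.
		item: a single joke after cleanup"""
		print "Saving joke"
		# "a" opens the file in append mode (write-only). With "w" the file would be
		# truncated on every open, so only the last joke would remain in duanzi.txt.
		with open("duanzi.txt", "a") as f:
			f.write(item)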

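	def duanzispilder(self):
		"""Control the crawl loop."""
		while self.switch:
			self.loadpage()
			# An ASCII prompt needs no manual encoding for the console.
			command = raw_input("Press Enter to keep crawling (type quit to exit): ")
			if command == "quit":
				self.switch = False  # setting this to False ends the loop
			self.page += 1  # move on to the next page
		print "Thanks for using...."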

if __name__=="__main__":
	we=spilder()
	we.duanzispilder()
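
The extraction and cleanup steps in loadpage() and dealpage() can be tried offline, without touching the site. The following is a minimal sketch, assuming a made-up HTML fragment (`sample_html`) shaped like the list page. The helper name `extract_jokes` and the extra `.strip()` call are hypothetical additions, not part of the original script, and the sketch is written for Python 3.

```python
import re

# Same pattern as in loadpage(): a non-greedy match of everything inside
# <div class="f18 mb20">...</div>; re.S lets "." also match newlines.
PATTERN = re.compile(r'<div\sclass="f18 mb20">(.*?)</div>', re.S)

# Tags that dealpage() removes from each joke.
TAGS = ("<p>", "</p>", "<br>", "<br />")


def extract_jokes(html):
    """Return the cleaned text of every joke block found in html."""
    jokes = []
    for item in PATTERN.findall(html):
        for tag in TAGS:
            item = item.replace(tag, "")
        # Trimming surrounding whitespace is an addition to the original logic.
        jokes.append(item.strip())
    return jokes


# Hypothetical sample input, invented for illustration only.
sample_html = (
    '<div class="f18 mb20"><p>first joke<br />\nsecond line</p></div>\n'
    '<div class="other">ignored</div>\n'
    '<div class="f18 mb20">\n<p>another joke</p>\n</div>'
)

if __name__ == "__main__":
    for joke in extract_jokes(sample_html):
        print(joke)
```

Because the match is non-greedy, each joke stops at the first closing `</div>`, so a joke that itself contained a nested `<div>` would be cut short. An HTML parser would be more robust than a regex in that case.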

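The script above targets Python 2. As a rough sketch of how the download step might look under Python 3, where `urllib2` was folded into `urllib.request`, the block below rebuilds the same request. The helper names `build_request` and `fetch`, the 10-second timeout, and the default `utf-8` encoding are all assumptions made for illustration; check the page's declared charset before relying on the decoding.

```python
import urllib.request

# URL template and User-Agent taken from loadpage() above.
BASE_URL = "http://www.isocialkey.com/article/list_5_{}.html"
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
)


def build_request(page):
    """Build (but do not send) the request for the given list page."""
    url = BASE_URL.format(page)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(page, encoding="utf-8"):
    """Download one list page and decode it to text.

    The encoding default and the timeout are assumptions, not values
    taken from the original script.
    """
    with urllib.request.urlopen(build_request(page), timeout=10) as response:
        return response.read().decode(encoding, errors="replace")


if __name__ == "__main__":
    # Only builds the request, so nothing is sent over the network here.
    request = build_request(1)
    print(request.full_url)
```

In Python 3, `response.read()` returns bytes, so the page has to be decoded before a `str` regex pattern can be applied to it. That step was implicit in the Python 2 version.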