Python Crawler Learning (2): Scraping Baidu Tieba Content

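# -*- coding: utf-8 -*-
# Python 2 script: urllib2, print statements and reload(sys) do not exist in Python 3.
import os
import re
import sys
import time
import urllib
import urllib2

# Python 2 workaround: make utf-8 the default codec for implicit str/unicode conversions.
reload(sys)
sys.setdefaultencoding('utf-8')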

class Tiebar:
    # Initialize the crawler state
    def __init__(self, base_url, see_lz):
        self.base_url = base_url
        # see_lz=1 shows only the original poster's posts; str() also accepts an int
        self.see_lz = '?see_lz=' + str(see_lz)
        self.page = 1
        self.user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        self.headers = {'User-Agent': self.user_agent}
        self.tool = Tool()
        self.out_put_file = 'd:/python/test/out.txt'
    # Fetch the HTML of the given page
    def get_cotent(self, page):
        try:
            url = self.base_url + self.see_lz + '&pn=' + str(page)
            request = urllib2.Request(url, headers=self.headers)
            response = urllib2.urlopen(request)
            act_url = response.geturl()
            print 'init url=', url, 'act url=', act_url
            # Only use the response if the request was not redirected to another URL
            if url == act_url:
                content = response.read()
                return content
            else:
                return None
        except urllib2.URLError, e:
            if hasattr(e, "reason"):
                print u"Failed to connect to the Tieba page, reason:", e.reason
            return None
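    # Get the thread title
    def get_titile(self):
        content = self.get_cotent(1)
        # get_cotent returns None on failure; re.search would raise on None
        if not content:
            return None
        pattern = re.compile('<h3 .*?>(.*?)</h3>')
        result = re.search(pattern, content)
        if result:
            return result.group(1).strip()
        else:
            return None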
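    # Get the number of pages in the thread
    def get_page_num(self):
        content = self.get_cotent(1)
        # get_cotent returns None on failure; re.search would raise on None
        if not content:
            return None
        # NOTE: the closing </span> tags appear to have been stripped from this
        # pattern in the source (a trailing lazy group would always match '').
        # They are restored here on the assumption that the page count is the
        # second <span> inside the reply-count <li>.
        pattern = re.compile('<li class="l_reply_num.*?</span>.*?<span.*?>(.*?)</span>', re.S)
        result = re.search(pattern, content)
        if result:
            return result.group(1).strip()
        else:
            return None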
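    # Get the posts on the given page
    def get_tiebar(self, page):
        content = self.get_cotent(page)
        text = ''
        # Check for None before decoding; get_cotent returns None past the last page
        if not content:
            print "Scraping finished"
            return None
        content = content.decode('utf-8')
        # Assumption: the closing tag of the content <div> was stripped from the
        # source listing; '</div>' is restored here.
        patt_content = re.compile('<a data-field=.*?class="p_author_name j_user_card".*?>(.*?)</a>.*?'
                                  + '<div id=".*?" class="d_post_content j_d_post_content "> '
                                  + '(.*?)</div>', re.S)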
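
The title and page-count lookups can be tried without touching the network. What follows is only a minimal sketch, not the author's code: the HTML fragment is a made-up stand-in for a Tieba thread page (the real markup may differ), the helper names `extract_title` and `extract_page_num` are hypothetical, and the page-count pattern assumes the count sits in the second `<span>` of the reply-count `<li>`. Unlike the listing above, it is written to run under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical fragment standing in for a Tieba thread page; real markup may differ.
SAMPLE_HTML = u'''
<h3 class="core_title_txt" title="Sample thread"> Sample thread </h3>
<li class="l_reply_num" style="margin-left:8px">
  <span class="red">141</span> replies, <span class="red">5</span> pages
</li>
'''

# Same idea as get_titile: capture whatever sits inside the first <h3>.
TITLE_PATTERN = re.compile(u'<h3 .*?>(.*?)</h3>', re.S)
# Assumption: the page count is the second <span> inside the reply-count <li>.
PAGE_NUM_PATTERN = re.compile(
    u'<li class="l_reply_num.*?</span>.*?<span.*?>(.*?)</span>', re.S)


def extract_title(html):
    """Return the stripped thread title, or None if no <h3> is found."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


def extract_page_num(html):
    """Return the page count as an int, or None if it cannot be found."""
    match = PAGE_NUM_PATTERN.search(html)
    if not match:
        return None
    text = match.group(1).strip()
    return int(text) if text.isdigit() else None


if __name__ == '__main__':
    print(extract_title(SAMPLE_HTML))
    print(extract_page_num(SAMPLE_HTML))
```

Keeping the patterns in functions that take an HTML string makes them easy to check against a saved copy of a page before pointing the crawler at the live site.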
