[Python] Python source code for downloading Baidu Space articles

Posted by 程式設計師啟航 on 2019-07-03

Original URL: http://blog.itpub.net/69913713/viewspace-2649420/

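This is Python source code for downloading Baidu Space articles, written by a complete Python beginner. The code is not pretty: it works, but it is not much to look at. Just look at what it does and do not expect concise code. Experts, feel free to skip this post.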

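How to use the script:
In cmd, enter: > python "F:\Walkbox\Python\mywork\baidu\getArticleId - r1.py" bspeng922 6
Command format: python path-to-script [username] [number of pages to download]
The number of pages is optional; if it is omitted, everything is downloaded. If it is larger than the actual total number of pages, the content of the first page is downloaded repeatedly.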
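The command format above can be sketched as follows. This is only an illustration of one way to handle the optional page argument, not the script's own logic: `parse_args` and `pages_to_fetch` are hypothetical helper names, and clamping the request to the real page count (so that the first page is never fetched again) is an assumption about the desired behavior. The record count and page size in the example are made-up values.

```python
import argparse


def parse_args(argv):
    """Parse `[username] [pages]`; `pages` is optional (None means all pages)."""
    parser = argparse.ArgumentParser(description="Download Baidu Space articles")
    parser.add_argument("username")
    parser.add_argument("pages", nargs="?", type=int, default=None)
    return parser.parse_args(argv)


def pages_to_fetch(requested, total_records, page_size):
    """Return the 1-based page numbers to download (page_size must be > 0)."""
    # Ceiling division: a partially filled last page still counts as a page.
    total_pages = (total_records + page_size - 1) // page_size
    if requested is None or requested <= 0:
        return list(range(1, total_pages + 1))
    # Clamp the request so pages past the end are never asked for.
    return list(range(1, min(requested, total_pages) + 1))


# Hypothetical example: 45 articles at 10 per page, 6 pages requested.
args = parse_args(["bspeng922", "6"])
print("%s %s" % (args.username, pages_to_fetch(args.pages, 45, 10)))
```

Keeping the page arithmetic in its own function makes it possible to check the edge cases (no page argument, an exact multiple of the page size, a request past the last page) without touching the network.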

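This code only works with the new version of Baidu Space and has only been tested with the "低調優雅" template. It produces HTML files.
I also stumbled on a curious side effect: this code can be used to inflate the visit count of a Baidu Space blog. Not bad.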

The source code for downloading Baidu Space articles is as follows:

```python
# -*- coding: utf8 -*-
# Python 2 script (print statements, urllib.urlopen, raw_input).
import urllib
import re, os, sys, time


def articleDownload(username, pageCount):
    # Validate the arguments
    if username == "":
        username = "bspeng922"
    if pageCount == "" or int(pageCount) <= 0:
        pageCount = 0  # 0 means "download every page"
    else:
        pageCount = int(pageCount) + 1  # exclusive upper bound for range()
    print "Blog: http://hi.baidu.com/new/%s" % username

    # Output directory, change as needed
    saveDrive = "E:\\test"  # directory to save html files
    # Directory for the article HTML files
    if not os.path.exists(saveDrive):
        os.mkdir(saveDrive)
    mydrive = os.path.join(saveDrive, username)
    if not os.path.exists(mydrive):
        os.mkdir(mydrive)
    # Directory for the images
    imgDir = "img"
    imgPath = os.path.join(saveDrive, username, imgDir)
    if not os.path.exists(imgPath):
        os.mkdir(imgPath)

    # If the page count is 0, work out the total and download everything
    if pageCount == 0:
        fstbaidu = urllib.urlopen("http://hi.baidu.com/new/%s" % username)
        totalRecord, pagesize = 0, 0
        for fstline in fstbaidu:
            if fstline.find("allCount") > 0:  # only one tag
                totalRecord = int(fstline[fstline.index("'") + 1:fstline.rindex("'")])
            if fstline.find("pageSize") > 0:
                pagesize = int(fstline[fstline.index("'") + 1:fstline.rindex("'")])
        fstbaidu.close()
        if pagesize != 0 and totalRecord != 0:
            # Ceiling division, plus 1 because range() excludes its end
            pageCount = (totalRecord + pagesize - 1) // pagesize + 1
        else:
            pageCount = 1  # counts not found: nothing to download
    print "Page Count: ", pageCount - 1

    # Resolve each article ID to the real article URL
    articleCount = 0
    sumHtmlPath = os.path.join(saveDrive, "%s.html" % username)
    sumfile = open(sumHtmlPath, "w")  # the sum (index) file
    aTagCmp = re.compile(r"""<a href="/%s/item/([\w]*?)" class="a-incontent a-title cs-contentblock-hoverlink" target=_blank>(.*?)</a>""" % re.escape(username))
    # Matches image tags and captures the src attribute
    imgTagCmp = re.compile(r"""<img.*?src="(.*?)".*?>""")
    for page in range(1, pageCount):
        thisPageUrl = urllib.urlopen("http://hi.baidu.com/new/%s?page=%d" % (username, page))
        print "Page: ", page
        for line in thisPageUrl:
            if line.find("a-incontent a-title") > 0:
                linefind = aTagCmp.findall(line)
                #print linefind
                for article in linefind:
                    articleCount += 1  # number of blog articles
                    # Article ID and title
                    myurl = article[0]
                    mytitle = article[1]
                    sumfile.write("""<a href='%s/%s.html' target='blank'>%s</a><br>""" % (username, myurl, mytitle))
                    # Fetch the real article and save it
                    thispath = os.path.join(mydrive, "%s.html" % myurl)
                    thisfile = open(thispath, 'w')
                    thisArticle = urllib.urlopen("http://hi.baidu.com/%s/item/%s" % (username, myurl))
                    imgCount = 0
                    badImg = 0
                    for thisline in thisArticle:
                        if thisline.find("content-head clearfix") > 0:  # keep only the article body
                            imglist = imgTagCmp.findall(thisline)
                            for imglink in imglist:
                                imageNewPath = ""
                                #print imglink
                                if imglink.find("://") > 0:
                                    imageName = imglink[imglink.rindex("/") + 1:]
                                    # Download the image
                                    try:
                                        urllib.urlretrieve(imglink, os.path.join(imgPath, imageName))
                                        imgCount += 1
                                    except Exception:  # report images that cannot be downloaded
                                        print "cannot download this image: " + imageName
                                    # Replace the image link with the local copy
                                    imageNewPath = """<img src="%s/%s" />""" % (imgDir, imageName)
                                    escName = re.escape(imageName)
                                    thisImgCmp = re.compile(r"""<img width="\d{1,4}" height="\d{1,4}" src="http://.*?/%s" />|<img src="http://.*?/%s" small="0" />|<img src="http://.*?/%s" />|<img small="0" src="http://.*?/%s" />""" % (escName, escName, escName, escName))
                                    #print imageNewPath
                                    try:
                                        #print thisImgCmp.findall(thisline)
                                        thisline = thisImgCmp.sub(imageNewPath, thisline)  # replace the current image tag each time
                                        #print thisline
                                    except Exception:
                                        print "Unexpected error"
                                else:
                                    badImg += 1
                            # Strip the trailing content that is not needed
                            pos = thisline.find("mod-post-info clearfix")
                            if pos > 0:
                                thisline = thisline[0:pos - 12]
                            thisfile.write(thisline.strip())
                    thisfile.close()
                    thisArticle.close()
                    #print "Image Count: %d  Bad Image: %d"%(imgCount, badImg)
        thisPageUrl.close()
    sumfile.close()
    print "Article Count: ", articleCount


if __name__ == "__main__":
    st = time.time()
    # Read the command-line arguments
    if len(sys.argv) == 2:
        uname = sys.argv[1]
        pages = ""  # no page count given: download everything
    elif len(sys.argv) > 2:
        uname = sys.argv[1]
        pages = sys.argv[2]
    else:
        uname = raw_input("Username -> ")
        pages = raw_input("Page -> ")
    articleDownload(uname, pages)
    et = time.time()
    print "Time used: %0.2fs" % (et - st)
```
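The image handling in the script (find each image tag, save the file under `img`, then point the tag at the local copy) can also be sketched as a single substitution pass. This is a minimal sketch under stated assumptions, not the author's method: `localize_images` and `IMG_TAG` are hypothetical names, the rewrite is done with a callback passed to `re.sub` instead of building one pattern per image, and the download itself is left to the caller. The sample HTML and `a.example` URL are invented for illustration.

```python
import re

# Hypothetical pattern: any <img ...> tag with a double-quoted src attribute.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"[^>]*>', re.IGNORECASE)


def localize_images(html, img_dir="img"):
    """Rewrite remote <img> tags to point at local copies.

    Returns (new_html, downloads), where downloads is a list of
    (remote_url, local_name) pairs left for the caller to fetch.
    """
    downloads = []

    def replace(match):
        url = match.group(1)
        if "://" not in url:
            return match.group(0)  # relative link: leave the tag untouched
        name = url.rsplit("/", 1)[-1]
        if not name:
            return match.group(0)  # URL ends with "/": no usable file name
        downloads.append((url, name))
        return '<img src="%s/%s" />' % (img_dir, name)

    return IMG_TAG.sub(replace, html), downloads


sample = '<p><img width="10" height="20" src="http://a.example/x/pic.jpg" /></p>'
new_html, downloads = localize_images(sample)
print(new_html)
print(downloads)
```

Each pair in `downloads` can then be passed to `urllib.urlretrieve` (or `urllib.request.urlretrieve` on Python 3) together with the target path. Separating the rewrite from the download keeps the string handling testable without network access.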


From "ITPUB Blog", link: http://blog.itpub.net/69913713/viewspace-2649420/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
