Scraping Lianjia rental listings with Python

Posted by ALBDXV on 2020-10-29

Scraping Lianjia rental listings with Python (a record of my first Python program).
The main points involved are: accessing the site through a proxy IP, reading the pages, and paginating through the results. Comments and discussion are welcome.

The code is as follows:

import urllib.request  # getting familiar with urllib.request
from bs4 import BeautifulSoup  # getting familiar with BeautifulSoup

##Fetch and parse a page through this helper function
def gethtml(url):
    #access the site through a proxy IP (the address below is just an example and may be offline)
    proxy_support = urllib.request.ProxyHandler({'http':'119.6.144.73:81'})
    opener = urllib.request.build_opener(proxy_support)
    opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18363')]
    urllib.request.install_opener(opener)
    #read the page, e.g. 'https://ks.lianjia.com/zufang/kunshan/rt200600000001l0/'
    zf = urllib.request.urlopen(url)
    html = zf.read()
    ht = html.decode('utf8')
    zf.close()  #note the parentheses: zf.close alone would not close the connection
    Soup = BeautifulSoup(ht,'lxml')
    return Soup
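
As an aside, the same page fetch can also be written with the requests library. The sketch below is only an illustration, not part of the original script: gethtml_requests is a made-up name, and the proxy address is the same example one used above, so it may well be offline.

import requests  # only needed for this alternative sketch
from bs4 import BeautifulSoup

def gethtml_requests(url):
    #example proxy and a shortened User-Agent header; both are placeholders
    proxies = {'http': 'http://119.6.144.73:81'}
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    resp = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    resp.encoding = 'utf8'  #Lianjia pages are UTF-8, same as in gethtml above
    return BeautifulSoup(resp.text, 'lxml')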

##Variables used in the loop below
info = []
page = 1
TotalNumber = 0
urlMain = 'https://ks.lianjia.com/zufang/kunshan/'
urlOption = 'rt200600000001l0/'
#total number of matching results, used to avoid picking up recommended listings outside the filter
Number = int(gethtml(urlMain+urlOption).find(class_ = 'content__article').find(class_ = 'content__title').find('span').text)
print('Found {} rental listings'.format(Number))

##Read the listings on every page with a while loop
while TotalNumber <= Number:
    print('Reading page %d'%page)
    if page == 1:
        url = urlMain + urlOption
    else:
        #subsequent pages insert 'pg<n>' between the city path and the filter options
        url = urlMain + 'pg{}'.format(page) + urlOption
    Soup = gethtml(url)
    ###locate the address, price and link of each listing, then pick them out with find
    items = Soup.find_all(class_ = 'content__list--item')
    numberOfThisPage = len(items)
    if numberOfThisPage == 0:
        #an empty page means we have run past the last page; stop to avoid looping forever
        break
    print('This page has %d listings'%numberOfThisPage)
    print('')
    counter = 0
    for item in items:
        counter += 1
        Address = item.find(class_ = 'content__list--item--des').find_all('a')
        if not Address:
            #skip promoted/advertisement cards that carry no address links
            continue
        Address_DistrictName = Address[2].text
        Address_Location = Address[0].text + ',' + Address[1].text
        Price = item.find('em').text
        Website = item.find(class_ = 'content__list--item--title').find('a')['href']
        Website = 'https://ks.lianjia.com' + Website
        info.append([Address_DistrictName,Address_Location,Price,Website])
        if counter == numberOfThisPage:
            break
    ###write everything collected so far into a CSV file (rewritten once per page)
    fo = open("鏈家崑山租房資訊——全部.csv","w")
    for row in info:
        fo.write(",".join(row)+"\n")
    fo.close()
    TotalNumber += counter
    page += 1
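
One more note on the CSV step: the loop above simply joins the fields with commas and rewrites the whole file once per page. If the file comes out garbled when opened in Excel, a common remedy (shown here only as a sketch, using the standard csv module and a UTF-8 BOM, with illustrative header names) is to write everything once after the loop has finished:

import csv

#write all collected rows in one go; 'utf-8-sig' adds a BOM so Excel detects the encoding
with open("鏈家崑山租房資訊——全部.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["district", "location", "price", "url"])  #illustrative header row
    writer.writerows(info)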
    

The scraping results are as follows (more than 800 listings in total):
[Results table omitted: district, location, price, listing URL]
