Python Web Scraping

Notes & Daily Life, published 2020-10-11

Scraping Beijing rental listings from Lianjia with Python

1. Import the packages:

import requests
from bs4 import BeautifulSoup

2. Fetch the content at a URL and return a soup object:

def get_page(url):
    response = requests.get(url)
    # The 'lxml' parser requires the lxml package to be installed
    soup = BeautifulSoup(response.text, 'lxml')
    return soup
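A slightly more defensive variant of get_page might look like the sketch below. The function names, the User-Agent string, and the 10-second timeout are assumptions made for illustration, not part of the original post. It also uses the built-in 'html.parser' instead of 'lxml' so that it runs without the extra package.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical header value; the original post sends no custom headers.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; rental-notes-demo)"}


def parse_html(html):
    """Turn an HTML string into a soup object using the built-in parser."""
    return BeautifulSoup(html, "html.parser")


def get_page_checked(url, timeout=10):
    """Fetch a URL, fail loudly on HTTP errors, and return a soup object."""
    response = requests.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return parse_html(response.text)
```

Keeping the parsing in its own function means the selectors can be tried on a saved HTML string without sending a request each time.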

3. Wrap this in a function that collects the links to all rental pages on a listing page and returns them as a list:

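def get_links(link_url):
    soup = get_page(link_url)
    content_list = soup.find('div', class_="content__list")
    if content_list is None:
        # No listing container found (layout change or blocked request)
        return []
    links_div = content_list.find_all('div', class_="content__list--item")
    links = []
    for div in links_div:
        tmp = div.find('a')
        # find() returns None, not -1, when the item has no <a> tag
        if tmp is None:
            continue
        links.append(tmp.get('href'))
    return links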

4. Use get_links(link_url) to get the links to all rental pages on the Lianjia home page:

url = 'https://bj.lianjia.com/zufang/'
get_links(url)
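If the href values returned by get_links turn out to be site-relative paths, they need to be joined with the base URL before they can be passed back to get_page. The sketch below shows one way to do that with the standard library. The helper name absolutize and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = 'https://bj.lianjia.com/zufang/'


def absolutize(base_url, hrefs):
    """Resolve hrefs against base_url, dropping empty values and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            # Skip None or empty strings (an <a> tag without an href)
            continue
        full = urljoin(base_url, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Placeholder hrefs, not real listing IDs
sample = [
    '/zufang/EXAMPLE1.html',
    '/zufang/EXAMPLE1.html',
    None,
    'https://bj.lianjia.com/zufang/EXAMPLE2.html',
]
full_links = absolutize(BASE_URL, sample)
```

urljoin leaves already absolute URLs untouched, so the helper is safe to apply whether or not the site returns relative paths.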

5. Get the listing details from a rental page:

house_url='https://bj.lianjia.com/zufang/BJ2360825321093611520.html?nav=0&unique_id=ee73be87-3abd-477e-af89-a3f320eda277zufang1602224217718'
soup=get_page(house_url)
# Monthly price shown in the sidebar title
price=int(soup.find('div',class_='content__aside--title').span.text)
# Listing tags, with newlines removed
good_house=soup.find('p',class_='content__aside--tags').text.replace('\n', '')
content__aside__list=soup.find('ul',class_='content__aside__list')
content__aside__list = content__aside__list.find_all('li')
# text[5:] skips the 5-character label prefix at the start of each item
house_way=content__aside__list[0].text[5:]
house_type=content__aside__list[1].text[5:]
# [:-2] also drops the last two characters of the floor entry
house_floor=content__aside__list[2].text[5:][:-2]
riskwarning=content__aside__list[3].text[5:]
info = {
        'Price':price,
        'Must-see listing tags':good_house,
        'Lease type':house_way,
        'House type':house_type,
        'Orientation and floor':house_floor,
        'Risk warning':riskwarning
}
info
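The fixed slice text[5:] and the bare int(...) call above both depend on the page text having one exact shape. As a rough alternative, the sketch below splits each item on its colon and pulls the first run of digits out of the price text. The helper names split_label and parse_price are hypothetical, and the sample strings are only guesses at what the page text looks like.

```python
import re


def split_label(text, separators=('：', ':')):
    """Split 'label：value' on the first separator found; label is '' if none."""
    cleaned = text.strip()
    for sep in separators:
        label, found, value = cleaned.partition(sep)
        if found:
            return label.strip(), value.strip()
    return '', cleaned


def parse_price(text):
    """Return the first run of digits in text as an int, or None if absent."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


# Assumed sample of a sidebar item; real page text may differ
label, value = split_label('租赁方式：整租')
price = parse_price('5500')
```

Splitting on the separator keeps working if a label is longer or shorter than four characters, and parse_price returns None instead of raising when the price text holds no digits.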
