Scraping Sohu Auto configuration data and SWF dynamic-chart sales data with a Python crawler: a detailed tutorial

Posted by 凍雨冷霧 on 2019-08-05

Original article: https://www.cnblogs.com/techs-wenzhe/p/11301635.html

Python crawler

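Background: the goal is to scrape every trim configuration listed on Sohu Auto, the parameters of each configuration, and the historical sales shown in the site's dynamic chart.

For example: the historical sales of the FAW Audi Q5L, and the parameters of its trim 《40 TFSI 榮享進取型國VI》.

The work therefore splits into two parts: configuration parameters and historical sales.

The main text follows.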

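Step 1:

First, look at the page: http://db.auto.sohu.com/home/

View the source of the Sohu Auto home page and find the DIV that holds the links to the configuration pages:

(Pick any model, for example the Audi A4. Its link is http://db.auto.sohu.com/yiqiaudi/2374

and its configuration page is http://db.auto.sohu.com/yiqiaudi/2374/trim.html .

So what we need is the div on the home page that holds the link of every model. Appending trim.html to each link then lets us crawl the configuration pages in a loop.)

In that DIV, every model link is an a tag with class="model-a".

Selecting those a tags with XPath therefore captures the links of all models. The code is as follows: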

import requests
import pandas as pd
import re
from lxml import etree
import numpy as np
import collections
import pickle
# Home page: extract the sub-URLs and the matching brands (the page groups them by letter)
# treeNav brand_tit
url_all='https://db.auto.sohu.com/home/'
req_all=requests.get(url_all)
wb_all=req_all.text  # page source
html_all = etree.HTML(wb_all)


# model-a: collect all model links first, then build the upper levels bottom-up
# (building them from the link text is not recommended, because the links are in English)
js=html_all.xpath('//a[@class="model-a"]')
h=[]
for i in js:
    h.append(i.xpath('@href')[0])

The resulting h is the list of all model sub-links.
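As a small illustration of the URL pattern from Step 1, the sketch below turns one collected href into its configuration-page URL. It assumes the hrefs are protocol-relative (they start with //), which is what the later loop in this article implies when it builds 'https:'+o+'/trim.html'. The helper name trim_url is made up for this example.

```python
def trim_url(href: str) -> str:
    """Build the configuration-page URL for one model link.

    Assumption: href is protocol-relative, e.g. '//db.auto.sohu.com/yiqiaudi/2374'.
    """
    # Drop a trailing slash so the result never contains '//trim.html'
    return 'https:' + href.rstrip('/') + '/trim.html'


print(trim_url('//db.auto.sohu.com/yiqiaudi/2374'))
# https://db.auto.sohu.com/yiqiaudi/2374/trim.html
```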

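Step 2:

Look at the source of the configuration page under one model's sub-link, and find where the parameter names and the parameter values live. An excerpt of the source: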

<table id="trimArglist" cellspacing="0" cellpadding="0">
                <tbody>
                <tr id="SIP_C_102">
                    <th class="th1">
                        <div class="th1_div">
                            <a href="http://db.auto.sohu.com/baike/244.shtml" target="_blank">廠商指導價</a>：&nbsp;
                        </div>
                    </th>
                    <td class="th2"></td>
                    <td class="th3">&nbsp;</td>
                    <td class="th4">&nbsp;</td>
                    <td class="th5">&nbsp;</td>
                </tr>
                <tr id="SIP_C_101" class="new-energy-car">
                    <th class="th1">補貼後售價：&nbsp;</th>
                    <td class="th2"></td>
                    <td class="th3">&nbsp;</td>
                    <td class="th4">&nbsp;</td>
                    <td class="th5">&nbsp;</td>
                </tr>
                <tr id="SIP_C_103">
                    <th class="th1">
                        <a href="http://db.auto.sohu.com/baike/247.shtml" target="_blank">4S店報價</a>：&nbsp;
                    </th>
                    <td class="th2">&nbsp;</td>
                    <td class="th3">&nbsp;</td>
                    <td class="th4">&nbsp;</td>
                    <td class="th5">&nbsp;</td>
                </tr>
                <tr id="ttr_0">
                    <th colspan="60" class="colSpan6" style="border-top: 0px">
                        <span>車輛基本引數</span>
                        <span class="sqsq">收起</span>
                    </th>
                </tr>

                <tr id="SIP_C_105">
                    <th class="th1">
                        <a href="http://db.auto.sohu.com/baike/249.shtml" target="_blank">級別</a>：&nbsp;
                    </th>
                    <td class="th2">&nbsp;</td>
                    <td class="th3">&nbsp;</td>
                    <td class="th4">&nbsp;</td>
                    <td class="th5">&nbsp;</td>
                </tr>
                <tr id="SIP_C_109">
                    <th class="th1">
                        <a href="http://db.auto.sohu.com/baike/249.shtml" target="_blank">上市時間</a>：&nbsp;
                    </th>
                    <td class="th2">&nbsp;</td>
                    <td class="th3">&nbsp;</td>
                    <td class="th4">&nbsp;</td>
                    <td class="th5">&nbsp;</td>
                </tr>

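The source above shows that the parameter names can be extracted by locating table[@id="trimArglist"] with XPath and reading the text of the a tags inside it. The table holds no parameter values, though, so it is only a collection of parameter names.

What links a parameter name to its value is the tr id. In the source above, for example, the row for 上市時間 (launch date) has the id SIP_C_109, and that same id appears as a key inside a JavaScript variable on the page.

In other words, the page fills the table from that JavaScript variable, and the table row id is the key that tells us what each value means.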
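To make the id-to-label relationship concrete, here is a minimal sketch that reads the tr id and the label text out of a cut-down copy of the table shown earlier. It uses the standard library's html.parser instead of lxml so that it runs without extra packages; the class name TrimLabelParser is made up for this example, and unlike the article's code it also strips the trailing full-width colon from labels.

```python
from html.parser import HTMLParser


class TrimLabelParser(HTMLParser):
    """Collect {tr id: label text} from the th.th1 cell of each table row."""

    def __init__(self):
        super().__init__()
        self.labels = {}
        self._row_id = None      # id of the <tr> currently open
        self._in_label = False   # True while inside <th class="th1">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'tr':
            self._row_id = attrs.get('id')
        elif tag == 'th' and attrs.get('class') == 'th1':
            self._in_label = True

    def handle_endtag(self, tag):
        if tag == 'th':
            self._in_label = False
        elif tag == 'tr':
            self._row_id = None

    def handle_data(self, data):
        if self._in_label and self._row_id:
            # strip() also removes the non-breaking space produced by &nbsp;
            text = data.strip().rstrip('：').strip()
            if text:
                # Keep the first non-empty text found in the label cell
                self.labels.setdefault(self._row_id, text)


sample = '''
<table id="trimArglist"><tbody>
<tr id="SIP_C_101"><th class="th1">補貼後售價：&nbsp;</th><td class="th2"></td></tr>
<tr id="SIP_C_109"><th class="th1">
  <a href="http://db.auto.sohu.com/baike/249.shtml" target="_blank">上市時間</a>：&nbsp;
</th><td class="th2">&nbsp;</td></tr>
</tbody></table>
'''
parser = TrimLabelParser()
parser.feed(sample)
print(parser.labels)
# {'SIP_C_101': '補貼後售價', 'SIP_C_109': '上市時間'}
```

Both kinds of row are covered: labels wrapped in an a tag and labels written directly inside the th.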

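So we extract the ids and look them up in that JavaScript assignment.
The assignment is already written in dictionary form, i.e. var trim={SIP_C_103:xxx,SIP_C_104:xxx},
so executing the assignment statement directly in Python yields a dictionary. The id keys,
such as SIP_C_103, are then replaced with the Chinese label from the matching row of the table on the page.

The code is as follows: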

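# All models
df=collections.OrderedDict()
for o in h: 
    ############################################## Vehicle configuration #################################################
    url='https:'+o+'/trim.html'
    req=requests.get(url)
    wb_data=req.text  # page source
 
    # Use XPath to locate the scripts that assign the page's parameters
    html = etree.HTML(wb_data) 
    js=html.xpath('//script[@type="text/javascript"]')
    # There are many scripts; keep the one that contains the configuration comment.
    # The marker must match the page source verbatim. The site is written in Simplified Chinese,
    # so on the live page it is probably '// 参数配置'.
    k=[]
    for i in range(len(js)):
        if js[i].text is not None:
            if len(re.findall('// 引數配置',js[i].text))!=0:
                k.append(js[i])
    js=k.copy()
    sss=js[0].text 
    # Cut the script down to the trimParam variable. JS true and false raise NameError in Python, so turn them into strings.
    sss=sss[sss.find('trimParam'):]
    sss=sss.replace('false','"false"') 
    sss=sss.replace('true','"true"')
    # Execute the JS assignment directly (exec runs whatever the page sends, so only do this with a source you trust).
    # Models that are discontinued or not yet on sale have no parameters: skip them.
    exec(sss)
    if len(trimParam)==0:
        continue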
    # Replace each JS key with what it means, e.g. SIP_C_103 might mean driving range
    c=[]
    TB=html.xpath('//table[@id="trimArglist"]')
    for i in list(trimParam[0]['SIP_T_CONF'].keys()):
        tbname=TB[0].xpath('//table//tr[@id=\"'+i+'\"]//th[@class="th1"]')
        for j in range(len(trimParam)):
            if len(tbname)!=0:
                # If the th itself holds only whitespace, the label is inside a nested a tag
                if tbname[0].text.replace(' ','')=='\n':
                    tbname=TB[0].xpath('//tr[@id=\"'+i+'\"]//th[@class="th1"]//a')
                c.append(tbname[0].text)
                trimParam[j]['SIP_T_CONF'][tbname[0].text] = trimParam[j]['SIP_T_CONF'].pop(i)  
                try:
                    trimParam[j]['SIP_T_CONF'][tbname[0].text]=trimParam[j]['SIP_T_CONF'][tbname[0].text]['v']
                except (KeyError, TypeError):
                    trimParam[j]['SIP_T_CONF'][tbname[0].text]=''
                
                # Do not record options the vehicle does not have.
                # '-' means the vehicle lacks the option and the manufacturer cannot fit it either.
                if trimParam[j]['SIP_T_CONF'][tbname[0].text] in ('-', ''):
                    del trimParam[j]['SIP_T_CONF'][tbname[0].text]
            else:
                # Some keys have no labelled row to pair with. They turn out to be redundant parameters,
                # e.g. a separate length entry without a value while length x width x height already exists, so drop them.
                c.append(np.nan)
                del trimParam[j]['SIP_T_CONF'][i]
 
    trimParam_dict={}
    for i in range(len(trimParam)):
        trimParam_dict[trimParam[i]['SIP_T_NAME']]=trimParam[i]
        
 
    # Build the nested dictionary bottom-up
    if trimParam[0]['brandName'] not in df.keys():
        df[trimParam[0]['brandName']]={}
    if trimParam[0]['subbrandName'] not in df[trimParam[0]['brandName']].keys():
        df[trimParam[0]['brandName']][trimParam[0]['subbrandName']]={}
    df[trimParam[0]['brandName']][trimParam[0]['subbrandName']][trimParam[0]['modelName']]={}
    df[trimParam[0]['brandName']][trimParam[0]['subbrandName']][trimParam[0]['modelName']]['配置引數']=trimParam_dict

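The final bottom-up step takes the brand, sub-brand, and model name found in the configuration data and uses them as the keys of the upper dictionary levels, so that each entry locates itself.

That completes the configuration dictionary. Because each model's page is processed right after it is fetched, the requests are naturally spaced out, which makes it less likely that the site's anti-crawling mechanism blocks the crawler.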
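As an alternative to calling exec on the page's script, the sketch below decodes the trimParam literal with the standard json module. This is not the article's method. It assumes the literal is valid JSON (double-quoted keys and strings) and that it is an array, which matches how the listing indexes trimParam[0]. The function name parse_js_array, the sample script, and its onSale field are made up for illustration.

```python
import json


def parse_js_array(script_text: str, var_name: str = 'trimParam'):
    """Return the array literal assigned to var_name, decoded as JSON.

    Assumption: the literal is valid JSON (double-quoted keys and strings).
    """
    start = script_text.find(var_name)
    if start == -1:
        return []
    bracket = script_text.find('[', start)
    if bracket == -1:
        return []
    # raw_decode parses one JSON value and ignores the JavaScript that follows it
    value, _end = json.JSONDecoder().raw_decode(script_text, bracket)
    return value


sample_script = '''
// made-up script for illustration
var trimParam = [{"SIP_T_NAME": "demo trim", "SIP_T_CONF": {"SIP_C_109": {"v": "2019.01"}}, "onSale": true}];
render(trimParam);
'''
trims = parse_js_array(sample_script)
print(trims[0]['SIP_T_CONF']['SIP_C_109']['v'])  # 2019.01
print(trims[0]['onSale'])                        # True
```

With this approach no page code is executed, and true and false arrive as Python booleans rather than the strings "true" and "false", so downstream code that expects strings would need adjusting.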

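Next comes the sales data from the dynamic chart. Continuing from above, we want to add a new key under each model (so far it only has the 配置引數 key from the previous step) that records the historical sales.

The chart's historical data cannot be found in the page source, so we open the browser's developer console and watch the whole response sequence while the chart displays its data, in order to find the data source the chart calls.

Open the console on the sales page of one model's sub-link.

The dynamic chart is on the left and the console on the right. Now click 全部資料 (all data).

The response data appears: look at the xml entry in the console on the right and inspect its headers (what is shown is the data returned in the response).

It is easy to see that the data comes from a link of the form https://db.auto.sohu.com/cxdata/xml/sales/model/model<id>sales.xml

What ties this link to a model is the string of digits after model, which is the model's id. Once the id of every model is known, the sales-data link of each one can be built.

The model id happens to be present in the JavaScript assignment we already execute, so we extract it inside the earlier loop and then process the sales data. The full loop follows; the new part is the sales data section at the end: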

for o in h: 
    ############################################## Vehicle configuration #################################################
    url='https:'+o+'/trim.html'
    req=requests.get(url)
    wb_data=req.text  # page source
 
    # Use XPath to locate the scripts that assign the page's parameters
    html = etree.HTML(wb_data) 
    js=html.xpath('//script[@type="text/javascript"]')
    # There are many scripts; keep the one that contains the configuration comment
    k=[]
    for i in range(len(js)):
        if js[i].text is not None:
            if len(re.findall('// 引數配置',js[i].text))!=0:
                k.append(js[i])
    js=k.copy()
    sss=js[0].text 
    # Cut the script down to the trimParam variable and turn JS true and false into strings
    sss=sss[sss.find('trimParam'):]
    sss=sss.replace('false','"false"')
    sss=sss.replace('true','"true"')
    # Execute the JS assignment directly
    exec(sss)
    if len(trimParam)==0:
        continue
    # Replace each JS key with what it means, e.g. SIP_C_103 might mean driving range
    c=[]
    TB=html.xpath('//table[@id="trimArglist"]')
    for i in list(trimParam[0]['SIP_T_CONF'].keys()):
        tbname=TB[0].xpath('//table//tr[@id=\"'+i+'\"]//th[@class="th1"]')
        for j in range(len(trimParam)):
            if len(tbname)!=0:
                # If the th itself holds only whitespace, the label is inside a nested a tag
                if tbname[0].text.replace(' ','')=='\n':
                    tbname=TB[0].xpath('//tr[@id=\"'+i+'\"]//th[@class="th1"]//a')
                c.append(tbname[0].text)
                trimParam[j]['SIP_T_CONF'][tbname[0].text] = trimParam[j]['SIP_T_CONF'].pop(i)  
                try:
                    trimParam[j]['SIP_T_CONF'][tbname[0].text]=trimParam[j]['SIP_T_CONF'][tbname[0].text]['v']
                except (KeyError, TypeError):
                    trimParam[j]['SIP_T_CONF'][tbname[0].text]=''
                
                # Do not record options the vehicle does not have.
                # '-' means the vehicle lacks the option and the manufacturer cannot fit it either.
                if trimParam[j]['SIP_T_CONF'][tbname[0].text] in ('-', ''):
                    del trimParam[j]['SIP_T_CONF'][tbname[0].text]
            else:
                # Some keys have no labelled row to pair with. They turn out to be redundant parameters,
                # e.g. a separate length entry without a value while length x width x height already exists, so drop them.
                c.append(np.nan)
                del trimParam[j]['SIP_T_CONF'][i]
 
    trimParam_dict={}
    for i in range(len(trimParam)):
        trimParam_dict[trimParam[i]['SIP_T_NAME']]=trimParam[i]
        
 
    # Build the nested dictionary bottom-up
    if trimParam[0]['brandName'] not in df.keys():
        df[trimParam[0]['brandName']]={}
    if trimParam[0]['subbrandName'] not in df[trimParam[0]['brandName']].keys():
        df[trimParam[0]['brandName']][trimParam[0]['subbrandName']]={}
    df[trimParam[0]['brandName']][trimParam[0]['subbrandName']][trimParam[0]['modelName']]={}
    df[trimParam[0]['brandName']][trimParam[0]['subbrandName']][trimParam[0]['modelName']]['配置引數']=trimParam_dict
    
        
    ############################################## Sales data #################################################
    vehicle_model_id= trimParam[0]['SIP_T_MODELID']
    url='https://db.auto.sohu.com/cxdata/xml/sales/model/model'+str(vehicle_model_id)+'sales.xml'
    req=requests.get(url)
    wb_data=req.text  # XML source
    # Each record looks like <sales date="..." salesNum="..." />
    sales=re.findall(r'(?<=<sales).*?(?=/>)',wb_data)
    if len(sales)==0:
        continue
    else:
        df[trimParam[0]['brandName']][trimParam[0]['subbrandName']][trimParam[0]['modelName']]['歷史銷量']={}
        for i in sales:
            df[trimParam[0]['brandName']][trimParam[0]['subbrandName']][trimParam[0]['modelName']]['歷史銷量'][re.findall(r'(?<=date=").*?(?=")',i)[0]]=int(re.findall(r'(?<=salesNum=").*?(?=")',i)[0])
    # Progress report: sub-brand, model, current index, total number of links
    print(trimParam[0]['subbrandName']+trimParam[0]['modelName']+'--num'+str(h.index(o))+'--total:'+str(len(h)))
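The sales section above pulls the date and salesNum attributes out of the response with regular expressions. As a hedged alternative, the sketch below reads the same attributes with the standard library's XML parser. It assumes the response is well-formed XML; the root element name, the date format, and the model id 1234 in the sample are made up, and only the `<sales date=... salesNum=.../>` shape and the URL pattern come from the article.

```python
import xml.etree.ElementTree as ET


def sales_url(model_id) -> str:
    """Build the sales XML URL, following the pattern used in the listing above."""
    return ('https://db.auto.sohu.com/cxdata/xml/sales/model/model'
            + str(model_id) + 'sales.xml')


def parse_sales(xml_text: str) -> dict:
    """Map each date attribute to its salesNum as an int."""
    root = ET.fromstring(xml_text)
    return {node.get('date'): int(node.get('salesNum'))
            for node in root.iter('sales')}


# Made-up response for illustration
sample_xml = '''<model>
  <sales date="2019-05" salesNum="1200"/>
  <sales date="2019-06" salesNum="1350"/>
</model>'''
print(sales_url(1234))
# https://db.auto.sohu.com/cxdata/xml/sales/model/model1234sales.xml
print(parse_sales(sample_xml))
# {'2019-05': 1200, '2019-06': 1350}
```

An XML parser does not depend on attribute order or spacing, whereas the regular expressions rely on the exact text layout of each record.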

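The whole dictionary is now complete: the top level is the brand, then the sub-brand, then the model, and under each model the two branches, sales (歷史銷量) and configuration (配置引數).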
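The nesting can be illustrated with a few lines of plain Python. This is a sketch with made-up brand and model names, and the helper add_model is hypothetical. It uses dict.setdefault, so an existing level is reused instead of being replaced with an empty dict.

```python
def add_model(tree: dict, brand: str, subbrand: str, model: str) -> dict:
    """Return the dict for one model, creating any missing level on the way down."""
    return tree.setdefault(brand, {}).setdefault(subbrand, {}).setdefault(model, {})


tree = {}
node = add_model(tree, 'BrandA', 'SubBrandA', 'ModelX')
node['配置引數'] = {'demo trim': {'SIP_T_NAME': 'demo trim'}}
# A second call for the same model reuses the node created above
add_model(tree, 'BrandA', 'SubBrandA', 'ModelX')['歷史銷量'] = {'2019-05': 1200}
print(sorted(tree['BrandA']['SubBrandA']['ModelX']))
# ['歷史銷量', '配置引數']
```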

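Next, either store the data as a dictionary in MongoDB with pymongo, or convert it to a DataFrame and store it in SQL. Note that when writing to MongoDB, dictionary keys must not contain the dot character ".", so dots have to be replaced.

A replacement function is given for reference: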

# Replace "." in dictionary keys
import warnings

def fix_dict(data, ignore_duplicate_key=True):
    """
    Replaces dots "." in keys with "_", as mongo doesn't like dots in keys.
    If the replacement key is already there, the dot-key value gets lost.
    This modifies the existing dict!

    :param ignore_duplicate_key: True: if the replacement key is already in the dict, the dot-key value is ignored (with a warning).
                                 False: raise ValueError in that case.
    """
    if isinstance(data, (list, tuple)):
        return [fix_dict(e, ignore_duplicate_key) for e in data]
    if isinstance(data, dict):
        # Iterate over a snapshot: keys are added and removed inside the loop
        for old_key, value in list(data.items()):
            value = fix_dict(value, ignore_duplicate_key)
            key = old_key
            if isinstance(old_key, str) and "." in old_key:
                key = old_key.replace(".", "_")
                if key in data:
                    error_msg = "Dict key {key} containing a \".\" was ignored, as {replacement} already exists".format(
                        key=old_key, replacement=key)
                    if not ignore_duplicate_key:
                        raise ValueError(error_msg)
                    warnings.warn(error_msg, category=RuntimeWarning)
                    del data[old_key]
                    continue
                del data[old_key]
            data[key] = value
        return data
    return data
df_2=fix_dict(df)


Here I add a first-letter key above the brand level and then save the result locally in pkl format:

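from pypinyin import lazy_pinyin  # third-party package that converts Chinese characters to pinyin

# Dictionary indexed by first letter
df2={}
for letter in range(65,91):
    df2[chr(letter)]={}
for i in df.keys():
    # setdefault also covers brand names that do not start with A-Z
    df2.setdefault(lazy_pinyin(i)[0][0].upper(),{})[i]=df[i]

# Save to a local file
with open('soho_vehicle.pkl', 'wb') as output:
    pickle.dump(df2, output)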

The data can also be processed further and stored in SQL, or saved as CSV or Excel for viewing.

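import pandas as pd
import numpy as np
import pickle

# Read mode 'rb' is required here: opening with 'wb' would truncate the file
with open('soho_vehicle.pkl', 'rb') as f:
    df=pickle.load(f)
# The pickle is keyed by first letter. Merge that level away so the loops below see brand -> sub-brand -> model.
df={brand: subbrands for by_brand in df.values() for brand, subbrands in by_brand.items()}
# Collect the configuration data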
a=[]
for o in df.keys():
    for i in df[o].keys():
        for j in df[o][i].keys():
            for k in df[o][i][j]['配置引數'].keys():
                df[o][i][j]['配置引數'][k]['SIP_T_CONF']['子品牌']=df[o][i][j]['配置引數'][k]['subbrandName']       
                df[o][i][j]['配置引數'][k]['SIP_T_CONF']['品牌']=df[o][i][j]['配置引數'][k]['brandName']       
                df[o][i][j]['配置引數'][k]['SIP_T_CONF']['款式']=df[o][i][j]['配置引數'][k]['modelName']     
                df[o][i][j]['配置引數'][k]['SIP_T_CONF']['配置名稱']=df[o][i][j]['配置引數'][k]['SIP_T_NAME']                                        
                df[o][i][j]['配置引數'][k]['SIP_T_CONF']['是否電動']=df[o][i][j]['配置引數'][k]['SIP_C_ISELECTRIC']    
                a.append(pd.Series(df[o][i][j]['配置引數'][k]['SIP_T_CONF']))
                
df_trim=pd.DataFrame(a)
             
df_trim=df_trim.replace(np.nan,'---');   
cols = list(df_trim)
for i in cols:
    df_trim[i]=df_trim[i].str.strip();
    df_trim[i]=df_trim[i].apply(lambda x:x.replace('m³','立方米'))
    df_trim[i]=df_trim[i].apply(lambda x:x.replace('\xa0',' '))
    

#df_trim['配置名稱']=df_trim['配置名稱'].apply(lambda x:x.replace('m³','立方米'))    
cols=list(pd.Series(cols).str.strip())
df_trim.columns=cols  # keep the DataFrame's column names in sync with the stripped names
  

cols.insert(0, cols.pop(cols.index('保修政策')))
cols.insert(0, cols.pop(cols.index('車聯網：'))) 
cols.insert(0, cols.pop(cols.index('自動泊車入位')))    
cols.insert(0, cols.pop(cols.index('車身穩定控制')))                 
cols.insert(0, cols.pop(cols.index('車載資訊服務')))
cols.insert(0, cols.pop(cols.index('車道保持輔助系統：')))
cols.insert(0, cols.pop(cols.index('車道偏離預警系統：')))
cols.insert(0, cols.pop(cols.index('倒車車側預警系統：')))
cols.insert(0, cols.pop(cols.index('主動剎車/主動安全系統')))
cols.insert(0, cols.pop(cols.index('中央差速器結構')))
cols.insert(0, cols.pop(cols.index('底盤結構')))
cols.insert(0, cols.pop(cols.index('轉向助力')))
cols.insert(0, cols.pop(cols.index('輪轂材料')))
cols.insert(0, cols.pop(cols.index('進氣形式：')))
cols.insert(0, cols.pop(cols.index('每缸氣門數(個)')))
cols.insert(0, cols.pop(cols.index('氣門結構')))
cols.insert(0, cols.pop(cols.index('汽缸容積(cc)')))
cols.insert(0, cols.pop(cols.index('汽缸排列形式')))
cols.insert(0, cols.pop(cols.index('最大馬力(ps)')))
cols.insert(0, cols.pop(cols.index('最大扭矩(N·m/rpm)')))
cols.insert(0, cols.pop(cols.index('最大功率(kW/rpm)')))
cols.insert(0, cols.pop(cols.index('擋位個數')))
cols.insert(0, cols.pop(cols.index('變速箱型別')))
cols.insert(0, cols.pop(cols.index('變速箱')))
cols.insert(0, cols.pop(cols.index('壓縮比')))
cols.insert(0, cols.pop(cols.index('發動機電子防盜')))
cols.insert(0, cols.pop(cols.index('發動機型號')))
cols.insert(0, cols.pop(cols.index('發動機啟停技術')))
cols.insert(0, cols.pop(cols.index('發動機')))
cols.insert(0, cols.pop(cols.index('工信部油耗(L/100km)')))
cols.insert(0, cols.pop(cols.index('排放標準')))
cols.insert(0, cols.pop(cols.index('供油方式')))
cols.insert(0, cols.pop(cols.index('整車最大扭矩(N·m)：')))
cols.insert(0, cols.pop(cols.index('整車最大功率(kW)：')))
cols.insert(0, cols.pop(cols.index('軸距(mm)')))
cols.insert(0, cols.pop(cols.index('整備質量(kg)')))
cols.insert(0, cols.pop(cols.index('長x寬x高(mm)')))
cols.insert(0, cols.pop(cols.index('車體結構')))
cols.insert(0, cols.pop(cols.index('官方最高車速(km/h)')))
cols.insert(0, cols.pop(cols.index('官方0-100加速(s)')))
cols.insert(0, cols.pop(cols.index('快充時間(小時)：')))
cols.insert(0, cols.pop(cols.index('快充電量(%)：')))
cols.insert(0, cols.pop(cols.index('充電相容性：')))
cols.insert(0, cols.pop(cols.index('充電方式：')))
cols.insert(0, cols.pop(cols.index('電池種類：')))
cols.insert(0, cols.pop(cols.index('電池容量(kWh)：')))
cols.insert(0, cols.pop(cols.index('電動機總功率(kW)：')))
cols.insert(0, cols.pop(cols.index('電動機總扭矩(N·m)：')))
cols.insert(0, cols.pop(cols.index('電機佈局：')))
cols.insert(0, cols.pop(cols.index('電機數：')))
cols.insert(0, cols.pop(cols.index('電機型別：')))
cols.insert(0, cols.pop(cols.index('上市時間')))
cols.insert(0, cols.pop(cols.index('動力型別：')))
cols.insert(0, cols.pop(cols.index('驅動方式')))
cols.insert(0, cols.pop(cols.index('補貼後售價：')))
cols.insert(0, cols.pop(cols.index('4S店報價')))
cols.insert(0, cols.pop(cols.index('廠商指導價')))
cols.insert(0, cols.pop(cols.index('級別')))
cols.insert(0, cols.pop(cols.index('百公里耗電量(kWh/100km)：')))
cols.insert(0, cols.pop(cols.index('是否電動')))
cols.insert(0, cols.pop(cols.index('配置名稱')))
cols.insert(0, cols.pop(cols.index('款式')))
cols.insert(0, cols.pop(cols.index('子品牌')))
cols.insert(0, cols.pop(cols.index('品牌')))

df_trim = df_trim.loc[:, cols]  # .ix has been removed from pandas; .loc selects by label
df_trim=df_trim.replace(np.nan,'---')
df_trim=df_trim.drop(['高度(mm)','長度(mm)','寬度(mm)'],axis=1)
df_trim=df_trim.drop(['車門數(個)','4S店報價'],axis=1)
df_trim.to_csv('soho_veh_trim_para.csv',encoding='gbk')

# Sales
a=[]
for o in df.keys():
    for i in df[o].keys():
        for j in df[o][i].keys():
            try:
                k=list(df[o][i][j]['配置引數'].keys())[0]
                # The crawler stores sales under the key 歷史銷量
                df[o][i][j]['歷史銷量']['子品牌']=df[o][i][j]['配置引數'][k]['subbrandName']    
                df[o][i][j]['歷史銷量']['品牌']=df[o][i][j]['配置引數'][k]['brandName']       
                df[o][i][j]['歷史銷量']['款式']=df[o][i][j]['配置引數'][k]['modelName'] 
                a.append(pd.Series(df[o][i][j]['歷史銷量']))
            except (KeyError, IndexError):
                # Models without sales data have no 歷史銷量 key
                continue

df_sales=pd.DataFrame(a)         

cols = list(df_sales)
cols.reverse()
cols.insert(0, cols.pop(cols.index('款式')))
cols.insert(0, cols.pop(cols.index('子品牌')))
cols.insert(0, cols.pop(cols.index('品牌')))
df_sales = df_sales.loc[:, cols]  # .ix has been removed from pandas; .loc selects by label
df_sales=df_sales.fillna(0)
df_sales.to_csv('soho_veh_sales.csv',encoding='gbk') 
    
# Store in SQL
from sqlalchemy import create_engine
import pandas as pd
import numpy as np
from sqlalchemy.types import VARCHAR  
host = '127.0.0.1'
port= 3306
db = 'soho_vehicle'
user = 'root'
password = 'your_password'  # placeholder: supply your own password and avoid hard-coding real credentials
engine = create_engine("mysql+mysqldb://%s:%s@%s:%s/%s?charset=utf8" % (user, password, host, port, db))
df_sales.to_sql('soho_veh_sales', con=engine, if_exists='append', index=False) # use chunksize if the data set is large
df_trim=df_trim.drop(['內飾可選顏色','車身可選顏色'],axis=1)
df_trim.to_sql('soho_veh_trim', con=engine, if_exists='append', index=False) # use chunksize if the data set is large
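The long run of cols.insert(0, cols.pop(cols.index(...))) calls in the listing above moves the chosen columns to the front one at a time, and cols.index raises ValueError as soon as one named column is missing from the scraped data. The sketch below shows one possible way to express the same reordering as a single helper that skips missing names. The function move_to_front and the sample column list are made up for this example.

```python
def move_to_front(columns, priority):
    """Return columns with the names in priority first, in that order.

    Names in priority that are not present are skipped instead of raising.
    """
    present = set(columns)
    # dict.fromkeys removes repeated names while keeping their order
    front = [name for name in dict.fromkeys(priority) if name in present]
    chosen = set(front)
    rest = [name for name in columns if name not in chosen]
    return front + rest


cols = ['級別', '上市時間', '品牌', '款式', '子品牌']
print(move_to_front(cols, ['品牌', '子品牌', '款式', '不存在的列']))
# ['品牌', '子品牌', '款式', '級別', '上市時間']
```

The result can be passed to df_trim.loc[:, ...] in the same way as the cols list. Note that priority is written in final left-to-right order, whereas the insert calls in the listing are written in reverse.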


Feedback and discussion are welcome!
