Querying Baidu SEO information with Python

Posted by pythontab on 2013-07-18

Original URL: https://www.pythontab.com/html/2013/pythonhexinbiancheng_0718/503.html

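A simple Python function for looking up a domain's Baidu ranking for a keyword. Features:

1. Random User-Agent: each request picks a User-Agent at random from a fixed list.

2. Simple to use: just call getRank(keyword, domain).

3. Encoding conversion: keywords are decoded as UTF-8, falling back to GB2312, so encoding should no longer be a problem.

4. Rich results: besides the rank, it returns the search result's title, URL, and snapshot (cache) date, which covers typical SEO needs.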
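The encoding fallback in item 3 can be sketched on its own. This is a minimal illustration written for Python 3 (the listing further down targets Python 2); the name decode_any is hypothetical and not part of the original script. It assumes the input is either UTF-8 or GB2312 and is only a heuristic, since some GB2312 byte sequences also happen to be valid UTF-8.

```python
def decode_any(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, falling back to GB2312 (hypothetical helper)."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the keyword was saved as GB2312.
        return raw.decode('gb2312')


if __name__ == '__main__':
    # The same keyword stored in two different encodings decodes to one string.
    print(decode_any('百度'.encode('utf-8')) == decode_any('百度'.encode('gb2312')))
```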

Drawback:

Single-threaded, so it is slow.
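One possible way around the single-threaded limitation is a small thread pool, since each lookup spends most of its time waiting on the network. The sketch below is written for Python 3 and is not part of the original script: rank_all and fake_lookup are hypothetical names, and fake_lookup merely stands in for getRank so the example runs without network access. Sending many requests in parallel may also get a client throttled, so a small max_workers is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_all(keywords, domain, lookup, max_workers=4):
    """Run lookup(keyword, domain) for every keyword on a small thread pool.

    Results come back in the same order as the input keywords.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda kw: lookup(kw, domain), keywords))


def fake_lookup(keyword, domain):
    # Stand-in for getRank (assumption) so the sketch needs no network access.
    return '%s @ %s' % (keyword.strip(), domain)


if __name__ == '__main__':
    for line in rank_all(['python\n', 'seo\n'], 'www.douban.com', fake_lookup):
        print(line)
```

In real use, a function with the same (keyword, domain) signature as getRank would be passed in place of fake_lookup.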

#coding=utf-8
 
import requests
import BeautifulSoup
import re
import random
 
def decodeAnyWord(w):   #decode a byte string: try UTF-8 first, then fall back to GB2312
    try:
        return w.decode('utf-8')
    except UnicodeDecodeError:
        return w.decode('gb2312')
 
def createURL(checkWord):   #create baidu URL with search words
    checkWord = checkWord.strip()
    checkWord = checkWord.replace(' ', '+').replace('\n', '')
    baiduURL = 'http://www.baidu.com/s?wd=%s&rn=100' % checkWord
    return baiduURL 
 
def getContent(baiduURL):   #get the content of the serp
    uaList = ['Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322;+TencentTraveler)',
    'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729)',
    'Mozilla/5.0+(Windows+NT+5.1)+AppleWebKit/537.1+(KHTML,+like+Gecko)+Chrome/21.0.1180.89+Safari/537.1',
    'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)',
    'Mozilla/5.0+(Windows+NT+6.1;+rv:11.0)+Gecko/20100101+Firefox/11.0',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+SV1)',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+GTB7.1;+.NET+CLR+2.0.50727)',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+KB974489)']
    headers = {'User-Agent': random.choice(uaList)}
    ipList = ['202.43.188.13:8080',
    '80.243.185.168:1177',
    '218.108.85.59:81']
    proxies = {'http': 'http://%s' % random.choice(ipList)}
    r = requests.get(baiduURL, headers = headers, proxies = proxies)
    return r.content
 
def getLastURL(rawurl): #get the final URL after following any redirects
    r = requests.get(rawurl)
    return r.url
 
def getAtext(atext):    #get the text between <a> and </a>
    pat = re.compile(r'<a .*?>(.*?)</a>')
    match = pat.findall(atext)
    pureText = match[0].replace('<em>', '').replace('</em>', '')
    return pureText
 
def getCacheDate(t):    #get the date of cache
    pat = re.compile(r'<span class="g">.*?(\d{4}-\d{1,2}-\d{1,2})  </span>')
    match = pat.findall(t)
    cacheDate = match[0]
    return cacheDate
 
def getRank(checkWord, domain): #main line
    checkWord = checkWord.replace('\n', '')
    checkWord = decodeAnyWord(checkWord)
    baiduURL = createURL(checkWord) 
    cont = getContent(baiduURL)
    soup = BeautifulSoup.BeautifulSoup(cont)
    results = soup.findAll('table', {'class': 'result'})    #find all results in this page
    for result in results:
        checkData = unicode(result.find('span', {'class': 'g'}))
        if re.compile(r'^[^/]*%s.*?' % re.escape(domain)).match(checkData): #domain is escaped so '.' matches literally
            nowRank = result['id']  #get the rank if match the domain info
 
            resLink = result.find('h3').a
            resURL = resLink['href']
            domainURL = getLastURL(resURL)  #get the target URL
            resTitle = getAtext(unicode(resLink))   #get the title of the target page
 
            rescache = result.find('span', {'class': 'g'})
            cacheDate = getCacheDate(unicode(rescache)) #get the cache date of the target page
 
            res = u'%s, 第%s名, %s, %s, %s' % (checkWord, nowRank, resTitle, cacheDate, domainURL)
            return res.encode('gb2312')
    else:
        return '>100'
 
domain = 'www.douban.com' #set the domain which you want to search.
 
 
 
with open('r.txt') as f:    #keywords are read line by line
    for w in f:
        print getRank(w, domain)
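createURL in the listing above interpolates the keyword straight into the query string, handling only spaces and newlines. A hedged alternative is to let the standard library percent-encode the parameters. The sketch below is written for Python 3, and create_url and results_per_page are hypothetical names; the wd and rn parameters and the base URL are taken from the listing.

```python
from urllib.parse import urlencode


def create_url(keyword, results_per_page=100):
    """Build a Baidu search URL with a percent-encoded keyword (hypothetical helper)."""
    # urlencode turns spaces into '+' and percent-encodes non-ASCII text as UTF-8.
    query = urlencode({'wd': keyword.strip(), 'rn': results_per_page})
    return 'http://www.baidu.com/s?' + query


if __name__ == '__main__':
    print(create_url(' python seo\n'))
```

This keeps characters such as & or # in a keyword from being read as part of the URL structure.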
