The Python Crawler Road: Parsing JS

Posted by Jiayu920716 on 2021-01-04

Original URL: https://blog.csdn.net/Yuyu920716/article/details/112167390

Tags: Python, crawler, JS

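Parsing JS

Learning objectives:

Understand how to locate the relevant JS
Understand how to add breakpoints to observe how the JS executes
Use js2py to run JS and obtain its results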

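1 Locating the JS

In the earlier Renren (人人網) example, we saw that the URL carries several parameters, but how are those parameters generated?

They are generated by JS running in the page, so how do we work out the rules behind them? The sections below show how.

1.1 Inspecting the JS events bound to a button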


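Select the button, then open Event Listeners. On some sites this lists the events bound to the button, and clicking the linked entry jumps straight to the corresponding JS.

1.2 Searching with Search all files

On some sites the button has no JS event listener bound to it. In that case, search for a keyword taken from the request, such as livecell, to find where the JS lives.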

Click the pretty-print option to format the code.

You can then keep searching for the keyword inside the formatted file.
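The same keyword search can also be reproduced offline once a script has been saved locally. The following is only a minimal sketch and not part of the original workflow: `find_keyword` and `sample_js` are hypothetical names made up for illustration, and only the livecell keyword and the rKey URL come from this article.

```python
from typing import List, Tuple


def find_keyword(js_source: str, keyword: str, width: int = 40) -> List[Tuple[int, str]]:
    """Return (line number, surrounding snippet) for every occurrence of keyword."""
    hits: List[Tuple[int, str]] = []
    if not keyword:
        return hits  # an empty keyword would match everywhere
    for lineno, line in enumerate(js_source.splitlines(), start=1):
        start = line.find(keyword)
        while start != -1:
            lo = max(0, start - width)
            hi = min(len(line), start + len(keyword) + width)
            hits.append((lineno, line[lo:hi].strip()))
            start = line.find(keyword, start + len(keyword))
    return hits


# Hypothetical stand-in for a saved script; real files are usually minified.
sample_js = (
    'var t = {};\n'
    'ajaxFunc("get", "http://activity.renren.com/livecell/rKey", "", cb);\n'
)

for lineno, snippet in find_keyword(sample_js, "livecell"):
    print(lineno, snippet)
```

The snippet width matters for minified files, where a whole script may sit on one line and printing the full line would be unreadable.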


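2 Observing how the JS executes

Once we have located the JS, we can study how it actually runs. Later we can either reproduce that logic in a Python program, or use a tool such as js2py to translate the JS code into Python and run it directly.

The simplest way to observe JS execution is to add breakpoints.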

[Screenshot: adding a breakpoint in Chrome DevTools]

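To add a breakpoint, click the line number on the left. All current breakpoints are listed under Breakpoints on the right.

After adding a breakpoint, click the login button again. Execution pauses every time it reaches the breakpoint, and if that line produces variables, their values are shown under Scope.

The top-right corner of the screenshot above has three controls, labeled 1, 2, and 3:
- 1: resume execution until the next breakpoint
- 2: step into the function being called
- 3: step out of the function being called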

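3 Using js2py

Now that we know how the JS produces the data we want, the next step is to obtain the result of running that JS from our own program.

3.1 Introduction to js2py

js2py is a JS-to-Python translation tool and also a JS interpreter implemented in pure Python. Its source code and examples are available on GitHub.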

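3.2 Approaches to running JS

There are broadly two ways to reproduce what the JS does:

After understanding the JS and its execution order, reimplement the logic in Python and compute the result
After understanding the JS and its execution order, use a module such as js2py to run the JS code and get the result

Reimplementing the JS in Python means tracing every single step of the JS, which is tedious. More often we choose a module such as js2py to run the JS itself. Next we use js2py to obtain the Renren login parameters.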
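To make the two approaches concrete, here is a minimal sketch that applies both to one small step of the login flow, the password reversal. The function names and `JS_REVERSE` are hypothetical, chosen for illustration. js2py is a third-party package (`pip install js2py`) and is imported lazily so that the pure-Python half runs without it.

```python
def reverse_in_python(password: str) -> str:
    """Approach 1: port the JS expression split("").reverse().join("") by hand."""
    return password[::-1]


# Approach 2 keeps the JS as text so js2py can run it unchanged.
JS_REVERSE = 'function reverse(s) { return s.split("").reverse().join(""); }'


def reverse_with_js2py(password: str) -> str:
    """Approach 2: let js2py execute the JS instead of porting it."""
    import js2py  # third-party; imported here so approach 1 works without it

    context = js2py.EvalJs()      # a fresh JS execution environment
    context.execute(JS_REVERSE)   # define the JS function inside it
    return str(context.reverse(password))  # call the JS function from Python


print(reverse_in_python("abc123"))
```

For a one-line transformation the hand port is clearly simpler. The balance tips toward js2py when the JS is long or depends on helper libraries, as the RSA step in the next section does. One caveat for hand ports: JS strings are sequences of UTF-16 code units while Python strings are sequences of code points, so the two reversals can differ for characters outside the Basic Multilingual Plane.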

3.3 Implementation

Locate the JS code that performs the login:

formSubmit: function() {
        var e, t = {};
        $(".login").addEventListener("click", function() {
            t.phoneNum = $(".phonenum").value,
            t.password = $(".password").value,
            e = loginValidate(t),
            t.c1 = c1 || 0,
            e.flag ? ajaxFunc("get", "http://activity.renren.com/livecell/rKey", "", function(e) {
                var n = JSON.parse(e).data;
                if (0 == n.code) {
                    t.password = t.password.split("").reverse().join(""),
                    setMaxDigits(130);
                    var o = new RSAKeyPair(n.e,"",n.n)
                      , r = encryptedString(o, t.password);
                    t.password = r,
                    t.rKey = n.rkey
                } else
                    toast("公鑰獲取失敗"),
                    t.rKey = "";
                ajaxFunc("post", "http://activity.renren.com/livecell/ajax/clog", t, function(e) {
                    var e = JSON.parse(e).logInfo;
                    0 == e.code ? location.href = localStorage.getItem("url") || "" : toast(e.msg || "登入出錯")
                })
            }) : toast(e.msg)
        })
    }

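From the code we learn that:

Logging in requires encrypting the password and obtaining the value of the rkey field
The rkey value can be obtained directly by sending the rKey request
The password is first reversed and then encrypted with RSA. The JS code for this is complex, so we want to run it from Python instead of porting it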

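Implementation outline:

1. Use a session to send the rKey request and get the information needed to log in
- URL: http://activity.renren.com/livecell/rKey
- Method: GET
2. Encrypt the password using the returned information
2.1 Prepare the username and password
2.2 Use js2py to create the JS execution environment: context
2.3 Copy the contents of the required JS files into this project
2.4 Read the contents of the JS files and execute them in the context
2.5 Add the required data to the context
2.6 Use the context to execute the JS string that encrypts the password
2.7 Read the encrypted password back from the context
3. Use the session to send the login request
- URL: http://activity.renren.com/livecell/ajax/clog
- Method: POST
- Data:
```
phoneNum: xxxxxxx
password: (generated by the encryption step)
c1: 0
rKey: obtained from the rKey request
```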
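Steps 1 and 3 of the outline can be sketched as two small helpers. This is an illustrative sketch only: the function names and the sample response body are made up, while the field names (`data`, `code`, `e`, `n`, `rkey`, `phoneNum`, `password`, `c1`, `rKey`) are taken from the JS shown earlier.

```python
import json


def parse_rkey_response(body: str) -> dict:
    """Pull the public key parts and rkey out of the rKey response.

    Mirrors the JS check `0 == n.code`; the JS uses loose equality,
    so both 0 and "0" are accepted here.
    """
    data = json.loads(body)["data"]
    if str(data.get("code")) != "0":
        raise ValueError("rKey request failed: code=%r" % data.get("code"))
    return {"e": data["e"], "n": data["n"], "rkey": data["rkey"]}


def build_login_payload(phone_num: str, encrypted_password: str,
                        rkey: str, c1: int = 0) -> dict:
    """Assemble the form fields that the JS posts to /livecell/ajax/clog."""
    return {
        "phoneNum": phone_num,
        "password": encrypted_password,
        "c1": c1,
        "rKey": rkey,
    }


# Made-up response body for illustration; real values come from the server.
fake_body = json.dumps(
    {"data": {"code": 0, "e": "10001", "n": "abcdef", "rkey": "demo-rkey"}}
)
key = parse_rkey_response(fake_body)
payload = build_login_payload("xxxxxxx", "<encrypted>", key["rkey"])
print(payload)
```

Unlike the page's JS, which shows a message and continues with an empty rKey when the public key cannot be fetched, this sketch raises an exception, because a scripted login gains nothing from continuing without a key.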

Full code

Download these JS files locally beforehand:

BigInt.js

RSA.js

Barrett.js
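Because the script depends on these files being present, it can help to fail early with a clear message when one is missing. Below is a minimal sketch of such a loader. `load_js_files` and `RecordingContext` are hypothetical names: the latter is only a stand-in so the example runs without js2py, and in real use the context would be a `js2py.EvalJs()` instance, which offers the same `execute` method.

```python
import tempfile
from pathlib import Path


def load_js_files(context, paths):
    """Execute each JS file in the context, in order, and return the loaded names."""
    loaded = []
    for path in map(Path, paths):
        if not path.is_file():
            raise FileNotFoundError("missing JS dependency: %s" % path)
        context.execute(path.read_text(encoding="utf8"))
        loaded.append(path.name)
    return loaded


class RecordingContext:
    """Stand-in for js2py.EvalJs(): records what it was asked to execute."""

    def __init__(self):
        self.sources = []

    def execute(self, source):
        self.sources.append(source)


# Demo with a temporary file instead of the real BigInt.js / RSA.js / Barrett.js.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.js"
    demo.write_text("var x = 1;", encoding="utf8")
    ctx = RecordingContext()
    loaded = load_js_files(ctx, [demo])

print(loaded)
```

With js2py installed, the call would look like `load_js_files(js2py.EvalJs(), ["BigInt.js", "RSA.js", "Barrett.js"])`.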

import requests
import js2py

# Implementation outline:
#   - Use a session to send the rKey request and get the information needed to log in
#     - URL: http://activity.renren.com/livecell/rKey
#     - Method: GET
# Create the session object
session = requests.session()
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Mobile Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type":"application/x-www-form-urlencoded"
}
# Merge these headers into the session's default headers
session.headers.update(headers)

response = session.get("http://activity.renren.com/livecell/rKey")
# print(response.content.decode())
# "data" holds the public key parts (e, n) and the rkey value used by the JS
n = response.json()['data']

#   - Encrypt the password using the returned information
#     - Prepare the username and password
phoneNum = "131..."
password = "****"
#     - Use js2py to create the JS execution environment: context
context = js2py.EvalJs()
#     - Copy the contents of the required JS files into this project
#     - Read the contents of the JS files and execute them in the context
for js_file in ("BigInt.js", "RSA.js", "Barrett.js"):
    with open(js_file, 'r', encoding='utf8') as f:
        context.execute(f.read())


# - Add the required data to the context
context.t = {'password': password}
context.n = n
#     - Execute the JS string that encrypts the password
js = '''
       t.password = t.password.split("").reverse().join(""),
       setMaxDigits(130);
       var o = new RSAKeyPair(n.e,"",n.n)
        , r = encryptedString(o, t.password);
      '''
context.execute(js)
# - Read the encrypted password back from the context
# print(context.r)
password = context.r
#   - Use the session to send the login request
#     - URL: http://activity.renren.com/livecell/ajax/clog
#     - Method: POST
#     - Data:
#       - phoneNum: xxxxxxx
#       - password: (generated by the encryption step)
#       - c1: 0
#       - rKey: obtained from the rKey request
data = {
    'phoneNum': phoneNum,
    'password': password,
    'c1':0,
    'rKey':n['rkey']
}

# print(session.headers)
response = session.post("http://activity.renren.com/livecell/ajax/clog", data=data)
print(response.content.decode())

# Request a resource that requires being logged in
response = session.get("http://activity.renren.com/home#profile")
print(response.content.decode())
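The script above prints the raw login response without checking it. A small helper can interpret it the way the page's JS does, which reads `logInfo.code` and `logInfo.msg`. This is a hedged sketch: `check_login_response` is a hypothetical name, the fallback message is an English stand-in for the one in the JS, and the sample bodies are made up.

```python
import json
from typing import Tuple


def check_login_response(body: str) -> Tuple[bool, str]:
    """Return (success, message) based on the logInfo object in the response."""
    info = json.loads(body).get("logInfo") or {}
    if str(info.get("code")) == "0":  # the JS treats code 0 as success
        return True, ""
    return False, info.get("msg") or "login error"


# Made-up bodies for illustration; real ones come from the clog request.
print(check_login_response('{"logInfo": {"code": 0}}'))
print(check_login_response('{"logInfo": {"code": 1, "msg": "bad password"}}'))
```

In the script, the call would be `check_login_response(response.content.decode())` right after the POST, before requesting any page that needs a logged-in session.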

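Summary

Inspecting the events bound to an element in Chrome can locate the JS
Searching for keywords with Search all files in Chrome can locate the JS
Adding breakpoints lets you observe how the JS generates its data
Using js2py
- Prepare the JS source
- Create the JS execution environment
- Execute the JS string in that environment, pass in data, and read back the result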
