Scraping Douyin Live Danmaku: A PHP Version

Posted by chenggx on 2023-09-26

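I recently needed to scrape the danmaku (bullet comment) messages from Douyin live streams. Almost everything I found online was written in Python. That works well enough, but in keeping with the maxim that PHP is the best language in the world :trollface:, I wrote a simple script of my own for convenience. The main code is shown below: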

  1. First, request the live-room URL and obtain the ttwid cookie

    // Client is presumably GuzzleHttp\Client; $liveUrl is the live-room URL
    $client = new Client();

    $response = $client->get($liveUrl, [
        'headers' => [
            'accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
            'cookie' => '__ac_nonce=0638733a400869171be51',
        ],
    ]);

    // Note: this assumes ttwid is the first cookie in the first Set-Cookie header
    $cookieString = $response->getHeader('Set-Cookie');
    $cookieArray = explode(';', $cookieString[0]);
    $ttwidStr = $cookieArray[0];
    // Keep everything after the first "=" as the cookie value
    return substr($ttwidStr, strpos($ttwidStr, '=') + 1);

  2. Then parse the roomId out of the same response

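    // Cast to string so the whole body is returned even if the stream was partly read
    $html = (string) $response->getBody();
    // The page embeds escaped JSON, so this matches roomId\":\"<digits>\"
    $pattern = '/roomId\\\\":\\\\"(\d+)\\\\"/';
    if (!preg_match($pattern, $html, $matches)) {
        throw new \RuntimeException('roomId not found in the live page');
    }

    return $matches[1];
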
  3. Build the WebSocket URL and the request headers

    $header = [
        'cookie' => 'ttwid=' . $ttwid,
        'user-agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    ];
    // The room id appears twice in the query string: in internal_ext and in room_id
    $webSocketUrl = 'ws://webcast3-ws-web-lq.douyin.com/webcast/im/push/v2/?app_name=douyin_web&version_code=180800&webcast_sdk_version=1.3.0&update_version_code=1.3.0&compress=gzip&internal_ext=internal_src:dim|wss_push_room_id:' . $liveRoomId . '|wss_push_did:7188358506633528844|dim_log_id:20230521093022204E5B327EF20D5CDFC6|fetch_time:1684632622323|seq:1|wss_info:0-1684632622323-0-0|wrds_kvs:WebcastRoomRankMessage-1684632106402346965_WebcastRoomStatsMessage-1684632616357153318&cursor=t-1684632622323_r-1_d-1_u-1_h-1&host=https://live.douyin.com&aid=6383&live_id=1&did_rule=3&debug=false&maxCacheMessageNumber=20&endpoint=live_pc&support_wrds=1&im_path=/webcast/im/fetch/&user_unique_id=7188358506633528844&device_platform=web&cookie_enabled=true&screen_width=1440&screen_height=900&browser_language=zh&browser_platform=MacIntel&browser_name=Mozilla&browser_version=5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_15_7)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/113.0.0.0%20Safari/537.36&browser_online=true&tz_name=Asia/Shanghai&identity=audience&room_id=' . $liveRoomId . '&heartbeatDuration=0&signature=00000000';

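  4. Finally, connect with Workerman's AsyncTcpConnection and receive the data

    // Requires: use Workerman\Connection\AsyncTcpConnection;
    $wsClient = new AsyncTcpConnection($webSocketUrl);

    // Use SSL as the transport so the ws:// connection becomes wss://
    $wsClient->transport = 'ssl';
    $wsClient->headers = $header;

    // ParseMsg is the message parser from the repository; $conn is defined elsewhere in the script
    $parseMsg = new ParseMsg($conn);

    $wsClient->onMessage = [$parseMsg, 'on_message'];

    $wsClient->connect();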
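
Steps 1 and 2 take the first cookie of the first Set-Cookie header and assume the regex always matches. A slightly more defensive variant could look like the sketch below. The function names `extractCookieValue` and `extractRoomId` and the sample data are made up for illustration; they are not taken from the repository.

```php
<?php
declare(strict_types=1);

/**
 * Search every Set-Cookie header line for a cookie with the given name.
 * Hypothetical helper, not part of the original script.
 */
function extractCookieValue(array $setCookieHeaders, string $name): ?string
{
    foreach ($setCookieHeaders as $line) {
        // The "name=value" pair comes first; attributes such as Path follow after ";"
        $pair = trim(explode(';', $line, 2)[0]);
        $eq = strpos($pair, '=');
        if ($eq !== false && substr($pair, 0, $eq) === $name) {
            return substr($pair, $eq + 1);
        }
    }
    return null; // cookie not present
}

/**
 * Pull the numeric roomId out of the escaped JSON embedded in the page.
 * Hypothetical helper that reuses the pattern from step 2.
 */
function extractRoomId(string $html): ?string
{
    if (preg_match('/roomId\\\\":\\\\"(\d+)\\\\"/', $html, $matches) === 1) {
        return $matches[1];
    }
    return null; // page layout changed or the request was blocked
}

// Made-up sample data, for illustration only
$headers = ['msToken=abc; Path=/', 'ttwid=1%7Cxyz; Path=/; HttpOnly'];
$html = '{\"roomId\":\"7283746\",\"status\":2}';

var_dump(extractCookieValue($headers, 'ttwid'));
var_dump(extractRoomId($html));
```

With Guzzle, `$response->getHeader('Set-Cookie')` already returns an array of header lines, so it can be passed to `extractCookieValue` directly. Returning `null` lets the caller decide whether to retry or abort.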

The detailed parsing code and the protobuf definitions are on GitHub; take a look if you need them.

One more important point: the danmaku messages are encoded with Google's Protocol Buffers (protobuf), so you will need some familiarity with that format.
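
To give a feel for the protobuf wire format, here is a minimal sketch that decodes a base-128 varint and a field key by hand. It only illustrates the encoding. The function names `readVarint` and `readFieldKey` are made up for this example, and real parsing should use classes generated from the .proto files.

```php
<?php
declare(strict_types=1);

/**
 * Decode one base-128 varint starting at $offset.
 * Returns [decoded value, offset of the next unread byte].
 */
function readVarint(string $bytes, int $offset): array
{
    $value = 0;
    $shift = 0;
    $length = strlen($bytes);
    while ($offset < $length) {
        $byte = ord($bytes[$offset++]);
        // The low 7 bits carry data, least significant group first
        $value |= ($byte & 0x7F) << $shift;
        // A clear high bit marks the last byte of the varint
        if (($byte & 0x80) === 0) {
            return [$value, $offset];
        }
        $shift += 7;
        if ($shift > 63) {
            throw new RuntimeException('Varint is too long');
        }
    }
    throw new RuntimeException('Truncated varint');
}

/**
 * Decode a field key: the field number is in the upper bits,
 * the wire type in the lowest three bits.
 * Returns [field number, wire type, next offset].
 */
function readFieldKey(string $bytes, int $offset): array
{
    [$key, $next] = readVarint($bytes, $offset);
    return [$key >> 3, $key & 0x07, $next];
}

// The classic example from the protobuf documentation: field 1 = 150
$message = "\x08\x96\x01";
[$field, $wireType, $pos] = readFieldKey($message, 0);
[$value, $pos] = readVarint($message, $pos);
printf("field=%d wireType=%d value=%d\n", $field, $wireType, $value);
```

Since the WebSocket URL requests `compress=gzip`, the payload carried inside a frame most likely has to be decompressed (for example with `gzdecode`) before the inner message is parsed. That is an inference from the URL, so check it against the repository code.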

Douyin live danmaku scraper (github.com)

Here is a test endpoint: ws://47.93.122.172:4200

The message to send has the following format:

{
    "url":"https://live.douyin.com/619592756125"
}
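
A request in this format could be built with `json_encode`, as in the sketch below. The function name `buildRequestPayload` is made up for this example, and what the endpoint does beyond accepting a `url` field is not covered here.

```php
<?php
declare(strict_types=1);

/**
 * Build the JSON text frame for the test endpoint.
 * Hypothetical helper; only the "url" field is taken from the post.
 */
function buildRequestPayload(string $liveUrl): string
{
    // JSON_UNESCAPED_SLASHES keeps the URL readable ("/" instead of "\/")
    return json_encode(
        ['url' => $liveUrl],
        JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

echo buildRequestPayload('https://live.douyin.com/619592756125'), "\n";
```

The resulting string is what a WebSocket client would send as a text frame once the connection to the test endpoint is open.
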
This work is licensed under a CC license. Reposts must credit the author and link to this article.
