Born of the Same Root, Why the Rush to Torment Each Other: Breaking reCaptcha with Google's Speech Recognition API

Posted by wyzsk on 2020-08-19

Original URL: https://zhuanlan.kanxue.com/article-12596.htm

Author: A胖 · 2014/04/29 12:16

From: http://www.debasish.in/2014/04/attacking-audio-recaptcha-using-googles.html

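0x00 Background

For an introduction to CAPTCHAs and how they are broken, see: /tips/?id=141

What is reCaptcha?

reCaptcha is a cloud-based CAPTCHA system provided by Google. It combines programmatically generated CAPTCHAs with images that OCR struggles to recognize, which helps Google digitize books, newspapers, and house numbers from Street View.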

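reCaptcha also offers an audio CAPTCHA, intended as an accessible alternative for blind users.

0x01 Details

The core idea:

Use Google's own Web Speech API speech recognition to break Google's reCaptcha audio CAPTCHA.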

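First, a look at the API used for speech recognition.

Chrome ships with an HTML5-based speech input API. Through it, users speak into a microphone and Chrome transcribes the audio into text; the feature is also available on Android. If you are not familiar with it, here is a demo:
https://www.google.com/intl/en/chrome/demos/speech.html

I had always been curious about how this speech recognition API works: is the recognition done inside the browser, or is the audio sent to the cloud?

Packet captures suggested that the audio is indeed sent to the cloud, but the outgoing data is SSL-encrypted.

So I started digging through the Chromium source code and eventually found the interesting part: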

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

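The implementation is very simple: it captures audio data from the mic, sends it to Google's speech recognition web service, and gets the recognition result back as JSON. The web API used for recognition is here:

https://www.google.com/speech-api/v1/recognize

One important point is that this API only accepts audio in FLAC format (a lossless format, how fancy).

Now that the mechanism is known, writing a program that uses this recognition API is easy.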

#!bash
./google_speech.py hello.flac

Source code:

#!python
'''
Accessing Google Web Speech API using Python
Author : Debasish Mandal

'''

import httplib
import sys

print '[+] Sending clean file to Google voice API'
# Read the FLAC file as raw bytes; binary mode keeps the audio data intact
f = open(sys.argv[1], 'rb')
data = f.read()
f.close()
google_speech = httplib.HTTPConnection('www.google.com')
# POST the audio; the Content-type header declares FLAC at a 16 kHz sample rate
google_speech.request('POST','/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US',data,{'Content-type': 'audio/x-flac; rate=16000'})
# The service answers with the recognition result as JSON
print google_speech.getresponse().read()
google_speech.close()
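
For readers on Python 3, where the httplib module no longer exists under that name, the same request could be put together roughly as follows. This is only a sketch: the helper name build_recognize_request is made up for illustration, the URL, query string, and header are copied from the listing above, and the endpoint dates from 2014 and may no longer respond. The request is built but deliberately not sent.

```python
import urllib.request

# Endpoint and query string as given in the article (2014); it may be gone today.
SPEECH_URL = ('https://www.google.com/speech-api/v1/recognize'
              '?xjerr=1&client=chromium&lang=en-US')


def build_recognize_request(flac_bytes, rate=16000):
    """Build (but do not send) the POST request for one FLAC clip.

    Hypothetical helper for illustration, not part of the original script.
    """
    headers = {'Content-Type': 'audio/x-flac; rate=%d' % rate}
    # Supplying a data payload makes urllib issue a POST instead of a GET.
    return urllib.request.Request(SPEECH_URL, data=flac_bytes, headers=headers)


# Placeholder payload; a real call would pass the bytes of a FLAC file.
req = build_recognize_request(b'fLaC placeholder bytes')
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req).read()
```

Keeping the construction separate from the network call makes it possible to inspect the headers and URL without contacting the service.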

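After studying reCaptcha's audio CAPTCHAs, you will find there are basically two kinds. One is very simple: little added noise and clear speech. The other is very complex: a lot of noise is deliberately added, so that even a real person has trouble making it out. This kind appears to contain a lot of hissing noise, with several human voices layered in as interference.

For details on this audio CAPTCHA, see https://groups.google.com/forum/#!topic/recaptcha/lkCyM34zbJo

This article mainly covers how to solve the first kind. I also put a lot of effort into breaking the second, complex kind, but it is really too hard; even humans recognize it at a low rate.

Users can download the reCaptcha audio CAPTCHA as an mp3, but the Google speech recognition endpoint only accepts FLAC, so the downloaded mp3 needs some processing and a conversion to FLAC before it is submitted.

Let's first verify by hand that this works:

First, download the audio that reCaptcha plays as an mp3 file.

Then open it in an audio editor called Audacity, as shown: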

enter image description here

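Copy the sound of the first digit into a new window and then paste it once more, so that the first digit becomes two identical sounds in a row.

For example, if the CAPTCHA is 76426, the goal is to isolate the 7 and have the spoken 7 repeat twice.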

enter image description here

Finally, save this audio as wav, convert it to FLAC, and submit it to the API.

#!bash
~/Desktop/audio/heart attack/final $ sox cut_0.wav -r 16000 -b 16 -c 1 cut_0.flac lowpass -2 2500
~/Desktop/audio/heart attack/final $ python send.py cut_0.flac

enter image description here

Good: the server successfully recognized the audio and returned the correct result. The next step is to automate this process.
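
As a first step toward automation, the wav-to-FLAC conversion from the manual test could be scripted. The sketch below only reuses the sox flags from the command line shown earlier; the function names are hypothetical, and it assumes the sox binary is installed and on PATH. It is written for Python 3.

```python
import subprocess


def sox_to_flac_command(wav_path, flac_path, rate=16000, cutoff_hz=2500):
    """Return the sox argument list used in the manual test.

    Resamples to `rate` Hz, 16-bit, mono, and applies a two-pole
    low-pass filter at `cutoff_hz`.
    """
    return ['sox', wav_path,
            '-r', str(rate), '-b', '16', '-c', '1', flac_path,
            'lowpass', '-2', str(cutoff_hz)]


def convert_to_flac(wav_path, flac_path):
    """Run sox on one file; raises CalledProcessError if sox fails."""
    subprocess.run(sox_to_flac_command(wav_path, flac_path), check=True)


# Building the command does not require sox, so it can be inspected safely.
cmd = sox_to_flac_command('cut_0.wav', 'cut_0.flac')
print(' '.join(cmd))
```

Passing the arguments as a list rather than a shell string also avoids quoting problems with paths that contain spaces, such as the "heart attack" directory in the example above.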

Before automating the submission, we need to understand how digital audio is represented.

This Stack Overflow question is a good tutorial:

http://stackoverflow.com/questions/732699/how-is-audio-represented-with-numbers

Open a wav file in a hex editor:

enter image description here

Processing wav audio with Python's wave module:

The wave module provides a convenient interface for working with the wav format:

#!python
import wave 
f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1) 
    print single_frame.encode('hex') 
f.close()

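getparams() returns a tuple of metadata about the wav file, such as the number of channels, the sample width, the sample rate, the number of frames, and so on.

getnframes() returns the number of frames in the wav file.

Running this Python program prints every frame of sample.wav in hexadecimal: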

[+] WAV parameters (1, 2, 44100, 937, 'NONE', 'not compressed')  
[+] No. of Frames 937  
[+] Sample 0 = 62fe    <- Sample 1  
[+] Sample 1 = 99fe   <- Sample 2  
[+] Sample 2 = c1ff    <- Sample 3  
[+] Sample 3 = 9000  
[+] Sample 4 = 8700  
[+] Sample 5 = b9ff  
[+] Sample 6 = 5cfe  
[+] Sample 7 = 35fd  
[+] Sample 8 = b1fc  
[+] Sample 9 = f5fc  
[+] Sample 10 = 9afd  
[+] Sample 11 = 3cfe  
[+] Sample 12 = 83fe  
[+] ....

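From the output we can see that this wav file is mono and that each sample is 2 bytes wide, because the audio is 16-bit. The sample width can also be read with getsampwidth(), and getnchannels() tells whether the audio is mono or stereo.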
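
To try these accessors without a sample file at hand, a tiny wav can be written to memory and read back. This is a small Python 3 sketch; the three sample values and the 44100 Hz rate are arbitrary choices for illustration.

```python
import io
import struct
import wave

# Write a tiny mono, 16-bit, 44100 Hz wav into an in-memory buffer.
buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 2 bytes per sample = 16-bit audio
    w.setframerate(44100)
    w.writeframes(struct.pack('<3h', -414, -359, -63))

# Read it back and query the same metadata the article discusses.
buf.seek(0)
with wave.open(buf, 'rb') as r:
    channels = r.getnchannels()
    width = r.getsampwidth()
    rate = r.getframerate()
    frames = r.getnframes()

print(channels, width, rate, frames)
```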

Next, decode each frame. The hex values are stored in little-endian order, so the program needs a small change: use the struct module to convert each frame's value into a signed integer.

#!python
import wave 
import struct 

f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1) 
    sint = struct.unpack('<h', single_frame)[0]  # little-endian signed 16-bit
    print "[+] Sample ",i," = ",single_frame.encode('hex')," -> ",sint
f.close()

After the change, running it again gives output roughly like this:

[+] WAV parameters (1, 2, 44100, 937, 'NONE', 'not compressed')
[+] No. of Frames 937
[+] Sample 0 = 62fe -> -414
[+] Sample 1 = 99fe -> -359
[+] Sample 2 = c1ff -> -63
[+] Sample 3 = 9000 -> 144
[+] Sample 4 = 8700 -> 135
[+] Sample 5 = b9ff -> -71
[+] Sample 6 = 5cfe -> -420
[+] Sample 7 = 35fd -> -715
[+] Sample 8 = b1fc -> -847
[+] Sample 9 = f5fc -> -779
[+] Sample 10 = 9afd -> -614
[+] Sample 11 = 3cfe -> -452
[+] Sample 12 = 83fe -> -381
[+] Sample 13 = 52fe -> -430
[+] Sample 14 = e2fd -> -542
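
The first line of that output can be checked by hand. The bytes 62 fe read in little-endian order form 0xfe62, which is 65122 unsigned, and 65122 - 65536 = -414 as a signed 16-bit value. A short Python 3 sketch confirming this, and showing what the wrong byte order would give:

```python
import struct

raw = bytes.fromhex('62fe')   # frame bytes exactly as stored in the file

little = struct.unpack('<h', raw)[0]   # little-endian signed 16-bit
big = struct.unpack('>h', raw)[0]      # big-endian, for comparison only
manual = int.from_bytes(raw, byteorder='little', signed=True)

print(little, big, manual)   # -414 25342 -414
```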

Is that clearer? Next, plot these values with Python's matplotlib plotting library:

#!python
import wave 
import struct 
import matplotlib.pyplot as plt 

data_set = [] 
f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1)
    sint = struct.unpack('<h', single_frame)[0]
    data_set.append(sint) 
f.close() 
plt.plot(data_set) 
plt.ylabel('Amplitude')
plt.xlabel('Time') 
plt.show()

enter image description here

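This plot is simply the waveform of the sound.

Further automation:

The following Python program splits the audio file into several audio files based on differences in volume, which corresponds to the image segmentation step in image CAPTCHA recognition.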

#!python
'''
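Simple volume-based audio file splitter


What it does:

1. Simple noise reduction
2. Finds the loud portions of the file
3. Splits the file into separate files, one per loud portion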
   
'''  

import wave
import sys
import struct
import os
import time
import httplib
from random import randint


ip = wave.open(sys.argv[1], 'r')
info = ip.getparams()
frame_list = []
for i in range(ip.getnframes()):
    sframe = ip.readframes(1)
    amplitude = struct.unpack('<h', sframe)[0]
    frame_list.append(amplitude)
ip.close()
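# Crude noise gate: treat every sample quieter than 25 as silence
for i in range(0,len(frame_list)):
    if abs(frame_list[i]) < 25:
        frame_list[i] = 0
################################  Find the loudest portions of the audio file ###########################
# A long enough run of zero samples (see thresh) separates two loud portions
thresh = 30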
output = []
nonzerotemp = []
length = len(frame_list)
i = 0
while i < length:
    zeros = []
    while i < length and frame_list[i] == 0:
        i += 1
        zeros.append(0)
    if len(zeros) != 0 and len(zeros) < thresh:
        nonzerotemp += zeros
    elif len(zeros) >= thresh:  # >= so a run of exactly thresh zeros is handled here
        if len(nonzerotemp) > 0 and i < length:
            output.append(nonzerotemp)
            nonzerotemp = []
    else:
        nonzerotemp.append(frame_list[i])
        i += 1
if len(nonzerotemp) > 0:
    output.append(nonzerotemp)

chunks = []
for j in range(0,len(output)):
    if len(output[j]) > 3000:
        chunks.append(output[j])
#########################################################################################################

# randint(-0,+0) always returns 0, so this loop currently leaves the samples unchanged
for l in chunks:
    for m in range(0,len(l)):
        if l[m] == 0:
             l[m] = randint(-0,+0)

inc_percent = 1 # amplification in percent

for l in chunks:
    for m in range(0,len(l)):
        if l[m] <= 0:
            # negative value: grow the magnitude, keep the sign
            l[m] = 0 - (abs(l[m]) + abs(l[m])*inc_percent/100)
        else:
            # positive value
            l[m] =     abs(l[m]) + abs(l[m])*inc_percent/100

########################################################

# The code below writes one wav file per loud portion detected.

NEW_RATE = 1 #Change it to > 1 to raise the frame rate (only used by the commented-out setparams call)

print '[+] Possibly ',len(chunks),'loud voices detected...'
for i in range(0, len(chunks)):
    new_frame_rate = info[2]*NEW_RATE  # info[2] is the frame rate; info[0] is the channel count
    print '[+] Creating No. ',str(i),'file..'
    split = wave.open('cut_'+str(i)+'.wav', 'w')
    split.setparams((info[0],info[1],info[2],0,info[4],info[5]))
#   split.setparams((info[0],info[1],new_frame_rate,0,info[4],info[5]))

    #Add some near-silence at the start: random samples between -25 and +25
    for k in range(0,10000):
        single_frame = struct.pack('<h', randint(-25,+25))
        split.writeframes(single_frame)
    # Add the voice for the first time
    for frames in chunks[i]:
        single_frame = struct.pack('<h', frames)
        split.writeframes(single_frame)

    #Add some silence in between two digits
    for k in range(0,10000):
        single_frame = struct.pack('<h', randint(-25,+25))
        split.writeframes(single_frame)

    # Repeat effect :  Add the voice second time
    for frames in chunks[i]:
        single_frame = struct.pack('<h', frames)
        split.writeframes(single_frame)

    #Add silence at end
    for k in range(0,10000):
        single_frame = struct.pack('<h', randint(-25,+25))
        split.writeframes(single_frame)

    split.close()#Close each files

Once the file has been split into pieces, each piece can simply be converted to FLAC and sent separately to the Google speech recognition API.
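
For experimenting with the segmentation logic without touching any files, the splitting step can be condensed into a single function. This is a simplified Python 3 sketch rather than a copy of the script: the function name and parameters are hypothetical, and the defaults merely mirror the script's constants (25, 30, and 3000).

```python
def split_on_silence(samples, noise_floor=25, min_gap=30, min_len=3000):
    """Split a list of signed samples into loud segments.

    Samples quieter than `noise_floor` count as silence. A run of at
    least `min_gap` silent samples ends the current segment, and
    segments shorter than `min_len` samples are discarded as blips.
    """
    chunks = []
    current = []     # samples of the segment being built
    silent_run = 0   # length of the current run of silent samples
    for s in samples:
        if abs(s) < noise_floor:
            silent_run += 1
            if current:
                current.append(0)   # keep short pauses inside a segment
                if silent_run >= min_gap:
                    # Drop the trailing silence, then close the segment.
                    segment = current[:len(current) - silent_run]
                    if len(segment) >= min_len:
                        chunks.append(segment)
                    current = []
        else:
            silent_run = 0
            current.append(s)
    if current:
        # Flush the last segment, again without trailing silence.
        segment = current[:len(current) - silent_run]
        if len(segment) >= min_len:
            chunks.append(segment)
    return chunks


# Synthetic "audio": two spoken digits and one short click to be ignored.
demo = ([0] * 50 + [1000] * 40 + [0] * 10 + [-800] * 40 +
        [0] * 50 + [500] * 5 + [0] * 50 + [900] * 60)
parts = split_on_silence(demo, min_gap=30, min_len=20)
print([len(p) for p in parts])   # [90, 60]
```

The short 10-sample pause stays inside the first segment because it is shorter than `min_gap`, while the 5-sample click is dropped because it is shorter than `min_len`.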

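The demo video (fetched from outside the firewall):

Solving reCaptcha Audio Challenge using Google Web Speech API Demo

Now that the simple audio CAPTCHA is solved, let's try the complex one.

This image is the waveform of a complex audio CAPTCHA, plotted with the program above: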

enter image description here

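From the plot we can see that a constant noise is present throughout the audio, which is the horizontal blue band in the middle. Noise like this can be handled with the standard discrete Fourier transform, computed via the fast Fourier transform, FFT (those who flunked advanced math, take note!).

Think back to the digital signal processing course from university years ago. Take a pure sine wave s(t) = sin(w*t) and add white noise n(t) to it: S(t) = s(t) + n(t). Let F be the Fourier transform of S. Setting F to 0 at every frequency above or below w filters the noise out.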

enter image description here

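In this figure, for example, the sine wave's spectrum has been isolated; cutting away the extra frequencies and applying the inverse transform filters out part of the noise. Writing such a filter by hand is a real pain, though, and Python has quite a few audio processing libraries that come with noise reduction filters.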
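
The filtering idea can be tried on synthetic data. The following is a minimal sketch using NumPy, which is an assumption here since the article does not name a library. The sample rate, tone frequency, noise level, and band width are made-up values for illustration, not measurements from a real reCaptcha clip.

```python
import numpy as np

rate = 8000                      # samples per second (illustrative)
t = np.arange(rate) / rate       # one second of time stamps
tone_hz = 440.0                  # frequency of the "clean" sine wave

clean = np.sin(2 * np.pi * tone_hz * t)
rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
noisy = clean + 0.5 * rng.standard_normal(t.size)   # S(t) = s(t) + n(t)

# Forward transform of the real-valued signal, plus each bin's frequency.
spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(noisy.size, d=1.0 / rate)

# Keep only a narrow band around the tone; zero every other frequency.
keep = np.abs(freqs - tone_hz) <= 5.0
filtered = np.fft.irfft(spectrum * keep, n=noisy.size)


def rms(x):
    """Root-mean-square amplitude of an array."""
    return float(np.sqrt(np.mean(x ** 2)))


noise_before = rms(noisy - clean)
noise_after = rms(filtered - clean)
print('residual noise before: %.3f, after: %.3f' % (noise_before, noise_after))
```

This works well here only because the wanted signal is a single known frequency. Speech occupies a broad band, so a real CAPTCHA clip would need a wider pass band and would keep more of the noise.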

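But just as with image CAPTCHAs, noise (the equivalent of interference lines and speckles in an image) is not the hard part of breaking an audio CAPTCHA. For a computer, the hardest part is still segmentation. In the complex audio CAPTCHA, besides the main voice, two or three other people are muttering various things in the background at roughly the same volume as the main voice, so the voices cannot be separated by volume. Even humans find audio like this hard to make out.

I have put the current code at https://github.com/debasishm89/hack_audio_captcha

The code is still quite rough and leaves a lot of room for improvement.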

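0x02 Conclusion

I reported this issue to the Google security team. They said it works as designed (poor author): when the system suspects that the other side is a machine rather than a human, it automatically escalates to the high-difficulty CAPTCHA. Google currently has no plans to change this design.

This article comes from the WooYun Knowledge Base. This mirror is provided to make study and research easier, and the copyright belongs to the WooYun Knowledge Base.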
