Install the following dependencies:
- openai-whisper (provides the `whisper` module imported below)
- ffmpeg-python
- transformers
Use the following code to run speech recognition:

```python
import whisper

# "small" is downloaded automatically on first use; a path to a locally
# downloaded checkpoint such as "small.pt" (see below) also works.
model = whisper.load_model("small")
result = model.transcribe("output_audio.wav")
print(result["text"])
```
A lower-level way to drive the model:

```python
import whisper

model = whisper.load_model("small")

# Load the audio and pad/trim it to exactly 30 seconds
audio = whisper.load_audio("output.wav")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and move it to the model's device
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language
_, probs = model.detect_language(mel)
print("Detected language: {}".format(max(probs, key=probs.get)))

# Decode the 30-second window
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print("You say:", result.text)
```
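`pad_or_trim` fixes the waveform to 30 seconds, which at Whisper's 16 kHz sample rate is 480,000 samples. A simplified NumPy re-implementation of that idea (not whisper's own code, just the same logic for 1-D and batched arrays):

```python
import numpy as np

SAMPLE_RATE = 16000
CHUNK_LENGTH = 30                        # seconds per decoding window
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE   # 480000 samples

def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Zero-pad or cut the waveform so it is exactly `length` samples long."""
    if audio.shape[-1] > length:
        return audio[..., :length]
    pad = length - audio.shape[-1]
    return np.pad(audio, [(0, 0)] * (audio.ndim - 1) + [(0, pad)])

short = np.ones(16000, dtype=np.float32)  # 1 second of audio
print(pad_or_trim(short).shape)           # (480000,)
```

This is why the lower-level API decodes one 30-second window at a time, while `model.transcribe` handles chunking longer files for you.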
The download URL for each model can be found by opening whisper's `__init__.py` file and copying it from there; for example, the small model is at https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt.
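Whisper verifies a downloaded checkpoint against the SHA-256 digest embedded in its URL (the long hex component above). A standard-library sketch for checking a manually copied file the same way:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# The expected digest is the hex string embedded in the small.pt URL above.
expected = "9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794"
# sha256_of("small.pt") == expected  # True if the checkpoint copied intact
```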
Reference:
https://github.com/openai/whisper/tree/v20230306