Text to Speech (TTS)

Posted by _clai on 2024-05-02

Original article: https://www.cnblogs.com/chlai/p/18170377

Web Speech API

The Web Speech API lets you incorporate voice data into web apps. It has two parts: SpeechSynthesis (text-to-speech, TTS) and SpeechRecognition (asynchronous speech recognition).

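SpeechSynthesis: the controller interface for the speech service. It retrieves information about the synthesis voices available on the device, starts and pauses speech, and issues other commands.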

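Speech synthesis is accessed through the SpeechSynthesis interface, which provides text-to-speech (TTS) so that programs can read out their text content, normally through the device's default speech synthesizer. Each available voice is represented by a SpeechSynthesisVoice object, and each piece of text to be spoken is represented by a SpeechSynthesisUtterance object. You assign a voice to an utterance and pass the utterance to the SpeechSynthesis.speak() method to have it spoken.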
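As a rough sketch of that flow, the helper below builds an utterance, optionally assigns a voice, and hands it to `speak()`. The name `speakText` and the injectable `synth` and `Utterance` parameters are assumptions made here so the function can also be tried outside a browser; in a page they default to the real `speechSynthesis` and `SpeechSynthesisUtterance`.

```javascript
// Hypothetical helper: speak `text` with an optional voice.
// `synth` and `Utterance` default to the browser globals and are
// injectable only so the sketch can be tried without a browser.
function speakText(text, {
  voice = null,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
} = {}) {
  if (!synth || !Utterance) {
    return null; // speech synthesis is not available in this environment
  }
  const utterance = new Utterance(text);
  if (voice) {
    utterance.voice = voice; // a SpeechSynthesisVoice from synth.getVoices()
  }
  synth.speak(utterance); // queued; spoken after earlier utterances finish
  return utterance;
}
```

In a browser, `speakText('hello world')` would be enough to hear the device's default voice.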

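SpeechSynthesisUtterance: a speech request. It holds the content the speech service should read and information about how to read it (for example language, pitch, and volume).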
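To make the "how to read it" part concrete, here is a minimal sketch that copies `lang`, `volume`, `rate`, and `pitch` onto an utterance. The helper name `applyUtteranceOptions` is an assumption for illustration; the numeric ranges are the ones the API defines for those attributes (volume 0 to 1, rate 0.1 to 10, pitch 0 to 2).

```javascript
// Valid ranges for the numeric SpeechSynthesisUtterance attributes.
const RANGES = { volume: [0, 1], rate: [0.1, 10], pitch: [0, 2] };

// Hypothetical helper: copy lang/volume/rate/pitch onto an utterance-like
// object, clamping the numeric values into their valid ranges.
function applyUtteranceOptions(utterance, options = {}) {
  if (typeof options.lang === 'string') {
    utterance.lang = options.lang; // BCP 47 tag such as 'zh-CN' or 'en-US'
  }
  for (const [key, [min, max]] of Object.entries(RANGES)) {
    const value = options[key];
    if (typeof value === 'number' && !Number.isNaN(value)) {
      utterance[key] = Math.min(max, Math.max(min, value));
    }
  }
  return utterance;
}
```

In a page this might be called as `applyUtteranceOptions(new SpeechSynthesisUtterance('hello'), { lang: 'en-US', pitch: 1.2 })`. Options that are left out keep the browser defaults.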

SpeechRecognition: speech recognition

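Speech recognition is accessed through the SpeechRecognition interface, which recognizes voice context from an audio input (normally via the device's default speech recognition service). Typically you use the interface's constructor to create a new SpeechRecognition object, which exposes a set of event handlers for detecting speech arriving through the device's microphone. The SpeechGrammar interface represents a particular grammar that your app should recognize. Grammars are defined in the JSpeech Grammar Format (JSGF).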
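A minimal sketch of constructing a recognizer with feature detection follows. The helper name `createRecognizer` and the injectable `scope` parameter are assumptions for illustration; the sketch only relies on the constructor being exposed as either `SpeechRecognition` or the prefixed `webkitSpeechRecognition`.

```javascript
// Hypothetical helper: return a configured recognizer, or null when the
// environment exposes neither SpeechRecognition nor webkitSpeechRecognition.
// `scope` is injectable only so the sketch can be tried outside a browser.
function createRecognizer(
  { lang = 'en-US', continuous = false, interimResults = false } = {},
  scope = globalThis,
) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) {
    return null;
  }
  const recognition = new Recognition();
  recognition.lang = lang; // language to recognize, e.g. 'zh-CN'
  recognition.continuous = continuous; // keep listening after the first result
  recognition.interimResults = interimResults; // report partial results too
  return recognition;
}
```

Returning `null` instead of throwing lets the calling page hide or disable its speech button when recognition is unavailable.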

SpeechGrammar: a set of words or patterns that the recognition service should recognize

Text to speech

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Web Speech API</title>
  </head>
  <body>
    <strong>Web Speech API</strong>
    <hr />
    <select class="select-voice"></select>
    <textarea class="text" cols="50" rows="10"></textarea><br /><br />
    <button type="button" class="btn-play">Play</button>
    <button type="button" class="btn-pause">Pause</button>
    <button type="button" class="btn-resume">Resume</button>
    <button type="button" class="btn-end">Stop</button>

    <script>
      // Web Speech API

      const playBtn = document.querySelector('.btn-play');
      const pauseBtn = document.querySelector('.btn-pause');
      const resumeBtn = document.querySelector('.btn-resume');
      const endBtn = document.querySelector('.btn-end');

      // Text to speech

      // Get the SpeechSynthesis controller
      const synth = globalThis.speechSynthesis;
      // console.log("synth => ", synth)

      const text = document.querySelector('.text');
      text.value = 'hello world, this is a test for web speech api.';

      // Populate the voice list (Chinese voices only)
      const selectVoice = document.querySelector('.select-voice');
      const fragment = document.createDocumentFragment();
      const voiceList = [];
      synth.addEventListener('voiceschanged', () => {
        if (voiceList.length === 0) {
          synth.getVoices().forEach((voice) => {
            if (voice.lang.includes('zh')) {
              const option = document.createElement('option');
              option.dataset.lang = voice.lang;
              option.value = voice.name;
              option.textContent = voice.name;
              fragment.appendChild(option);
              voiceList.push(voice);
            }
          });
          selectVoice.appendChild(fragment);
        }
        // Apply the currently selected voice
        handleSelectVoice();

        playBtn.removeAttribute('disabled');
      });

      selectVoice.addEventListener('change', handleSelectVoice);

      // Switch the voice
      function handleSelectVoice() {
        /** @type {SpeechSynthesisVoice} */
        const selectedVoice = voiceList.at(selectVoice.selectedIndex);
        utterance.voice = selectedVoice;
        // console.log('selectedVoice => ', selectedVoice.name);
      }

      const utterance = new SpeechSynthesisUtterance();
      // Set the text content
      utterance.text = text.value;
      const info = {
        start: 0,
        end: 0,
        elapsedTime: 0,
        paused: false,
      };
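      playBtn.addEventListener('click', () => {
        // Remove anything still speaking or queued before starting over
        (synth.speaking || synth.pending) && synth.cancel();
        // Re-read the textarea so edits made after page load are spoken
        utterance.text = text.value;
        info.start = 0;
        info.end = 0;
        // Add the utterance to the queue; it is spoken once earlier utterances finish.
        synth.speak(utterance);
      });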
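      // Pause playback
      pauseBtn.addEventListener('click', () => {
        synth.pause();
      });
      // Resume playback by restarting from the last reported boundary
      resumeBtn.addEventListener('click', () => {
        synth.cancel();
        // cancel() does not clear the paused state, so clear it explicitly
        synth.resume();
        const sliceText = utterance.text.slice(info.end);
        // console.log('sliceText => ', sliceText);
        utterance.text = sliceText;
        // Boundary indexes are relative to the new, shorter text
        info.start = 0;
        info.end = 0;
        synth.speak(utterance);
      });
      // Stop playback
      endBtn.addEventListener('click', () => {
        synth.cancel();
        info.paused = false;
      });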

      utterance.addEventListener('boundary', (e) => {
        const {
          charIndex,
          charLength,
          elapsedTime,
          utterance: { text },
        } = e;
        // e.name is `word` at a word boundary and `sentence` at a sentence boundary

        // Store the character range being spoken and the elapsed time
        // const char = text.slice(charIndex, charIndex + charLength);
        info.start = charIndex;
        info.end = charIndex + charLength;
        info.elapsedTime = elapsedTime;
      });
      utterance.addEventListener('pause', (e) => {
        console.log('pause');
      });
      utterance.addEventListener('resume', (e) => {
        console.log('resume');
      });
      utterance.addEventListener('end', (e) => {
        console.log('end');
      });

      window.addEventListener('beforeunload', () => {
        // Stop playback
        synth.pause();
        synth.cancel();
      });
    </script>
  </body>
</html>
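Two steps in the page above, filtering voices by language and resuming from the last boundary, can be pulled out into small pure functions. The sketch below is one possible shape, not the original author's code; `filterVoicesByLang` and `remainingText` are hypothetical names. It assumes that some platforms may report language tags with an underscore (for example `zh_CN`) and that a browser may not always supply `charLength` on boundary events, so both cases are tolerated.

```javascript
// Hypothetical helper: keep voices whose language tag has `prefix` as its
// primary subtag, e.g. 'zh' matches 'zh-CN' and 'zh_TW' but not 'en-US'.
function filterVoicesByLang(voices, prefix) {
  const wanted = prefix.toLowerCase();
  return voices.filter((voice) => {
    const tag = voice.lang.toLowerCase().replace('_', '-');
    return tag === wanted || tag.startsWith(wanted + '-');
  });
}

// Hypothetical helper: the text left to speak after a boundary event.
// Falls back to charIndex alone when charLength is missing or not finite.
function remainingText(fullText, charIndex, charLength = 0) {
  const end = charIndex + (Number.isFinite(charLength) ? charLength : 0);
  return fullText.slice(end);
}
```

With these, the `voiceschanged` handler could call `filterVoicesByLang(synth.getVoices(), 'zh')`, and the resume handler could call `remainingText(utterance.text, info.start, info.end - info.start)`.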

Speech recognition

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Web Speech API</title>
  </head>
  <body>
    <strong>Web Speech API</strong>
    <hr />
    <audio src="./music.m4a" controls></audio>
    <textarea class="text" cols="50" rows="10"></textarea><br /><br />
    <button type="button" class="btn-speech">Speech to text</button>

    <script>
      // Web Speech API

      const text = document.querySelector('.text');

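      // Button control
      const speechBtn = document.querySelector('.btn-speech');

      // Speech recognition

      // Create the SpeechRecognition object (exposed with a webkit prefix in some browsers)
      /** @type {SpeechRecognition} */
      const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();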
      // console.log('recognition => ', recognition);

      // Whether to keep recognizing continuously
      recognition.continuous = true;
      // Whether results include interim results
      recognition.interimResults = true;
      // Recognition language
      recognition.lang = 'zh-CN'; // zh-CN, en-US

      speechBtn.addEventListener('click', () => {
        // Start recognition
        recognition.start();
      });

      recognition.onstart = (e) => {
        console.log('started', e);
      };
      recognition.onaudiostart = (e) => {
        console.log('audio capture started');
      };
      recognition.onspeechstart = (e) => {
        console.log('speech started');
      };
      // Speech has ended
      recognition.onspeechend = (e) => {
        console.log('speech ended');
        recognition.stop();
      };
      recognition.onaudioend = (e) => {
        console.log('audio capture ended');
      };
      recognition.onend = (e) => {
        console.log('ended');
        // Restart recognition once it has ended
        recognition.start();
      };

      // Recognition results
      recognition.onresult = (e) => {
        // e.results is array-like, so convert it before iterating
        const resultList = Array.from(e.results);
        let str = '';
        resultList.forEach((result) => {
          str += result[0].transcript + '\n';
        });

        text.value = str;
        console.log('recognition result: ', e.resultIndex, str);
      };

      // No result was recognized
      recognition.onnomatch = (e) => {
        console.log('No match', e);
      };

      // Recognition error
      recognition.onerror = (e) => {
        // not-allowed: the user denied microphone access; audio-capture: audio capture failed (for example, the microphone is off); no-speech: no speech detected; network: network problem
        console.log('recognition error: ', e.error);
        if (e.error === 'no-speech') {
          recognition.stop();
        }
      };
    </script>
  </body>
</html>
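Because the page enables `interimResults`, each `result` event can carry both finalized and still-changing text. The sketch below shows one way to separate the two using the `isFinal` flag on each result; the helper name `collectTranscripts` is hypothetical and is not part of the original listing.

```javascript
// Hypothetical helper: split recognition results into final and interim text.
// `results` is array-like: each item has `isFinal` and, at index 0, the
// best alternative with a `transcript` string.
function collectTranscripts(results) {
  let finalText = '';
  let interimText = '';
  for (const result of Array.from(results)) {
    const transcript = result[0].transcript;
    if (result.isFinal) {
      finalText += transcript + '\n'; // finalized phrases, one per line
    } else {
      interimText += transcript; // may still change in later events
    }
  }
  return { finalText, interimText };
}
```

Inside `onresult` this could be used as `const { finalText, interimText } = collectTranscripts(e.results);` followed by `text.value = finalText + interimText;`, which keeps confirmed lines stable while the tail updates.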

