Problems and Solutions Encountered When Integrating BGE-M3 into LangChain

Posted by YTARO on 2024-07-05

This post is based on the tutorial at https://github.com/datawhalechina/self-llm/blob/master/GLM-4/02-GLM-4-9B-chat%20langchain%20%E6%8E%A5%E5%85%A5.md. Because the large model is deployed locally, several methods have to be overridden when subclassing LangChain's LLM class.
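
As a point of reference, here is a minimal sketch of the two members a custom LLM subclass has to override, _call and _llm_type (the class name LocalLLMSkeleton is hypothetical; the full working implementation appears later in this post):

from typing import Any, List, Optional
from langchain.llms.base import LLM
from langchain.callbacks.manager import CallbackManagerForLLMRun

class LocalLLMSkeleton(LLM):
    # Minimal sketch: a custom LLM only needs to implement _call and _llm_type.

    def _call(self, prompt: str, stop: Optional[List[str]] = None,
              run_manager: Optional[CallbackManagerForLLMRun] = None,
              **kwargs: Any) -> str:
        # Run the locally deployed model on `prompt` and return its text output.
        raise NotImplementedError

    @property
    def _llm_type(self) -> str:
        # Identifier LangChain uses when logging and serializing the LLM.
        return "local-llm-skeleton"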

However, the following error appeared during testing:

/root/miniconda3/lib/python3.12/site-packages/transformers/generation/utils.py:1659: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "/root/autodl-tmp/glm4LLM.py", line 63, in <module>
    print(llm.invoke("你是誰"))
          ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/langchain_core/language_models/llms.py", line 276, in invoke
    self.generate_prompt(
  File "/root/miniconda3/lib/python3.12/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/langchain_core/language_models/llms.py", line 803, in generate
    output = self._generate_helper(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
    raise e
  File "/root/miniconda3/lib/python3.12/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
    self._generate(
  File "/root/miniconda3/lib/python3.12/site-packages/langchain_core/language_models/llms.py", line 1322, in _generate
    self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
  File "/root/autodl-tmp/glm4LLM.py", line 40, in _call
    generated_ids = self.model.generate(**model_inputs, **self.gen_kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/generation/utils.py", line 2397, in _sample
    outputs = self(
              ^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 1005, in forward
    transformer_outputs = self.transformer(
                          ^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 887, in forward
    inputs_embeds = self.embedding(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 823, in forward
    words_embeddings = self.word_embeddings(input_ids)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

The root cause is that input_ids (the input tensors) and the model are on different devices: the tokenized inputs stay on the CPU, while the model has been placed on the GPU.
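
A quick way to confirm the mismatch is to print the devices of both sides before calling .generate(). This is a hypothetical check, assuming `tokenizer` and `model` have already been loaded via AutoTokenizer/AutoModelForCausalLM as in the class below:

# Hypothetical check, assuming `tokenizer` and `model` are loaded as in the class below.
model_inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "你是誰"}],
    tokenize=True, return_tensors="pt", return_dict=True, add_generation_prompt=True
)
print(model_inputs["input_ids"].device)   # cpu    -- tokenizer output defaults to the CPU
print(next(model.parameters()).device)    # cuda:0 -- model was loaded with device_map="auto"

# Fix: move every input tensor onto the model's device before calling generate().
device = next(model.parameters()).device
model_inputs = {k: v.to(device) for k, v in model_inputs.items()}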

After the changes below the code runs successfully; the key modification is the statement that moves input_ids onto the model's device.

from langchain.llms.base import LLM
from typing import Any, List, Optional, Dict
from langchain.callbacks.manager import CallbackManagerForLLMRun
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

class ChatGLM4_LLM(LLM):
    # Custom LLM class based on a locally deployed ChatGLM4 model
    tokenizer: AutoTokenizer = None
    model: AutoModelForCausalLM = None
    gen_kwargs: dict = None
        
    def __init__(self, model_name_or_path: str, gen_kwargs: dict = None):
        super().__init__()
        print("正在從本地載入模型...")
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name_or_path, trust_remote_code=True
        )
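        # device_map="auto" places the model weights on the available GPU(s),
        # which is why CPU-resident input_ids caused the device-mismatch error.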
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name_or_path,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
            device_map="auto"
        ).eval()
        print("完成本地模型的載入")
        
        if gen_kwargs is None:
            gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
        self.gen_kwargs = gen_kwargs
        
    def _call(self, prompt: str, stop: Optional[List[str]] = None,
              run_manager: Optional[CallbackManagerForLLMRun] = None,
              **kwargs: Any) -> str:
        messages = [{"role": "user", "content": prompt}]
        model_inputs = self.tokenizer.apply_chat_template(
            messages, tokenize=True, return_tensors="pt", return_dict=True, add_generation_prompt=True
        )
        
        # Move input_ids (and the other input tensors) to the same device as the model
        device = next(self.model.parameters()).device
        model_inputs = {key: value.to(device) for key, value in model_inputs.items()}
        
        generated_ids = self.model.generate(**model_inputs, **self.gen_kwargs)
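        # Strip the prompt tokens so that only the newly generated answer is decoded.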
        generated_ids = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs['input_ids'], generated_ids)
        ]
        response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return response
    
    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """返回用於識別LLM的字典,這對於快取和跟蹤目的至關重要。"""
        return {
            "model_name": "glm-4-9b-chat",
            "max_length": self.gen_kwargs.get("max_length"),
            "do_sample": self.gen_kwargs.get("do_sample"),
            "top_k": self.gen_kwargs.get("top_k"),
        }

    @property
    def _llm_type(self) -> str:
        return "glm-4-9b-chat"
