talk-to-your-data
https://github.com/fanqingsong/talk-to-your-data
This project helps you build a talk-to-your-data chatbot using the OpenAI LLM, LangChain, and Streamlit. Basically:
- Clone the project
git clone https://github.com/emmakodes/talk-to-your-data.git
cd talk-to-your-data
- Create a new virtual environment called `.venv`
python -m venv .venv
- Activate the virtual environment (the command below is for Windows; on Linux/macOS use `source .venv/bin/activate`)
.venv\Scripts\activate
- Install the project requirements
pip install -r requirements.txt
- Add your document to the `mydocument` directory and delete `animalsinresearch.pdf`, which already exists inside the `mydocument` directory. `animalsinresearch.pdf` is my own document, so remove it unless you want to test with my data.
- Delete the files in the `vector_index` directory so it can hold the vectors of your own document.
- Start the app using the following command:
streamlit run app.py
Results
- Questions unrelated to the document are answered from the model's own knowledge.
- Questions covered in the document are answered from the document.
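The retrieval behavior above can be sketched with toy vectors standing in for real embeddings. The chunks and numbers below are made up for illustration; a real app would get the vectors from an embedding model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for document chunks (real ones come from an embedding model).
chunks = {
    "Animals are used in biomedical research.": [0.9, 0.1, 0.0],
    "Streamlit renders the chat interface.":    [0.1, 0.9, 0.0],
}

def retrieve(query_vec, k=1):
    """Return the k chunks whose vectors are closest to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(chunks[c], query_vec), reverse=True)
    return ranked[:k]

# A query vector close to the first chunk retrieves that chunk;
# its text is then passed to the LLM as context for the answer.
print(retrieve([0.8, 0.2, 0.0]))
```

If no stored chunk is similar enough to the query, there is nothing relevant to pass as context, which is why off-document questions fall back to the model's own knowledge.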
Model reasoning process:
References
https://lmstudio.ai/docs/text-embeddings
https://dev.to/emmakodes_/how-to-build-a-talk-to-your-data-chatbot-using-openai-llm-langchain-and-streamlit-27po
CustomAPIEmbeddings: calling a local embedding model
https://lmstudio.ai/docs/text-embeddings
https://api.python.langchain.com/en/latest/embeddings/langchain_core.embeddings.embeddings.Embeddings.html#langchain_core.embeddings.embeddings.Embeddings
https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.ollama.OllamaEmbeddings.html
https://python.langchain.com/v0.2/docs/integrations/text_embedding/ollama/#usage
https://python.langchain.com/v0.1/docs/modules/data_connection/text_embedding/
https://api.python.langchain.com/en/latest/_modules/langchain_community/embeddings/openai.html#OpenAIEmbeddings.embed_query
https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html
Inspired by:
https://github.com/langchain-ai/langchain/discussions/19467
https://stackoverflow.com/questions/77217193/langchain-how-to-use-a-custom-embedding-model-locally
https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.localai.LocalAIEmbeddings.html
https://python.langchain.com/v0.2/docs/integrations/text_embedding/localai/
Implementation
```python
import pprint
from typing import List

import requests
from langchain_core.embeddings import Embeddings


class CustomAPIEmbeddings(Embeddings):
    def __init__(self, model_name: str, api_url: str):
        self.model_name = model_name
        self.api_url = api_url

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        response = requests.post(
            self.api_url,
            headers={'Authorization': 'Bearer your_token_here'},
            json={
                "model": self.model_name,
                "input": texts,
            },
        )
        pprint.pprint(response.json())
        # Adjust this based on the response format of your API.
        data_list = response.json()['data']
        return [one['embedding'] for one in data_list]

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query string by delegating to embed_documents.

        Args:
            text: The text to embed.

        Returns:
            Embedding for the text.
        """
        return self.embed_documents([text])[0]


if __name__ == '__main__':
    embeddings = CustomAPIEmbeddings(
        model_name="Xenova/text-embedding-ada-002",
        api_url="http://192.168.0.108:1234/v1/embeddings",
    )
    query = "What is the cultural heritage of India?"
    query_embedding = embeddings.embed_query(query)
    pprint.pprint(query_embedding)
```
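LM Studio's `/v1/embeddings` endpoint returns an OpenAI-compatible payload, so the `['data'][...]['embedding']` extraction in `embed_documents` can be exercised offline against a stubbed response. The payload below mirrors that shape; the embedding values are made up:

```python
# A stub response in the OpenAI-compatible shape returned by
# LM Studio's /v1/embeddings endpoint (values are made up).
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
    ],
    "model": "Xenova/text-embedding-ada-002",
}

def parse_embeddings(payload):
    """The same extraction CustomAPIEmbeddings.embed_documents performs:
    one embedding list per input text, in input order."""
    return [item["embedding"] for item in payload["data"]]

print(parse_embeddings(sample))
```

Keeping the parsing step this small is what makes the class easy to adapt: if your local server wraps the data differently, only this extraction needs adjusting.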
The embedding model and the LLM are not the same model.
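One way to keep that separation explicit is to configure the two as distinct (endpoint, model) pairs: the embedding model only vectorizes text, while the chat LLM only generates answers. The model names and port below are hypothetical placeholders, not values from the project:

```python
# The embedding model and the chat LLM are configured independently.
# Both names and the base URL are hypothetical placeholders.
EMBEDDING_MODEL = "Xenova/text-embedding-ada-002"  # vectorizes text only
CHAT_MODEL = "local-chat-llm"                      # generates answers only

def endpoints(base_url: str = "http://localhost:1234/v1") -> dict:
    """Return the two separate (url, model) pairs a talk-to-your-data app calls."""
    return {
        "embeddings": (base_url + "/embeddings", EMBEDDING_MODEL),
        "chat": (base_url + "/chat/completions", CHAT_MODEL),
    }

print(endpoints())
```

Because the two endpoints are independent, you can swap the embedding model without touching the chat model, as long as you re-embed the documents in `vector_index` so stored and query vectors come from the same model.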
https://www.aneasystone.com/archives/2023/07/doc-qa-using-embedding.html
https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/
https://openai.com/index/introducing-text-and-code-embeddings/