Semantic Kernel offers two Memory implementations: Semantic Memory, which is built into Semantic Kernel, and Kernel Memory, a standalone project that evolved out of Semantic Kernel.
About Semantic Memory (source):
Semantic Memory (SM) is a library for C#, Python, and Java that wraps direct calls to databases and supports vector search. It was developed as part of the Semantic Kernel (SK) project and serves as the first public iteration of long-term memory. The core library is maintained in three languages, while the list of supported storage engines (known as "connectors") varies across languages.
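Learning goal: use Semantic Memory to call the OpenAI API, generate text embeddings with the text-embedding-ada-002 model, store them in an in-memory vector store, and then run a semantic search over them.
Learning material: the sample program Example14_SemanticMemory.cs in the Semantic Kernel source repository.
Create a .NET console project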
dotnet new console
dotnet add package Microsoft.SemanticKernel
dotnet add package --prerelease Microsoft.SemanticKernel.Plugins.Memory
Create an ISemanticTextMemory instance
Use MemoryBuilder, backed by OpenAITextEmbeddingGenerationService, to create a SemanticTextMemory instance, which implements ISemanticTextMemory.
#pragma warning disable SKEXP0011
#pragma warning disable SKEXP0003
#pragma warning disable SKEXP0052
ISemanticTextMemory memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey)
    .WithMemoryStore(new VolatileMemoryStore())
    .Build();
#pragma warning restore SKEXP0052
#pragma warning restore SKEXP0003
#pragma warning restore SKEXP0011
Note: the warning disable pragmas in the code above are needed because MemoryBuilder and the two extension methods are experimental features.
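The builder call above takes an apiKey variable that this walkthrough does not define. Here is a minimal sketch of one way to supply it, assuming the key is stored in an environment variable named OPENAI_API_KEY. Both the variable name and the TryReadApiKey helper are illustrative and not part of Semantic Kernel.

```csharp
using System;

// Hypothetical helper: reads an API key from an environment variable.
// Returns false when the variable is missing or blank, so the caller can
// fail early instead of sending an empty key to the service.
static bool TryReadApiKey(string variableName, out string apiKey)
{
    apiKey = Environment.GetEnvironmentVariable(variableName) ?? string.Empty;
    return !string.IsNullOrWhiteSpace(apiKey);
}

// Assumption: the key is exported as OPENAI_API_KEY before the program runs.
if (TryReadApiKey("OPENAI_API_KEY", out var apiKey))
{
    Console.WriteLine($"API key loaded ({apiKey.Length} characters).");
}
else
{
    Console.Error.WriteLine("OPENAI_API_KEY is not set.");
}
```

Reading the key from the environment keeps it out of source control. Any other configuration source would work equally well here.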
Prepare the text data used to generate embeddings
var sampleData = new Dictionary<string, string>
{
    ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
        = "README: Installation, getting started, and how to contribute",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
        = "Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function"
};
Generate embeddings and save them to the in-memory vector store
var i = 0;
foreach (var entry in sampleData)
{
    await memory.SaveReferenceAsync(
        collection: "SKGitHub",
        externalSourceName: "GitHub",
        externalId: entry.Key,
        description: entry.Value,
        text: entry.Value);
    Console.WriteLine($" #{++i} saved.");
}
SaveReferenceAsync calls the GenerateEmbeddingAsync method of IEmbeddingGenerationService to generate the embedding; see SemanticTextMemory.cs#L60 in the SK source:
var embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, kernel, cancellationToken).ConfigureAwait(false);
Note: the embedding value is of type ReadOnlyMemory<float>.
Because we are using OpenAI here, the embedding is generated by the GenerateEmbeddingsAsync method of OpenAITextEmbeddingGenerationService (see the SK source), which ultimately calls the GetEmbeddingsAsync method of Azure.AI.OpenAI.OpenAIClient; see OpenAIClient.cs#L552 in the Azure SDK for .NET source.
Run a semantic search over the embedding data
var query = "How do I get started?";
var memoryResults = memory.SearchAsync("SKGitHub", query, limit: 1, minRelevanceScore: 0.5);
SearchAsync also calls GenerateEmbeddingsAsync to generate an embedding from the query text; see SemanticTextMemory.cs#L108.
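Once the query has its own embedding, the search comes down to comparing that vector with each stored vector and keeping the entries whose score clears minRelevanceScore. A common measure for this is cosine similarity. The sketch below assumes that measure and uses made-up three-dimensional vectors in place of real embeddings; the CosineSimilarity helper is illustrative and is not the SK implementation.

```csharp
using System;

// Cosine similarity between two embeddings: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero length to avoid dividing by zero.
static double CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Embeddings must have the same length.");
    }

    ReadOnlySpan<float> x = a.Span;
    ReadOnlySpan<float> y = b.Span;
    double dot = 0, normX = 0, normY = 0;
    for (var i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        normX += x[i] * x[i];
        normY += y[i] * y[i];
    }

    if (normX == 0 || normY == 0)
    {
        return 0;
    }
    return dot / (Math.Sqrt(normX) * Math.Sqrt(normY));
}

// Toy vectors (assumptions for illustration only, not real model output).
ReadOnlyMemory<float> query = new float[] { 1f, 0f, 1f };
ReadOnlyMemory<float> readme = new float[] { 1f, 0.1f, 0.9f };
ReadOnlyMemory<float> notebook = new float[] { 0f, 1f, 0.1f };

const double minRelevanceScore = 0.5;
var readmeScore = CosineSimilarity(query, readme);
var notebookScore = CosineSimilarity(query, notebook);

Console.WriteLine($"README:   {readmeScore:F4} (kept: {readmeScore >= minRelevanceScore})");
Console.WriteLine($"Notebook: {notebookScore:F4} (kept: {notebookScore >= minRelevanceScore})");
```

With these toy vectors only the README entry clears the 0.5 threshold, which mirrors how limit and minRelevanceScore narrow the result set in the call above.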
Print the semantic search results
await foreach (var memoryResult in memoryResults)
{
    Console.WriteLine($"Result:");
    Console.WriteLine(" URL: : " + memoryResult.Metadata.Id);
    Console.WriteLine(" Title : " + memoryResult.Metadata.Description);
    Console.WriteLine(" Relevance: " + memoryResult.Relevance);
}
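Note that the SearchAsync call earlier is not awaited, while its result is consumed here with await foreach. That is the usual pattern for a method returning IAsyncEnumerable<T>. The sketch below shows the pattern with a hypothetical FakeSearchAsync stand-in (not an SK API) and assumes the method is written as an async iterator, in which case its body runs only once enumeration begins.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var started = false;

// Hypothetical stand-in for a search call: an async iterator that yields
// results one at a time. Its body does not run until enumeration begins.
async IAsyncEnumerable<string> FakeSearchAsync(int limit)
{
    started = true;
    for (var n = 1; n <= limit; n++)
    {
        await Task.Yield(); // simulate asynchronous work per result
        yield return $"result {n}";
    }
}

// No await here: the call only creates the async sequence.
var results = FakeSearchAsync(limit: 2);
Console.WriteLine($"Started before enumeration: {started}");

var collected = new List<string>();
await foreach (var result in results)
{
    collected.Add(result);
    Console.WriteLine(result);
}
Console.WriteLine($"Started after enumeration: {started}");
```

In practice this means exceptions from the search, such as a failed embedding request, would surface inside the await foreach loop and not at the call site.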
Run the console program
Output:
#1 saved.
#2 saved.
Result:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.8224089741706848
The search succeeded and the exercise is complete. The full sample code is at https://www.cnblogs.com/dudu/articles/18037216