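Preface
This article uses the Codeblaze.SemanticKernel project to learn how to implement the ITextEmbeddingGenerationService interface and connect a local embedding model.
Project repository: https://github.com/BLaZeKiLL/Codeblaze.SemanticKernel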
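Practice
At first glance SemanticKernel appears to support only OpenAI's models, but it also provides strong abstractions: by implementing its interfaces yourself, you can connect models that are not compatible with the OpenAI API format.
The Codeblaze.SemanticKernel project implements the ITextGenerationService, IChatCompletionService, and ITextEmbeddingGenerationService interfaces. Ollama's chat endpoint now supports the OpenAI format, so ITextGenerationService and IChatCompletionService no longer need to be implemented to reach models served by Ollama. At the time of writing, however, Ollama's embedding endpoint is not yet OpenAI-compatible, so embedding models in Ollama can be connected by implementing the ITextEmbeddingGenerationService interface.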
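Look at the ITextEmbeddingGenerationService interface first:
It represents a generator that produces floating-point text embeddings.
It derives from IEmbeddingGenerationService<string, float>, so look at that interface next: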
[Experimental("SKEXP0001")]
public interface IEmbeddingGenerationService<TValue, TEmbedding> : IAIService where TEmbedding : unmanaged
{
    Task<IList<ReadOnlyMemory<TEmbedding>>> GenerateEmbeddingsAsync(IList<TValue> data, Kernel? kernel = null, CancellationToken cancellationToken = default(CancellationToken));
}
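Finally, look at the IAIService interface, which declares the Attributes property.
Taken together, this means we only need to implement
Task<IList<ReadOnlyMemory<TEmbedding>>> GenerateEmbeddingsAsync(IList<TValue> data, Kernel? kernel = null, CancellationToken cancellationToken = default(CancellationToken));
IReadOnlyDictionary<string, object?> Attributes { get; }
that is, this one method and this one property.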
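To make the point concrete, here is a minimal sketch of a class that satisfies the interface with just that one method and one property. It is only an illustration, not code from Codeblaze.SemanticKernel: the class name ConstantEmbeddingService, the "constant-demo" model id, and the toy three-element vectors are all assumptions made up for this example, and it assumes the Microsoft.SemanticKernel package is referenced.

```csharp
#pragma warning disable SKEXP0001 // the embedding interfaces are marked experimental
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

// Hypothetical demo service: returns a toy vector per input instead of calling a real model.
public sealed class ConstantEmbeddingService : ITextEmbeddingGenerationService
{
    // The property required by IAIService.
    public IReadOnlyDictionary<string, object?> Attributes { get; } =
        new Dictionary<string, object?> { ["model_id"] = "constant-demo" };

    // The method required by IEmbeddingGenerationService<string, float>.
    public Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(
        IList<string> data,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // One embedding per input text; here simply [text length, 0, 0].
        IList<ReadOnlyMemory<float>> result = data
            .Select(text => new ReadOnlyMemory<float>(new[] { (float)text.Length, 0f, 0f }))
            .ToList();
        return Task.FromResult(result);
    }
}
```

A real implementation replaces the body of GenerateEmbeddingsAsync with a call to an embedding model, which is what the Ollama classes below do.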
Now let's see how Codeblaze.SemanticKernel does it.
Add the OllamaBase class:
public interface IOllamaBase
{
    Task PingOllamaAsync(CancellationToken cancellationToken = new());
}
public abstract class OllamaBase<T> : IOllamaBase where T : OllamaBase<T>
{
    // Exposes the read-only view required by IAIService; backed by a private mutable dictionary.
    public IReadOnlyDictionary<string, object?> Attributes => _attributes;
    private readonly Dictionary<string, object?> _attributes = new();
    protected readonly HttpClient Http;
    protected readonly ILogger<T> Logger;
    protected OllamaBase(string modelId, string baseUrl, HttpClient http, ILoggerFactory? loggerFactory)
    {
        _attributes.Add("model_id", modelId);
        _attributes.Add("base_url", baseUrl);
        Http = http;
        // Fall back to a no-op logger when no logger factory is supplied.
        Logger = loggerFactory is not null ? loggerFactory.CreateLogger<T>() : NullLogger<T>.Instance;
    }
    /// <summary>
    /// Ping Ollama instance to check if the required llm model is available at the instance
    /// </summary>
    /// <param name="cancellationToken"></param>
    public async Task PingOllamaAsync(CancellationToken cancellationToken = new())
    {
        var data = new
        {
            name = Attributes["model_id"]
        };
        var response = await Http.PostAsJsonAsync($"{Attributes["base_url"]}/api/show", data, cancellationToken).ConfigureAwait(false);
        ValidateOllamaResponse(response);
        Logger.LogInformation("Connected to Ollama at {url} with model {model}", Attributes["base_url"], Attributes["model_id"]);
    }
    protected void ValidateOllamaResponse(HttpResponseMessage? response)
    {
        try
        {
            response.EnsureSuccessStatusCode();
        }
        catch (HttpRequestException)
        {
            // Note: the failure is only logged here, not rethrown, so the caller continues after a failed request.
            Logger.LogError("Unable to connect to ollama at {url} with model {model}", Attributes["base_url"], Attributes["model_id"]);
        }
    }
}
Note that this line
public IReadOnlyDictionary<string, object?> Attributes => _attributes;
implements the Attributes property declared by the interface.
Add the OllamaTextEmbeddingGeneration class:
#pragma warning disable SKEXP0001
public class OllamaTextEmbeddingGeneration(string modelId, string baseUrl, HttpClient http, ILoggerFactory? loggerFactory)
    : OllamaBase<OllamaTextEmbeddingGeneration>(modelId, baseUrl, http, loggerFactory),
        ITextEmbeddingGenerationService
{
    public async Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(IList<string> data, Kernel? kernel = null,
        CancellationToken cancellationToken = new())
    {
        var result = new List<ReadOnlyMemory<float>>(data.Count);
        // One request per input text: each call sends a single prompt to /api/embeddings.
        foreach (var text in data)
        {
            var request = new
            {
                model = Attributes["model_id"],
                prompt = text
            };
            var response = await Http.PostAsJsonAsync($"{Attributes["base_url"]}/api/embeddings", request, cancellationToken).ConfigureAwait(false);
            ValidateOllamaResponse(response);
            // The response body carries the vector in its "embedding" array.
            var json = JsonSerializer.Deserialize<JsonNode>(await response.Content.ReadAsStringAsync().ConfigureAwait(false));
            var embedding = new ReadOnlyMemory<float>(json!["embedding"]?.AsArray().GetValues<float>().ToArray());
            result.Add(embedding);
        }
        return result;
    }
}
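Note that this class implements the GenerateEmbeddingsAsync method. The idea is simple: send a request to Ollama's embedding endpoint for each input text and read the embedding array from the response.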
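The request and response shapes involved can be tried out without a running Ollama instance. The following sketch builds the same kind of request body as the class above and parses a hand-written response; the sample prompt and the three-element vector are made-up illustration values, not real model output.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

// Request body in the shape the class above posts to /api/embeddings.
var requestJson = JsonSerializer.Serialize(new
{
    model = "mxbai-embed-large:335m",
    prompt = "hello" // made-up sample text
});
Console.WriteLine(requestJson);

// Hand-written sample response; a real embedding has far more elements.
var responseJson = "{\"embedding\":[0.5,-1.25,2.0]}";
JsonNode? json = JsonNode.Parse(responseJson);

// Same extraction as in GenerateEmbeddingsAsync, with an empty-array fallback
// in case the "embedding" field is missing.
float[] vector = json?["embedding"]?.AsArray().GetValues<float>().ToArray()
                 ?? Array.Empty<float>();
Console.WriteLine(vector.Length); // prints 3
```

The fallback to an empty array is a choice made for this sketch; whether a missing field should instead raise an error depends on how strictly the caller wants to treat a malformed response.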
To make it usable from MemoryBuilder, an extension method is also needed:
#pragma warning disable SKEXP0001
public static class OllamaMemoryBuilderExtensions
{
    /// <summary>
    /// Adds Ollama as the text embedding generation backend for semantic memory
    /// </summary>
    /// <param name="builder">memory builder</param>
    /// <param name="modelId">Ollama model ID to use</param>
    /// <param name="baseUrl">Ollama base url</param>
    /// <returns></returns>
    public static MemoryBuilder WithOllamaTextEmbeddingGeneration(
        this MemoryBuilder builder,
        string modelId,
        string baseUrl
    )
    {
        builder.WithTextEmbeddingGeneration((logger, http) => new OllamaTextEmbeddingGeneration(
            modelId,
            baseUrl,
            http,
            logger
        ));
        return builder;
    }
}
Getting started
public async Task<ISemanticTextMemory> GetTextMemory3()
{
    var builder = new MemoryBuilder();
    var embeddingEndpoint = "http://localhost:11434";
    var cancellationTokenSource = new System.Threading.CancellationTokenSource();
    var cancellationToken = cancellationTokenSource.Token;
    builder.WithHttpClient(new HttpClient());
    builder.WithOllamaTextEmbeddingGeneration("mxbai-embed-large:335m", embeddingEndpoint);
    IMemoryStore memoryStore = await SqliteMemoryStore.ConnectAsync("memstore.db");
    builder.WithMemoryStore(memoryStore);
    var textMemory = builder.Build();
    return textMemory;
}
builder.WithOllamaTextEmbeddingGeneration("mxbai-embed-large:335m", embeddingEndpoint);
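This line works because the WithOllamaTextEmbeddingGeneration extension method has been implemented; the embedding model used here is mxbai-embed-large:335m.
I built a simple UI with WPF to try it out.
I picked a news article to embed:
The text is vectorized and stored in the database:
Now test the RAG result:
The quality of the answer is acceptable.
The LLM is Qwen/Qwen2-72B-Instruct, called through an online API; the embedding model is mxbai-embed-large:335m running in local Ollama.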
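The storing and retrieval steps are done through the ISemanticTextMemory returned by GetTextMemory3. The sketch below shows roughly what those two steps could look like; it is not the WPF application's actual code. The MemoryDemo class, the "news" collection name, the "news-1" id, the result limit, and the relevance threshold are all assumptions chosen for illustration.

```csharp
#pragma warning disable SKEXP0001 // the memory abstractions are marked experimental
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel.Memory;

public static class MemoryDemo
{
    // Hypothetical helper: stores one text, then searches the same collection.
    public static async Task StoreAndSearchAsync(ISemanticTextMemory memory, string newsText, string question)
    {
        const string collection = "news"; // assumed collection name

        // Embeds the text with the configured embedding service and saves it in the memory store.
        await memory.SaveInformationAsync(collection, text: newsText, id: "news-1");

        // Embeds the question and returns the closest stored entries.
        await foreach (var hit in memory.SearchAsync(collection, question, limit: 3, minRelevanceScore: 0.5))
        {
            Console.WriteLine($"{hit.Relevance:F2}  {hit.Metadata.Text}");
        }
    }
}
```

In a RAG flow, the texts returned by the search would then be placed into the prompt sent to the chat model, here Qwen/Qwen2-72B-Instruct.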