SemanticKernel/C#: Implementing an Interface to Plug In a Local Embedding Model

Posted by mingupupup on 2024-08-06

Preface

This article walks through the Codeblaze.SemanticKernel project to learn how to implement the ITextEmbeddingGenerationService interface and plug in a local embedding model.

Project repository: https://github.com/BLaZeKiLL/Codeblaze.SemanticKernel

Walkthrough

At first glance SemanticKernel seems to support only OpenAI's models, but it actually provides powerful abstractions: by implementing its interfaces yourself, you can plug in models that are not compatible with the OpenAI API format.

The Codeblaze.SemanticKernel project implements the ITextGenerationService, IChatCompletionService, and ITextEmbeddingGenerationService interfaces. Since Ollama's chat endpoint now supports the OpenAI format, you no longer need to implement ITextGenerationService or IChatCompletionService to use Ollama's models. Ollama's embeddings endpoint, however, is not yet OpenAI-compatible, so we can implement the ITextEmbeddingGenerationService interface to plug in an embedding model hosted in Ollama.

Take a look at the ITextEmbeddingGenerationService interface:

(screenshot: definition of the ITextEmbeddingGenerationService interface)

It represents a generator that produces text embeddings of type float.
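The screenshot shows the declaration; from the Semantic Kernel source it is roughly this (a sketch, not the verbatim source):

```csharp
// Sketch of the declaration in Microsoft.SemanticKernel.Embeddings:
// ITextEmbeddingGenerationService adds no members of its own -- it just
// fixes the type parameters of IEmbeddingGenerationService to <string, float>.
[Experimental("SKEXP0001")]
public interface ITextEmbeddingGenerationService : IEmbeddingGenerationService<string, float>
{
}
```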

Next, look at the IEmbeddingGenerationService&lt;string, float&gt; interface:

[Experimental("SKEXP0001")]
public interface IEmbeddingGenerationService<TValue, TEmbedding> : IAIService where TEmbedding : unmanaged
{
    Task<IList<ReadOnlyMemory<TEmbedding>>> GenerateEmbeddingsAsync(IList<TValue> data, Kernel? kernel = null, CancellationToken cancellationToken = default(CancellationToken));
}

Then the IAIService interface:

(screenshot: definition of the IAIService interface)
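From the Semantic Kernel source, IAIService is roughly the following (a sketch of the declaration):

```csharp
// Sketch: IAIService exposes only a read-only metadata bag,
// typically used for things like the model ID and endpoint URL.
public interface IAIService
{
    IReadOnlyDictionary<string, object?> Attributes { get; }
}
```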

This means all we need to implement is this one method and one property:

Task<IList<ReadOnlyMemory<TEmbedding>>> GenerateEmbeddingsAsync(IList<TValue> data, Kernel? kernel = null, CancellationToken cancellationToken = default(CancellationToken));

IReadOnlyDictionary<string, object?> Attributes { get; }

Let's see how Codeblaze.SemanticKernel does it.

Add the OllamaBase class:

public interface IOllamaBase
{
    Task PingOllamaAsync(CancellationToken cancellationToken = new());
}

public abstract class OllamaBase<T> : IOllamaBase where T : OllamaBase<T>
{
    public IReadOnlyDictionary<string, object?> Attributes => _attributes;
    private readonly Dictionary<string, object?> _attributes = new();
    protected readonly HttpClient Http;
    protected readonly ILogger<T> Logger;

    protected OllamaBase(string modelId, string baseUrl, HttpClient http, ILoggerFactory? loggerFactory)
    {
        _attributes.Add("model_id", modelId);
        _attributes.Add("base_url", baseUrl);

        Http = http;
        Logger = loggerFactory is not null ? loggerFactory.CreateLogger<T>() : NullLogger<T>.Instance;
    }

    /// <summary>
    /// Ping Ollama instance to check if the required llm model is available at the instance
    /// </summary>
    /// <param name="cancellationToken"></param>
    public async Task PingOllamaAsync(CancellationToken cancellationToken = new())
    {
        var data = new
        {
            name = Attributes["model_id"]
        };

        var response = await Http.PostAsJsonAsync($"{Attributes["base_url"]}/api/show", data, cancellationToken).ConfigureAwait(false);

        ValidateOllamaResponse(response);

        Logger.LogInformation("Connected to Ollama at {url} with model {model}", Attributes["base_url"], Attributes["model_id"]);
    }

    protected void ValidateOllamaResponse(HttpResponseMessage? response)
    {
        try
        {
            response.EnsureSuccessStatusCode();
        }
        catch (HttpRequestException)
        {
            Logger.LogError("Unable to connect to ollama at {url} with model {model}", Attributes["base_url"], Attributes["model_id"]);
        }
    }
}

Note that this line

public IReadOnlyDictionary<string, object?> Attributes => _attributes;

implements the property required by the interface.

Add the OllamaTextEmbeddingGeneration class:

#pragma warning disable SKEXP0001
public class OllamaTextEmbeddingGeneration(string modelId, string baseUrl, HttpClient http, ILoggerFactory? loggerFactory)
    : OllamaBase<OllamaTextEmbeddingGeneration>(modelId, baseUrl, http, loggerFactory),
      ITextEmbeddingGenerationService
{
    public async Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(IList<string> data, Kernel? kernel = null,
        CancellationToken cancellationToken = new())
    {
        var result = new List<ReadOnlyMemory<float>>(data.Count);

        foreach (var text in data)
        {
            var request = new
            {
                model = Attributes["model_id"],
                prompt = text
            };

            var response = await Http.PostAsJsonAsync($"{Attributes["base_url"]}/api/embeddings", request, cancellationToken).ConfigureAwait(false);

            ValidateOllamaResponse(response);

            var json = JsonSerializer.Deserialize<JsonNode>(await response.Content.ReadAsStringAsync().ConfigureAwait(false));

            var embedding = new ReadOnlyMemory<float>(json!["embedding"]?.AsArray().GetValues<float>().ToArray());

            result.Add(embedding);
        }

        return result;
    }
}

Note that this class implements the GenerateEmbeddingsAsync method. The idea is straightforward: for each input text, send a request to Ollama's embeddings endpoint and read the embedding array from the response.
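The response-parsing step can be tried in isolation. The snippet below is a self-contained sketch: the JSON body is a hypothetical example in the shape Ollama's /api/embeddings endpoint returns, parsed the same way the class above does it.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

class EmbeddingParseDemo
{
    static void Main()
    {
        // Hypothetical response body, mimicking Ollama's /api/embeddings shape.
        var body = """{"embedding":[0.25,-0.5,1.0]}""";

        // Same parsing logic as GenerateEmbeddingsAsync above:
        // deserialize to a JsonNode, then pull the "embedding" array out as floats.
        var json = JsonSerializer.Deserialize<JsonNode>(body);
        var embedding = new ReadOnlyMemory<float>(
            json!["embedding"]!.AsArray().GetValues<float>().ToArray());

        Console.WriteLine(embedding.Length); // 3
    }
}
```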

To make this usable from MemoryBuilder, we also need to add an extension method:

#pragma warning disable SKEXP0001
public static class OllamaMemoryBuilderExtensions
{
    /// <summary>
    /// Adds Ollama as the text embedding generation backend for semantic memory
    /// </summary>
    /// <param name="builder">kernel builder</param>
    /// <param name="modelId">Ollama model ID to use</param>
    /// <param name="baseUrl">Ollama base url</param>
    /// <returns></returns>
    public static MemoryBuilder WithOllamaTextEmbeddingGeneration(
        this MemoryBuilder builder,
        string modelId,
        string baseUrl
    )
    {
        builder.WithTextEmbeddingGeneration((logger, http) => new OllamaTextEmbeddingGeneration(
            modelId,
            baseUrl,
            http,
            logger
        ));

        return builder;
    }
}

Usage

public async Task<ISemanticTextMemory> GetTextMemory3()
{
    var builder = new MemoryBuilder();
    var embeddingEndpoint = "http://localhost:11434";
    var cancellationTokenSource = new System.Threading.CancellationTokenSource();
    var cancellationToken = cancellationTokenSource.Token;
    builder.WithHttpClient(new HttpClient());
    builder.WithOllamaTextEmbeddingGeneration("mxbai-embed-large:335m", embeddingEndpoint);
    IMemoryStore memoryStore = await SqliteMemoryStore.ConnectAsync("memstore.db");
    builder.WithMemoryStore(memoryStore);
    var textMemory = builder.Build();
    return textMemory;
}
  builder.WithOllamaTextEmbeddingGeneration("mxbai-embed-large:335m", embeddingEndpoint);

Because the WithOllamaTextEmbeddingGeneration extension method is implemented, we can call it like this, here using the mxbai-embed-large:335m embedding model.

I built a simple WPF UI to try it out.
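The save-and-search flow behind the UI can be sketched roughly like this. The collection name and record ID are hypothetical; SaveInformationAsync and SearchAsync are ISemanticTextMemory's standard members, and this assumes an Ollama instance is running locally.

```csharp
// Sketch: store a document and query it back, using the
// ISemanticTextMemory built by GetTextMemory3() above.
var memory = await GetTextMemory3();

// Hypothetical collection name and record ID.
await memory.SaveInformationAsync(
    collection: "news",
    text: "The article text to embed...",
    id: "news-001");

// Retrieve the most relevant chunks for a question (the RAG retrieval step);
// the retrieved text is then passed to the chat model as context.
await foreach (var result in memory.SearchAsync("news", "What happened?", limit: 3, minRelevanceScore: 0.6))
{
    Console.WriteLine($"{result.Relevance:F2}: {result.Metadata.Text}");
}
```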

I picked a news article to embed:

(screenshot: the news article used for embedding)

The text is vectorized and stored in the database:

(screenshot: embedding vectors stored in the SQLite database)

Now let's test the RAG effect:

(screenshots: three RAG question-and-answer examples)

The answers are reasonably good.

The chat model is Qwen/Qwen2-72B-Instruct via an online API; the embedding model is mxbai-embed-large:335m running locally in Ollama.
