100 tok/s generation speed — is that fast enough? Anyone who has used Cursor will remember one feature in particular: fast apply. With a single click, the AI-generated code in the chat panel is applied directly to the file currently open in the editor; you then review the diff and accept or reject each change block. Compared with manually copy-pasting code from the chat panel into the editor, this is dramatically more efficient, and it is one of Cursor's killer features.
You can now get the same feature through the VSCode extension Continue, using a small local model: Qwen2.5-Coder-1.5B. Running the 1.5B GGUF quantized builds through LMStudio on my local M2 Max, I measured roughly 100 tok/s for q8_0, 140 tok/s for q4_0, and 70 tok/s for fp16; the 7B q4_0 runs at about 40 tok/s. Balancing quality against speed, I settled on the 1.5B q8_0 version.
This all started when I came across FastApply-1.5B-v1.0, a model fine-tuned from qwen2.5-coder-1.5B (with a 7B variant as well) specifically for the fast-apply code-merging task, with better accuracy than the base model.
I tried to wire it into Continue (if you are new to Continue, this video is a good introduction: continue開源AI程式碼程式設計助手-自定義api-SiliconFlow矽基流動與deepseek配置教程, on Bilibili). Unfortunately, its output format is <updated-code>[Full-complete updated file]</updated-code>, so parsing the model's output would require modifying Continue's source code. That was more trouble than it was worth, so I gave up on the fine-tune and went with the stock qwen2.5-coder-1.5B instead.
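To make the incompatibility concrete, here is a minimal sketch (not Continue's actual parser — both function names are mine) contrasting the markdown-fence output Continue's apply flow can consume with the custom tags FastApply emits:

```typescript
// Continue's apply flow extracts code from a markdown fence:
function extractFromMarkdown(output: string): string | null {
  const m = output.match(/```(?:\w+)?\n([\s\S]*?)```/);
  return m ? m[1] : null;
}

// FastApply-1.5B-v1.0 instead wraps the whole file in custom tags:
function extractFromTags(output: string): string | null {
  const m = output.match(/<updated-code>([\s\S]*?)<\/updated-code>/);
  return m ? m[1] : null;
}

const fastApplyOutput = "<updated-code>const x = 1;</updated-code>";
console.log(extractFromMarkdown(fastApplyOutput)); // null — no fence found
console.log(extractFromTags(fastApplyOutput));     // "const x = 1;"
```

The tag extraction itself is trivial; the cost is in patching it into Continue's apply pipeline, which is why switching the prompt to markdown fences (shown later in config.ts) is the simpler route.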
In my rough comparison, the stock model is less disciplined — it tends to drop comments, newlines, and whitespace — while the fine-tune's output is more faithful. But the stock model is perfectly usable: merging simple code under 200 lines is effortless, and the same 1.5B model handles both fast apply and FIM code completion. One model, two jobs — a great deal for local inference.
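For reference, the FIM (fill-in-the-middle) side uses Qwen2.5-Coder's special tokens — this is what Continue's `"template": "qwen"` autocomplete setting builds under the hood. A minimal sketch (the function name is my own; the token names follow Qwen2.5-Coder's documented FIM format):

```typescript
// Build a Qwen2.5-Coder FIM prompt: the model generates the code
// that belongs between the prefix and the suffix.
function buildQwenFimPrompt(prefix: string, suffix: string): string {
  return `<|fim_prefix|>${prefix}<|fim_suffix|>${suffix}<|fim_middle|>`;
}

const prompt = buildQwenFimPrompt("function add(a, b) {\n  return ", ";\n}");
console.log(prompt);
```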
Here is how to configure Continue:
// ~/.continue/config.json
{
  "models": [{
    "title": "fastapply-1.5b-v1.0@f16",
    "model": "qwen2.5-coder-1.5b-instruct@q8_0",
    "apiBase": "http://192.168.8.110:5000/v1",
    "provider": "lmstudio",
    "contextLength": 4000,
    "completionOptions": {
      "maxTokens": 4000,
      "stop": ["<|endoftext|>"],
      "temperature": 0.01
    }
  }],
  "tabAutocompleteModel": {
    "title": "ollama_model",
    "provider": "lmstudio",
    "model": "qwen2.5-coder-1.5b-instruct@q8_0",
    "template": "qwen",
    "apiBase": "http://192.168.8.110:5000/v1"
  },
  "modelRoles": {
    "applyCodeBlock": "fastapply-1.5b-v1.0@f16",
    "inlineEdit": "fastapply-1.5b-v1.0@f16"
  }
}
// ~/.continue/config.ts
export function modifyConfig(config: Config): Config {
  const gptEditPrompt: PromptTemplate = (_, otherData) => {
    // The original FastApply prompt asks for output "enclosed within
    // <updated-code> and </updated-code> tags":
    //   system: You are a coding assistant that helps merge code updates
    //   Do not include any additional text, explanations, placeholders, ellipses, or code fences.
    // For easier compatibility with Continue, I changed it to markdown format:
    //   enclosed within markdown \`\`\`your update code\`\`\`
    const systemMessage =
      `<|im_start|>system You are a coding assistant that helps fix code and merge code updates, ensuring every modification is fully integrated.<|im_end|>`;
    const userMessage =
      `<|im_start|>user Merge all changes from the <update> snippet into the <code> below. - Preserve the code's structure, order, comments, and indentation exactly. - Output only the updated code, enclosed within markdown \`\`\`your update code\`\`\`. - Do not include any additional text, explanations, placeholders, ellipses.`;
    if (otherData?.codeToEdit?.trim().length === 0) {
      return `${systemMessage}
${userMessage}
<code>${otherData.prefix}[BLANK]${otherData.suffix}</code>
<update>${otherData.userInput}</update>
Provide the complete updated code.<|im_end|>
<|im_start|>assistant `;
    }
    // const codeBlock = `${otherData.prefix}<code>${otherData.codeToEdit}</code>${otherData.suffix}`; // alternative: include prefix/suffix context
    const codeBlock = `<code>${otherData.codeToEdit}</code>`;
    const updateBlock = `<update>${otherData.userInput}</update>`;
    return `${systemMessage}
${userMessage}
${codeBlock}
${updateBlock}
Provide the complete updated code.<|im_end|>
<|im_start|>assistant `;
  };

  const modelName = "fastapply-1.5b-v1.0@f16";
  // Attach the edit template to the model whose title matches modelName;
  // the title here must match the "title" field in config.json exactly.
  const applyModel = config.models.find(model => model.title === modelName);
  if (applyModel) {
    applyModel.promptTemplates = {
      edit: gptEditPrompt,
    };
  } else {
    console.warn(`Model "${modelName}" not found in config.models`);
  }
  return config;
}
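To sanity-check what the model actually receives, here is a standalone sketch (types stripped, function name hypothetical) that reproduces the ChatML prompt the gptEditPrompt template above builds for a non-empty selection:

```typescript
// Mirror of the edit template's non-empty-selection branch,
// extracted so the rendered prompt can be inspected on its own.
function renderEditPrompt(codeToEdit: string, userInput: string): string {
  const systemMessage =
    `<|im_start|>system You are a coding assistant that helps fix code and merge code updates, ensuring every modification is fully integrated.<|im_end|>`;
  const userMessage =
    `<|im_start|>user Merge all changes from the <update> snippet into the <code> below. - Output only the updated code, enclosed within markdown \`\`\`your update code\`\`\`.`;
  return `${systemMessage}
${userMessage}
<code>${codeToEdit}</code>
<update>${userInput}</update>
Provide the complete updated code.<|im_end|>
<|im_start|>assistant `;
}

const prompt = renderEditPrompt("const a = 1;", "rename a to count");
console.log(prompt);
```

The final `<|im_start|>assistant ` turn is left open so the model's completion is the merged code itself, fenced in markdown for Continue to extract.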
I have also opened an issue on the continue repository asking for native compatibility with fastApply fine-tuned models — feel free to follow its progress there.