Fast apply of code in Continue with a local 1.5B model

Posted by 索美不达米亚 on 2024-11-04

100 tok/s generation speed. Is that fast enough? Anyone who has used Cursor will remember its fast apply feature: one click, and the AI-generated code from the chat panel is applied directly to the file currently open in the editor. You then review the diff and accept or reject each changed block. Compared with manually copying code out of the chat panel and pasting it into the editor, this workflow is far more efficient, and it is one of Cursor's killer features.

You can now reproduce this feature with the VS Code extension Continue and a local small model: Qwen2.5-Coder-1.5B. I run the 1.5B GGUF quantized model through LM Studio on my M2 Max. Measured speeds were roughly 100 tok/s for q8_0, 140 tok/s for q4_0, and 70 tok/s for fp16; the 7B model at q4_0 managed about 40 tok/s. Balancing quality against speed, I settled on the 1.5B q8_0 build.

What prompted all this was a fine-tuned model built specifically for fast apply, FastApply-1.5B-v1.0. It is fine-tuned from the qwen2.5-coder 1.5B and 7B models for the code-merge task behind fast apply, and its accuracy improves on the base models.

Performance tuning

I tried to wire it into Continue (if you are new to Continue, this video is a good starting point: continue開源AI程式碼程式設計助手-自定義api-SiliconFlow矽基流動與deepseek配置教程, on Bilibili). Unfortunately its output format is <updated-code>[Full-complete updated file]</updated-code>, and parsing that would require patching Continue's source code. That was more trouble than it was worth, so I gave up and used the original qwen2.5-coder-1.5B instead.

From my rough comparison, the base model is less disciplined: it tends to drop comments, line breaks, and whitespace. The fine-tuned version produces more accurate output, but the base model is still perfectly usable, handling simple merges of files under 200 lines with ease. Better yet, the same 1.5B model covers both fast apply and FIM code completion, so one local model serves two purposes. Running it locally is a very good deal.

Here is how to configure Continue:

// ~/.continue/config.json
{
  "models": [{
    "title": "fastapply-1.5b-v1.0@f16",
    "model": "qwen2.5-coder-1.5b-instruct@q8_0",
    "apiBase": "http://192.168.8.110:5000/v1",
    "provider": "lmstudio",
    "contextLength": 4000,
    "completionOptions": {
      "maxTokens": 4000,
      "stop": [
        "<|endoftext|>"
      ],
      "temperature": 0.01
    }
  }],
  "tabAutocompleteModel": {
    "title": "ollama_model",
    "provider": "lmstudio",
    "model": "qwen2.5-coder-1.5b-instruct@q8_0",
    "template": "qwen",
    "apiBase": "http://192.168.8.110:5000/v1"
  },
  "modelRoles": {
    "applyCodeBlock": "fastapply-1.5b-v1.0@f16",
    "inlineEdit": "fastapply-1.5b-v1.0@f16"
  }
}
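One detail worth double-checking in the config above: each entry under "modelRoles" must match the exact "title" of a model declared in "models", otherwise the role will not resolve to your local model. Below is a minimal sketch of such a sanity check (the checkModelRoles helper and its types are my own illustration, not part of Continue's API):

```typescript
// Hypothetical helper: reports which modelRoles entries do not match
// the `title` of any model declared in `models`.
interface ModelConfig { title: string; model: string; }
interface ContinueConfig { models: ModelConfig[]; modelRoles?: Record<string, string>; }

function checkModelRoles(config: ContinueConfig): string[] {
  const titles = new Set(config.models.map((m) => m.title));
  // Collect every role whose referenced title has no matching model entry.
  return Object.entries(config.modelRoles ?? {})
    .filter(([, title]) => !titles.has(title))
    .map(([role]) => role);
}

// Mirrors the linkage used in the config.json above.
const config: ContinueConfig = {
  models: [{ title: "fastapply-1.5b-v1.0@f16", model: "qwen2.5-coder-1.5b-instruct@q8_0" }],
  modelRoles: {
    applyCodeBlock: "fastapply-1.5b-v1.0@f16",
    inlineEdit: "fastapply-1.5b-v1.0@f16",
  },
};
console.log(checkModelRoles(config)); // → [] (all roles resolve)
```

The same title is reused verbatim in config.ts below when looking up the model to attach the prompt template to, so a typo in any one of the three places silently breaks the chain.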
// ~/.continue/config.ts
export function modifyConfig(config: Config): Config {
    const gptEditPrompt: PromptTemplate = (_, otherData) => {
        // The original FastApply model expects output enclosed within <updated-code> and </updated-code> tags
        // system You are a coding assistant that helps merge code updates
        // Do not include any additional text, explanations, placeholders, ellipses, or code fences.
        // For easier compatibility, changed here to markdown code fences instead:
        // enclosed within markdown \`\`\`your update code\`\`\`
        const systemMessage =
            `<|im_start|>system You are a coding assistant that helps fix code and merge code updates, ensuring every modification is fully integrated.<|im_end|>`;
        const userMessage =
            `<|im_start|>user Merge all changes from the <update> snippet into the <code> below. - Preserve the code's structure, order, comments, and indentation exactly. - Output only the updated code, enclosed within markdown \`\`\`your update code\`\`\`. - Do not include any additional text, explanations, placeholders, ellipses.`;

        if (otherData?.codeToEdit?.trim().length === 0) {
            return `${systemMessage}
${userMessage}
<code>${otherData.prefix}[BLANK]${otherData.suffix}</code>
<update>${otherData.userInput}</update>
Provide the complete updated code.<|im_end|>
<|im_start|>assistant `;
        }

        // const codeBlock = `${otherData.prefix}<code>${otherData.codeToEdit}</code>${otherData.suffix}`;  // variant that includes the prefix and suffix context
        const codeBlock = `<code>${otherData.codeToEdit}</code>`;
        const updateBlock = `<update>${otherData.userInput}</update>`;

        return `${systemMessage}
${userMessage}
${codeBlock}
${updateBlock}
Provide the complete updated code.<|im_end|>
<|im_start|>assistant `;
    };

    let modelName = "fastapply-1.5b-v1.0@f16";
    // Fix the model finding logic
    let applyModel = config.models.find(model => model.title === modelName);
    if (applyModel) {
        applyModel.promptTemplates = {
            edit: gptEditPrompt,
        };
        // console.log('done')
    } else {
        // console.warn('Model "fastapply-1.5b-v1.0@f16" not found in config.models');
    }
    return config;
}
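To make the template above concrete, here is a standalone sketch (simplified, with names of my own; this is not Continue's API) that renders the merge prompt for a tiny code/update pair, so you can see exactly what the model receives:

```typescript
// Standalone illustration of the prompt built by gptEditPrompt above,
// reduced to the non-empty codeToEdit path.
function renderApplyPrompt(codeToEdit: string, userInput: string): string {
  const systemMessage =
    "<|im_start|>system You are a coding assistant that helps fix code and merge code updates, ensuring every modification is fully integrated.<|im_end|>";
  const userMessage =
    "<|im_start|>user Merge all changes from the <update> snippet into the <code> below. " +
    "- Preserve the code's structure, order, comments, and indentation exactly. " +
    "- Output only the updated code, enclosed within markdown ```your update code```. " +
    "- Do not include any additional text, explanations, placeholders, ellipses.";
  return [
    systemMessage,
    userMessage,
    `<code>${codeToEdit}</code>`,      // the code currently in the editor
    `<update>${userInput}</update>`,   // the AI-generated snippet to merge in
    "Provide the complete updated code.<|im_end|>",
    "<|im_start|>assistant ",
  ].join("\n");
}

const prompt = renderApplyPrompt(
  "function add(a, b) { return a + b; }",
  "rename add to sum",
);
// The prompt wraps the editor content in <code>, the requested change in
// <update>, and uses raw ChatML markers, since the template bypasses the
// provider's own chat formatting.
```

This also makes the "<|endoftext|>" stop token in completionOptions easier to understand: because the template emits ChatML markers itself, the request is effectively a raw completion, and generation must be cut off manually.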

I have also opened an issue on the Continue repository asking for built-in support for the fastApply fine-tuned models; feel free to follow the progress there.
