Today's blog post is contributed by our full-stack engineer Zhu Weiwei. While loading large models with Ollama, she used JuiceFS to share the model files; JuiceFS's data preheating and distributed caching significantly improved loading efficiency and relieved the performance bottleneck.
01 Background
As AI technology advances, large models have quietly woven themselves into our daily lives. Commercial LLMs, however, being paid, black-box, and a data-security concern, have always kept users at arm's length. More and more large models are therefore going open source, letting users run their own models conveniently and with peace of mind.
Ollama is a tool that simplifies deploying and running large models. On one hand, it offers a Docker-like experience, so running a model instance is as easy as starting a container; on the other, it provides an OpenAI-compatible API that smooths over the usage differences between models.
To avoid ending up with "artificial unintelligence", we tend to pick models with as many parameters as possible. As everyone knows, though, while a model with more parameters performs better, it is also much bigger: Llama 3.1 70B, for example, is 40 GB.
Today, managing large files that are tightly coupled to business functionality is a real headache. There are essentially two approaches: model artifacts and shared storage.
- Model artifacts: bake the model itself into the delivery artifact, whether a Docker image or an OS snapshot, and rely on IaaS or PaaS capabilities for model versioning and distribution;
- Shared storage: a simpler idea: place the model in a shared file system and pull it on demand.
Model artifacts are more like a warm start: by reusing the platform's artifact-distribution capabilities, the model is already local by the time the instance is ready. The bottleneck is distributing huge files; even at software engineering's current stage, the means of shipping large artifacts remain limited.
Shared storage is more like a cold start: when the instance boots it can see the remote model files, but it has to load them over the network to run. Intuitive as this approach is, it puts real pressure on the shared storage itself, which can easily become the bottleneck of the entire loading phase.
But if the shared storage itself also supports warm-start techniques such as data preheating and distributed caching, the story changes, and JuiceFS is exactly such a project.
Through a demo, this post walks through the shared-storage side of JuiceFS: using the distributed file system it provides, Ollama model files can be pulled once and run anywhere.
02 Pull Once
This section uses a Linux machine to demonstrate pulling a model.
Prepare the JuiceFS file system
Ollama stores its model data under /root/.ollama by default, so mount JuiceFS at /root/.ollama:
$ juicefs mount weiwei /root/.ollama --subdir=ollama
The model data Ollama pulls will now be stored in the JuiceFS file system.
Pull the model
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Pull a model, using Llama 3.1 8B as the example here:
$ ollama pull llama3.1
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.7 GB
pulling 11ce4ee3e170... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.7 KB
pulling 0ba8f0e314b4... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 96 B
pulling 1a4c3c319823... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Ollama lets users create their own models with a Modelfile, whose syntax is similar to a Dockerfile. Here we build on llama 3.1 and set a system prompt:
$ cat <<EOF > Modelfile
> FROM llama3.1
> # set the temperature to 1 [higher is more creative, lower is more coherent]
> PARAMETER temperature 1
> # set the system message
> SYSTEM """
> You are a literary writer named Rora. Please help me polish my writing.
> """
> EOF
$
$ ollama create writer -f ./Modelfile
transferring model data
using existing layer sha256:8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258
using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177
creating new layer sha256:1dfe258ba02ecec9bf76292743b48c2ce90aefe288c9564c92d135332df6e514
creating new layer sha256:7fa4d1c192726882c2c46a2ffd5af3caddd99e96404e81b3cf2a41de36e25991
creating new layer sha256:ddb2d799341563f3da053b0da259d18d8b00b2f8c5951e7c5e192f9ead7d97ad
writing manifest
success
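For repeatable builds, the Modelfile above can also be assembled in code before being written to disk. A minimal sketch; the build_modelfile helper is hypothetical, not part of Ollama:

```python
def build_modelfile(base: str, temperature: float, system_prompt: str) -> str:
    """Assemble Ollama Modelfile text from FROM / PARAMETER / SYSTEM directives."""
    return (
        f"FROM {base}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system_prompt}\n"""\n'
    )

text = build_modelfile(
    "llama3.1", 1,
    "You are a literary writer named Rora. Please help me polish my writing.",
)
print(text)
```

Writing this string to a file and running `ollama create writer -f Modelfile` would produce the same model as the heredoc above.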
View the model list:
$ ollama list
NAME ID SIZE MODIFIED
writer:latest 346a60dbd7d4 4.7 GB 17 minutes ago
llama3.1:latest 91ab477bec9d 4.7 GB 4 hours ago
03 Run Anywhere
The JuiceFS file system now contains the models pulled with Ollama, so anywhere else you only need to mount JuiceFS to run them directly. Below we demonstrate running the model with Ollama on Linux, Mac, and Kubernetes.
Linux
On a machine that already has JuiceFS mounted, you can run the model directly:
$ ollama run writer
>>> The flower is beautiful
A lovely start, but let's see if we can't coax out a bit more poetry from your words. How about this:
"The flower unfolded its petals like a gentle whisper, its beauty an unassuming serenade that drew the eye and stirred the soul."
Or, perhaps a slightly more concise version:
"In the flower's delicate face, I find a beauty that soothes the senses and whispers secrets to the heart."
Your turn! What inspired you to write about the flower?
Mac
Mount JuiceFS:
weiwei@hdls-mbp ~ juicefs mount weiwei .ollama --subdir=ollama
OK, weiwei is ready at /Users/weiwei/.ollama.
Download and install Ollama from: https://ollama.com/download/Ollama-darwin.zip
Note that the model was pulled as root earlier, so on the Mac you have to switch to root to run ollama.
If you use the hand-built writer model, there is a catch: the newly created model's layers are written with permission 600, and they must be changed to 644 manually before the model can run on the Mac. This is an Ollama bug; the author has filed a PR with Ollama (https://github.com/ollama/ollama/pull/6386), but as of this writing no release includes the fix. The temporary workaround is as follows:
hdls-mbp:~ root# cd /Users/weiwei/.ollama/models/blobs
hdls-mbp:blobs root# ls -alh . | grep rw-------
-rw------- 1 root wheel 14B 8 15 23:04 sha256-804a1f079a1166190d674bcfb0fa42270ec57a4413346d20c5eb22b26762d132
-rw------- 1 root wheel 559B 8 15 23:04 sha256-db7eed3b8121ac22a30870611ade28097c62918b8a4765d15e6170ec8608e507
hdls-mbp:blobs root#
hdls-mbp:blobs root# chmod 644 sha256-804a1f079a1166190d674bcfb0fa42270ec57a4413346d20c5eb22b26762d132 sha256-db7eed3b8121ac22a30870611ade28097c62918b8a4765d15e6170ec8608e507
hdls-mbp:blobs root# ollama list
NAME ID SIZE MODIFIED
writer:latest 346a60dbd7d4 4.7 GB About an hour ago
llama3.1:latest 91ab477bec9d 4.7 GB 4 hours ago
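The manual chmod above can also be scripted. A minimal Python sketch that relaxes every 0600 blob in a directory to 0644, demonstrated on a scratch directory since the real path (/Users/weiwei/.ollama/models/blobs) requires root:

```python
import os
import stat
import tempfile

def fix_blob_perms(blobs_dir: str) -> list:
    """Relax any 0600 blob file to 0644 so a non-owner Ollama process can read it."""
    fixed = []
    for name in sorted(os.listdir(blobs_dir)):
        path = os.path.join(blobs_dir, name)
        if os.path.isfile(path) and stat.S_IMODE(os.stat(path).st_mode) == 0o600:
            os.chmod(path, 0o644)
            fixed.append(name)
    return fixed

# Scratch directory standing in for the blobs directory
d = tempfile.mkdtemp()
open(os.path.join(d, "sha256-demo"), "w").close()
os.chmod(os.path.join(d, "sha256-demo"), 0o600)
print(fix_blob_perms(d))  # → ['sha256-demo']
```

Running it a second time returns an empty list, since the permissions are already fixed.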
Run the writer model and ask it to polish some text:
hdls-mbp:weiwei root# ollama run writer
>>> The tree is very tall
A great start, but let's see if we can make it even more vivid and engaging.
Here's a revised version:
"The tree stood sentinel, its towering presence stretching towards the sky like a verdant giant, its branches dancing
in the breeze with an elegance that seemed almost otherworldly."
Or, if you'd prefer something simpler yet still evocative, how about this:
"The tree loomed tall and green, its trunk sturdy as a stone pillar, its leaves a soft susurrus of sound in the gentle
wind."
Which one resonates with you? Or do you have any specific ideas or feelings you want to convey through your writing
that I can help shape into a compelling phrase?
Kubernetes
JuiceFS provides a CSI Driver, so users can consume PVs directly in Kubernetes, with both static and dynamic provisioning supported. Since we are using files that already exist in the file system, static provisioning is used here.
Prepare the PVC and PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-vol
  labels:
    juicefs-name: ollama-vol
spec:
  capacity:
    storage: 10Pi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: csi.juicefs.com
    volumeHandle: ollama-vol
    fsType: juicefs
    nodePublishSecretRef:
      name: ollama-vol
      namespace: kube-system
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-vol
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      juicefs-name: ollama-vol
Deploy Ollama:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: registry.cn-hangzhou.aliyuncs.com/hdls/ollama:0.3.5
          env:
            - name: OLLAMA_HOST
              value: "0.0.0.0"
          ports:
            - name: ollama
              containerPort: 11434
          args:
            - "serve"
          volumeMounts:
            - mountPath: /root/.ollama
              name: shared-data
              subPath: ollama
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: ollama-vol
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-svc
spec:
  selector:
    app: ollama
  ports:
    - name: http
      protocol: TCP
      port: 11434
      targetPort: 11434
Since the Ollama Deployment runs an Ollama server, it can be accessed through the API:
$ curl http://192.168.203.37:11434/api/generate -d '{
"model": "writer",
"prompt": "The sky is blue",
"stream": false
}'
{"model":"writer","created_at":"2024-08-15T14:35:43.593740142Z","response":"A starting point, at least! Let's see... How about we add some depth to this sentence? Here are a few suggestions:\n\n* Instead of simply stating that the sky is blue, why not describe how it makes you feel? For example: \"As I stepped outside, the cerulean sky seemed to stretch out before me like an endless canvas, its vibrant hue lifting my spirits and washing away the weight of the world.\"\n* Or, we could add some sensory details to bring the scene to life. Here's an example: \"The morning sun had just risen over the horizon, casting a warm glow across the blue sky that seemed to pulse with a gentle light – a softness that soothed my skin and lulled me into its tranquil rhythm.\"\n* If you're going for something more poetic, we could try to tap into the symbolic meaning of the sky's color. For example: \"The blue sky above was like an open door, inviting me to step through and confront the dreams I'd been too afraid to chase – a reminder that the possibilities are endless, as long as we have the courage to reach for them.\"\n\nWhich direction would you like to take 
this?","done":true,"done_reason":"stop","context":[128006,9125,128007,1432,2675,527,264,32465,7061,7086,432,6347,13,5321,1520,757,45129,856,4477,627,128009,128006,882,128007,271,791,13180,374,6437,128009,128006,78191,128007,271,32,6041,1486,11,520,3325,0,6914,596,1518,1131,2650,922,584,923,1063,8149,311,420,11914,30,5810,527,264,2478,18726,1473,9,12361,315,5042,28898,430,279,13180,374,6437,11,3249,539,7664,1268,433,3727,499,2733,30,1789,3187,25,330,2170,358,25319,4994,11,279,10362,1130,276,13180,9508,311,14841,704,1603,757,1093,459,26762,10247,11,1202,34076,40140,33510,856,31739,323,28786,3201,279,4785,315,279,1917,10246,9,2582,11,584,1436,923,1063,49069,3649,311,4546,279,6237,311,2324,13,5810,596,459,3187,25,330,791,6693,7160,1047,1120,41482,927,279,35174,11,25146,264,8369,37066,4028,279,6437,13180,430,9508,311,28334,449,264,22443,3177,1389,264,8579,2136,430,779,8942,291,856,6930,323,69163,839,757,1139,1202,68040,37390,10246,9,1442,499,2351,2133,369,2555,810,76534,11,584,1436,1456,311,15596,1139,279,36396,7438,315,279,13180,596,1933,13,1789,3187,25,330,791,6437,13180,3485,574,1093,459,1825,6134,11,42292,757,311,3094,1555,323,17302,279,19226,358,4265,1027,2288,16984,311,33586,1389,264,27626,430,279,24525,527,26762,11,439,1317,439,584,617,279,25775,311,5662,369,1124,2266,23956,5216,1053,499,1093,311,1935,420,30],"total_duration":13635238079,"load_duration":39933548,"prompt_eval_count":35,"prompt_eval_duration":55817000,"eval_count":240,"eval_duration":13538816000}
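The trailing fields of the response are worth a look: eval_count is the number of generated tokens, and (assuming Ollama's convention that the *_duration fields are nanoseconds) the generation speed can be computed directly. A sketch using the numbers from the run above, with the long context array omitted:

```python
import json

# Trimmed /api/generate response from the run above (context omitted)
resp = json.loads("""{
  "model": "writer",
  "total_duration": 13635238079,
  "eval_count": 240,
  "eval_duration": 13538816000
}""")

# eval_duration is in nanoseconds; convert to seconds before dividing
tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/s")  # → 17.7 tokens/s
```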
04 Summary
Ollama is a tool that makes running large models locally simple: pull the model down, then run it with a single command. JuiceFS can serve as the underlying storage for a model registry, and thanks to its distributed nature, a model pulled once in one place can be used directly everywhere else: pull once, run anywhere.
Hope this post is helpful. If you have further questions, you are welcome to join the JuiceFS community and discuss with everyone.