vLLM
https://github.com/vllm-project/vllm
https://docs.vllm.ai/en/latest/
Covers both inference and serving, but leans more toward inference.
vLLM is a fast and easy-to-use library for LLM inference and serving.
vLLM is fast with:
- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- Quantization: GPTQ, AWQ, SqueezeLLM, FP8 KV Cache
- Optimized CUDA kernels
Performance benchmark: the repo includes a benchmark comparing vLLM against other LLM serving engines (TensorRT-LLM, text-generation-inference, and lmdeploy).
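The throughput machinery above (PagedAttention, continuous batching) runs inside the engine; the user-facing entry point for offline inference is vLLM's `LLM` class. A minimal sketch, assuming `vllm` is installed and the model fits on a single GPU (the model ID, prompts, and sampling values are illustrative):

```python
from vllm import LLM, SamplingParams

# The engine applies PagedAttention and continuous batching internally;
# the caller just submits a batch of prompts.
llm = LLM(
    model="facebook/opt-125m",   # any supported Hugging Face model ID
    # quantization="awq",        # e.g. to load an AWQ-quantized checkpoint
    # tensor_parallel_size=2,    # shard across 2 GPUs for distributed inference
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    sampling_params,
)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```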
vLLM is flexible and easy to use with:
- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server (see the client sketch after this list)
- Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, and PowerPC CPUs
- (Experimental) Prefix caching support
- (Experimental) Multi-LoRA support
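Because the server implements the OpenAI API, any standard OpenAI client can talk to it. A minimal sketch, assuming a server was started with `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m` on the default port 8000 (model name and prompt are illustrative):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server;
# vLLM ignores the API key by default, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
    temperature=0.7,
)
print(completion.choices[0].text)
```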
vLLM seamlessly supports most popular open-source models on Hugging Face, including:
- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Expert LLMs (e.g., Mixtral)
- Multi-modal LLMs (e.g., LLaVA)
Find the full list of supported models in the vLLM documentation (linked above).
FastChat
https://github.com/lm-sys/FastChat
Covers model training, serving, and evaluation.
Its most popular use is the serving side, i.e. deployment (distributed deployment with a web UI and a REST API), and the backend can integrate vLLM to accelerate inference.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 10 million chat requests for 70+ LLMs.
- Chatbot Arena has collected over 500K human votes from side-by-side LLM battles to compile an online LLM Elo leaderboard.
FastChat's core features include:
- The training and evaluation code for state-of-the-art models (e.g., Vicuna, MT-Bench).
- A distributed multi-model serving system with web UI and OpenAI-compatible RESTful APIs.
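The serving system consists of three processes: a controller, one or more model workers, and a frontend (web UI or OpenAI-compatible API server). A minimal sketch of bringing the stack up and querying it, using the module entry points from the FastChat README (model path and ports are illustrative):

```python
# Launch each component in its own shell (commands from the FastChat README):
#   python3 -m fastchat.serve.controller
#   python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
#   python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
# (fastchat.serve.gradio_web_server serves the web UI on the same controller.)

from openai import OpenAI

# The openai_api_server exposes the same OpenAI-compatible REST API as vLLM's.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # model name as registered with the controller
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(response.choices[0].message.content)
```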
https://rudeigerc.dev/posts/llm-inference-with-fastchat/
vLLM vs. FastChat
https://fastchat.mintlify.app/vllm_integration
https://github.com/lm-sys/FastChat/issues/1775
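Per the vllm_integration doc above, FastChat can swap its default model worker for a vLLM-backed one, so requests routed through FastChat's controller are executed by vLLM's engine. A minimal sketch of the swap, assuming both packages are installed (model path is illustrative; the controller and API server are started as in the FastChat sketch above):

```python
# Only the worker changes; the controller and openai_api_server stay the same:
#   python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
# From the client's perspective nothing changes: the controller routes requests
# to the vLLM worker, which batches them with vLLM's continuous batching.
```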