| GPT-4o | | 88.7 | / | / | 90.5 | 76.6 | / | / | OpenAI |
| Claude3-Opus | 0.0 | 86.8 | / | / | 95.0 | 60.1 | / | 9.43 | Anthropic |
| GPT-4 | 1750.0 | 86.4 | 68.7 | / | 87.1 | 42.5 | / | 9.32 | OpenAI |
| Llama3-400B-Instruct-InTraining | 4000.0 | 86.1 | / | / | 94.1 | 57.8 | / | / | Facebook AI Research |
| Llama3-400B-InTraining | 4000.0 | 84.8 | / | / | / | / | / | / | Facebook AI Research |
| Qwen2-72B | 727.0 | 84.2 | 91.0 | / | 89.5 | 51.1 | 82.4 | / | Alibaba |
| Gemini-ultra | 0.0 | 83.7 | / | / | 88.9 | 53.2 | / | / | DeepMind |
| Qwen2-72B-Instruct | 720.0 | 82.3 | 83.8 | / | 91.1 | 59.7 | / | 9.12 | Alibaba |
| Llama3-70B-Instruct | 700.0 | 82.0 | / | / | 93.0 | 50.4 | / | / | Facebook AI Research |
| Gemini 1.5 Pro | 0.0 | 81.9 | / | / | 91.7 | 58.5 | / | / | Google DeepMind |
| GLM4 | 0.0 | 81.5 | / | / | 87.6 | 47.9 | 82.3 | / | Zhipu AI |
| Grok-1.5 | | 81.3 | / | / | 90.0 | 50.6 | / | / | xAI |
| Mistral Large | 0.0 | 81.2 | / | / | 81.0 | 45.0 | / | 8.66 | MistralAI |
| YAYI2-30B | 300.0 | 80.5 | 80.9 | 62.0 | 71.2 | / | / | / | Zhongke Wenge |
| Qwen1.5-110B | 1100.0 | 80.4 | / | / | 85.4 | 49.6 | 74.8 | 8.88 | Alibaba |
| Llama3-70B | 700.0 | 79.5 | / | / | / | / | / | / | Facebook AI Research |
| Gemini-pro | 1000.0 | 79.13 | / | / | 86.5 | / | / | / | DeepMind |
| Claude3-Sonnet | 0.0 | 79.0 | / | / | 92.3 | 43.1 | / | 9.18 | Anthropic |
| DeepSeek-V2-236B | 2360.0 | 78.5 | 81.7 | / | 79.2 | 43.6 | 78.9 | / | DeepSeek-AI |
| PaLM 2 | 3400.0 | 78.3 | / | / | 80.7 | / | / | / | Google Research |
| Phi-3-medium 14B-preview | 140.0 | 78.2 | / | 48.4 | 90.3 | / | / | 8.91 | Microsoft |
| Mixtral-8×22B-MoE | 1410.0 | 77.75 | / | / | 78.6 | 41.8 | / | / | MistralAI |
| Qwen1.5-72B-Chat | 720.0 | 77.5 | 84.1 | / | 79.5 | 34.1 | 65.5 | 8.67 | Alibaba |
| Qwen-72B | 720.0 | 77.4 | 83.3 | 62.5 | 78.9 | / | / | / | Alibaba |
| Yi-1.5-34B | 340.0 | 77.1 | / | 71.1 | 82.7 | 41.0 | 76.4 | / | 01.AI |
| Qwen2-57B-A14B | 570.0 | 76.5 | 87.7 | / | 80.7 | 43.0 | 67.0 | / | Alibaba |
| Yi-34B | 340.0 | 76.3 | 81.4 | / | / | / | / | / | 01.AI |
| Yi-34B-200K | 340.0 | 76.1 | 81.9 | / | / | / | / | / | 01.AI |
| Phi-3-small 7B | 70.0 | 75.3 | / | 45.0 | 88.9 | / | / | 8.7 | Microsoft |
| Claude3-Haiku | 0.0 | 75.2 | / | / | 88.9 | 38.9 | / | / | Anthropic |
| Gemma2-27B | 270.0 | 75.0 | / | / | 75.0 | / | / | / | Google DeepMind |
| GLM-4-9B | 90.0 | 74.7 | / | / | 84.0 | 30.4 | / | / | Zhipu AI |
| DBRX Instruct | 1320.0 | 73.7 | / | / | 72.8 | / | / | 8.39 | Databricks |
| Qwen1.5-32B | 320.0 | 73.4 | 83.5 | / | 77.4 | 36.1 | / | 8.3 | Alibaba |
| Grok-1 | 3140.0 | 73.0 | / | / | 62.9 | / | / | / | xAI |
| GLM-4-9B-Chat | 90.0 | 72.4 | 75.6 | / | 79.6 | 50.6 | / | 8.35 | Zhipu AI |
| Apollo-7B | 70.0 | 71.86 | / | / | / | / | / | / | Individual |
| DeepSeek-V2-236B-Chat | 2360.0 | 71.1 | 65.2 | / | 84.4 | 32.6 | 71.7 | / | DeepSeek-AI |
| XVERSE-65B | 650.0 | 70.8 | / | 61.8 | 60.3 | / | / | / | XVERSE |
| Mixtral-8×7B-MoE | 450.0 | 70.6 | / | / | 74.4 | 28.4 | / | 8.3 | MistralAI |
| Qwen2-7B | 70.0 | 70.3 | 83.2 | / | 79.9 | 44.2 | 62.6 | / | Alibaba |
| GPT-3.5 | 1750.0 | 70.0 | 54.4 | / | 57.1 | / | / | 8.39 | OpenAI |
| Yi-1.5-9B | 90.0 | 69.5 | / | 62.7 | 73.7 | 32.6 | 72.4 | / | 01.AI |
| PaLM | 5400.0 | 69.3 | / | / | 56.5 | / | / | / | Google Research |
| LLaMA2 70B | 700.0 | 68.9 | / | 54.2 | 56.8 | / | / | / | Facebook AI Research |
| Phi-3-mini 3.8B | 38.0 | 68.8 | / | 37.5 | 82.5 | / | / | 8.38 | Microsoft |
| Yi-9B | 90.0 | 68.4 | / | / | 52.3 | 15.9 | / | / | 01.AI |
| Llama3-8B-Instruct | 80.0 | 68.4 | / | / | 79.6 | 30.0 | / | / | Facebook AI Research |
| Aquila2-34B | 340.0 | 67.79 | 63.07 | / | 58.4 | / | / | / | Beijing Academy of Artificial Intelligence |
| Jamba-v0.1 | 520.0 | 67.4 | / | / | 59.9 | / | 45.4 | / | AI21 Labs |
| Llama3-8B | 80.0 | 66.6 | / | / | / | / | / | / | Facebook AI Research |
| Qwen-14B | 140.0 | 66.3 | 72.1 | / | 61.3 | / | / | / | Alibaba |
| Grok-0 | 330.0 | 65.7 | / | / | 56.8 | / | / | / | xAI |
| Gemma 7B | 70.0 | 64.3 | / | 41.7 | 46.4 | 24.3 | 55.1 | / | Google Research |
| Yi-6B-200K | 60.0 | 64.0 | 73.5 | / | / | / | / | / | 01.AI |
| Starling-7B-LM-Beta | 70.0 | 63.9 | / | / | / | / | / | 8.09 | Nexusflow |
| LLaMA 65B | 650.0 | 63.4 | 38.8 | 47.6 | 50.9 | / | / | / | Facebook AI Research |
| Yi-6B | 60.0 | 63.2 | 72.0 | / | / | / | / | / | 01.AI |
| LLaMA2 34B | 340.0 | 62.6 | / | 43.4 | 42.2 | / | / | / | Facebook AI Research |
| Qwen1.5-MoE-A2.7B | 143.0 | 62.5 | / | / | 61.5 | / | / | 7.17 | Alibaba |
| StableLM2-12B | 120.0 | 62.09 | / | / | 56.03 | / | / | 8.15 | Stability AI |
| ChatGLM3-6B-Base | 60.0 | 61.4 | 69.0 | 53.7 | 72.3 | / | / | / | Zhipu AI |
| StableLM2-12B-Chat | 120.0 | 61.14 | / | / | 57.7 | / | / | 8.15 | Stability AI |
| XVERSE-13B-Chat | 130.0 | 60.2 | 53.1 | 48.3 | / | / | / | / | XVERSE |
| XVERSE-MoE-A4.2B | 258.0 | 60.2 | 60.5 | 48.0 | 51.2 | / | / | / | XVERSE |
| Mistral 7B | 73.0 | 60.1 | / | 43.0 | 52.1 | / | / | / | MistralAI |
| DeciLM-7B | 70.4 | 59.76 | / | / | 47.38 | / | / | / | Deci |
| Baichuan2-13B-Base | 130.0 | 59.17 | 58.1 | 48.17 | 52.77 | / | / | / | Baichuan AI |
| MiniCPM-MoE-8x2B | 136.0 | 58.9 | 58.11 | / | 61.5 | 10.52 | 39.22 | / | OpenBMB |
| LLaMA 33B | 330.0 | 57.8 | / | 41.7 | 35.6 | / | / | / | Facebook AI Research |
| Qwen-7B | 70.0 | 56.7 | 59.6 | / | 51.6 | / | / | / | Alibaba |
| Phi-2 | 27.0 | 56.7 | / | / | 61.1 | / | / | / | Microsoft |
| Qwen2-1.5B | 15.0 | 56.5 | 70.6 | / | 58.5 | 21.7 | 37.2 | / | Alibaba |
| ChatGLM2 12B | 120.0 | 56.18 | 61.6 | / | 40.94 | / | / | / | Zhipu AI |
| XVERSE-13B | 130.0 | 55.1 | 54.7 | 41.4 | / | / | / | / | XVERSE |
| LLaMA2 13B | 130.0 | 54.84 | / | 39.1 | 28.7 | / | / | / | Facebook AI Research |
| Baichuan2-7B-Base | 70.0 | 54.16 | 54.0 | 42.73 | 24.49 | / | / | / | Baichuan AI |
| GPT-3 | 1750.0 | 53.9 | / | / | / | / | / | / | OpenAI |
| MiniCPM-2B-DPO | 24.0 | 53.46 | 51.13 | / | 53.83 | 10.24 | 36.87 | 7.25 | ModelBest |
| Baichuan 13B - Chat | 130.0 | 52.1 | 51.5 | / | 26.6 | / | / | / | Baichuan AI |
| Baichuan 13B - Base | 130.0 | 51.62 | 52.4 | / | 26.6 | / | / | / | Baichuan AI |
| InternLM 7B | 70.0 | 51.0 | 53.4 | 37.6 | 31.2 | / | / | / | Shanghai AI Laboratory |
| InternLM Chat 7B 8K | 70.0 | 50.8 | 53.2 | 42.5 | 31.2 | / | / | / | Shanghai AI Laboratory |
| ChatGLM2-6B | 62.0 | 47.86 | 51.7 | / | 32.37 | / | / | / | Zhipu AI |
| LLaMA 13B | 130.0 | 46.94 | / | 33.9 | 17.8 | / | / | / | Facebook AI Research |
| Stable LM Zephyr 3B | 30.0 | 45.9 | 30.34 | / | 52.54 | 12.2 | 37.86 | 6.64 | Stability AI |
| Qwen2-0.5B | 4.0 | 45.4 | 58.2 | / | 58.5 | 10.7 | 28.4 | / | Alibaba |
| LLaMA2 7B | 70.0 | 45.3 | / | 29.3 | 14.6 | / | / | / | Facebook AI Research |
| Qwen-1.8B | 18.0 | 45.3 | / | / | 32.3 | / | / | / | Alibaba |
| GLM-130B | 1300.0 | 44.8 | 44.0 | / | / | / | / | / | Zhipu AI |
| Ziya-LLaMA-13B-Pretrain-v1 | 130.0 | 43.9 | 30.2 | 27.2 | / | / | / | / | IDEA Research Institute |
| OpenLLaMA 13B | 130.0 | 42.4 | 24.7 | 24.0 | / | / | / | / | Berkeley Artificial Intelligence Research |
| Baichuan 7B | 70.0 | 42.3 | 42.8 | 34.44 | 9.7 | / | / | / | Baichuan AI |
| Gemma 2B | 20.0 | 42.3 | / | 24.2 | 17.7 | 11.8 | 35.2 | / | Google Research |
| Gemma 2B - It | 20.0 | 42.3 | / | 24.2 | 17.7 | 11.8 | 35.2 | / | Google Research |
| Stable LM 2 - 1.6B | 16.0 | 38.93 | / | / | 17.82 | / | / | / | Stability AI |
| RecurrentGemma-2B | 27.0 | 38.4 | / | 23.8 | 13.4 | 11.8 | / | / | Google Research |
| Phi-1.5 | 13.0 | 37.6 | / | / | 40.2 | / | / | / | Microsoft |
| DeepSeek Coder-6.7B Instruct | 67.0 | 37.2 | / | / | 62.8 | 28.6 | 46.9 | / | DeepSeek-AI |
| ChatGLM-6B | 62.0 | 36.9 | 38.9 | / | 4.82 | / | / | / | Zhipu AI |
| LLaMA 7B | 70.0 | 35.1 | 27.1 | 23.9 | 11.0 | / | / | / | Facebook AI Research |
| MOSS | 160.0 | 27.4 | 33.13 | 26.8 | / | / | / | / | OpenLMLab |
| OPT | 1750.0 | 25.2 | 25.0 | 24.2 | / | / | / | / | Facebook AI Research |
| Pythia | 120.0 | 25.1 | 26.2 | 25.3 | / | / | / | / | EleutherAI |
| TinyLlama | 11.0 | 24.3 | 25.02 | / | 2.27 | / | / | / | Singapore University of Technology and Design |
| CodeGemma-7B | 70.0 | / | / | / | 44.2 | 19.9 | / | / | Google Research |
| CodeGemma-7B-IT | 70.0 | / | / | / | 41.2 | 20.9 | / | / | Google Research |
| CodeGemma-2B | 20.0 | / | / | / | 41.2 | 20.9 | / | / | Google Research |
| WizardLM-2-70B | 700.0 | / | / | / | / | / | / | 8.92 | Microsoft |
| WizardLM-2-7B | 70.0 | / | / | / | / | / | / | 8.28 | Microsoft |
| WizardLM-2 8x22B | 1760.0 | / | / | / | / | / | / | 9.12 | Microsoft |
| CPM-Bee | 100.0 | / | 54.1 | / | / | / | / | / | ModelBest |
| Aquila-7B | 70.0 | / | 25.5 | 25.58 | / | / | / | / | Beijing Academy of Artificial Intelligence |
| Phi-1 | 13.0 | / | / | / | / | / | / | / | Microsoft |