[Basics Island, Level 6] Evaluating InternLM-1.8B with OpenCompass: A Hands-On Walkthrough

陈佳佳|Tech, published 2024-09-27

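Contents
  • 1. Overview
  • 2. Environment setup
    • 2.1 Creating the dev machine and conda environment
    • 2.2 Installation: GPU environment
  • 3. Data preparation
    • 3.1 Evaluation datasets
    • 3.2 Configuration files for InternLM and C-Eval
  • 4. Running the evaluation
    • 4.1 Evaluating with command-line arguments
    • 4.2 Evaluating by editing a configuration file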

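1. Overview

Evaluating a model in OpenCompass typically goes through the following stages: configuration -> inference -> evaluation -> visualization.

Configuration: This is the starting point of the whole workflow. You configure the entire evaluation process and choose the models and datasets to evaluate. You can also choose the evaluation strategy and compute backend, and define how results are displayed.
Inference and evaluation: In this stage, OpenCompass runs inference and evaluation over the models and datasets in parallel. Inference has the model produce outputs from the dataset; evaluation measures how well those outputs match the reference answers. Both steps are split into multiple "tasks" that run concurrently for efficiency.
Visualization: Once evaluation finishes, OpenCompass organizes the results into an easy-to-read table and saves it as CSV and TXT files.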
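The four stages can be pictured with a toy pipeline. This is only an illustrative sketch and not how OpenCompass is implemented: `DATASET`, `toy_model`, `partition`, `run_task`, `evaluate`, and `summarize` are all hypothetical names, and the "model" is a stand-in that does simple addition.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Configuration: a hypothetical toy dataset of (question, reference answer) pairs.
DATASET = [("1+1", "2"), ("2+2", "4"), ("3+3", "6"), ("2*5", "10")]


def toy_model(question: str) -> str:
    """Stand-in for a language model: it can add, but gives up on '*'."""
    if "*" in question:
        return "?"
    left, right = question.split("+")
    return str(int(left) + int(right))


def partition(samples, size):
    """Split the dataset into tasks of at most `size` samples each."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def run_task(task):
    """Inference plus evaluation for one task: count outputs that match the reference."""
    return sum(toy_model(question) == reference for question, reference in task)


def evaluate(samples, task_size=2, workers=2):
    """Run all tasks concurrently and return accuracy as a percentage."""
    tasks = partition(samples, task_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        correct = sum(pool.map(run_task, tasks))
    return 100.0 * correct / len(samples)


def summarize(accuracy):
    """Visualization: render the score as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["dataset", "metric", "score"])
    writer.writerow(["toy_arithmetic", "accuracy", f"{accuracy:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(summarize(evaluate(DATASET)))
```

Setting `workers=1` makes the tasks run one after another, which is roughly the trade-off a sequential debug run makes: slower, but easier to follow.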

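2. Environment setup

2.1 Creating the dev machine and conda environment

On the dev machine creation page, select the Cuda11.7-conda image and 10% A100 as the GPU, then create the dev machine.

2.2 Installation: GPU environment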

conda create -n opencompass python=3.10
conda activate opencompass
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Note: be sure to cd /root first
cd /root
git clone -b 0.2.4 https://github.com/open-compass/opencompass
cd opencompass
pip install -e .


apt-get update
apt-get install cmake
pip install -r requirements.txt
pip install protobuf
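After installation it can help to confirm which package versions actually ended up in the environment. The sketch below uses only the standard library plus an optional `torch` import; `installed_versions` and `cuda_available` are hypothetical helper names, and the package list is an assumption based on the commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """Return torch.cuda.is_available(), or None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    checked = installed_versions(["torch", "opencompass", "numpy", "protobuf"])
    for name, version in checked.items():
        print(f"{name}: {version or 'not installed'}")
    print("CUDA available:", cuda_available())
```

Running this inside the activated `opencompass` environment prints one line per package, so a missing or unexpected version is visible before any evaluation starts.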

3. Data preparation

3.1 Evaluation datasets

Copy and unzip the evaluation datasets (run from /root/opencompass):

cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
unzip OpenCompassData-core-20231110.zip
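To see what the archive contains without extracting it again, the top-level entries can be listed with Python's `zipfile` module. This is a small sketch; `top_level_entries` is a hypothetical helper, and the expectation that the archive unpacks into a `data` folder is an assumption, not something stated in this post.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Keep only the first path component of each member name.
        return sorted({name.split("/", 1)[0] for name in archive.namelist() if name})
```

For example, `top_level_entries("/root/opencompass/OpenCompassData-core-20231110.zip")` returns the folder and file names at the root of the archive.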

3.2 Configuration files for InternLM and C-Eval

List all configurations related to InternLM and C-Eval:

python tools/list_configs.py internlm ceval

You will see:
+----------------------------------------+----------------------------------------------------------------------+
| Model | Config Path |
|----------------------------------------+----------------------------------------------------------------------|
| hf_internlm2_1_8b | configs/models/hf_internlm/hf_internlm2_1_8b.py |
| hf_internlm2_20b | configs/models/hf_internlm/hf_internlm2_20b.py |
| hf_internlm2_7b | configs/models/hf_internlm/hf_internlm2_7b.py |
| hf_internlm2_base_20b | configs/models/hf_internlm/hf_internlm2_base_20b.py |
| hf_internlm2_base_7b | configs/models/hf_internlm/hf_internlm2_base_7b.py |
| hf_internlm2_chat_1_8b | configs/models/hf_internlm/hf_internlm2_chat_1_8b.py |
| hf_internlm2_chat_1_8b_sft | configs/models/hf_internlm/hf_internlm2_chat_1_8b_sft.py |
| hf_internlm2_chat_20b | configs/models/hf_internlm/hf_internlm2_chat_20b.py |
| hf_internlm2_chat_20b_sft | configs/models/hf_internlm/hf_internlm2_chat_20b_sft.py |
| hf_internlm2_chat_20b_with_system | configs/models/hf_internlm/hf_internlm2_chat_20b_with_system.py |
| hf_internlm2_chat_7b | configs/models/hf_internlm/hf_internlm2_chat_7b.py |
| hf_internlm2_chat_7b_sft | configs/models/hf_internlm/hf_internlm2_chat_7b_sft.py |
| hf_internlm2_chat_7b_with_system | configs/models/hf_internlm/hf_internlm2_chat_7b_with_system.py |
| hf_internlm2_chat_math_20b | configs/models/hf_internlm/hf_internlm2_chat_math_20b.py |
| hf_internlm2_chat_math_20b_with_system | configs/models/hf_internlm/hf_internlm2_chat_math_20b_with_system.py |
| hf_internlm2_chat_math_7b | configs/models/hf_internlm/hf_internlm2_chat_math_7b.py |
| hf_internlm2_chat_math_7b_with_system | configs/models/hf_internlm/hf_internlm2_chat_math_7b_with_system.py |
| hf_internlm_20b | configs/models/hf_internlm/hf_internlm_20b.py |
| hf_internlm_7b | configs/models/hf_internlm/hf_internlm_7b.py |
| hf_internlm_chat_20b | configs/models/hf_internlm/hf_internlm_chat_20b.py |
| hf_internlm_chat_7b | configs/models/hf_internlm/hf_internlm_chat_7b.py |
| hf_internlm_chat_7b_8k | configs/models/hf_internlm/hf_internlm_chat_7b_8k.py |
| hf_internlm_chat_7b_v1_1 | configs/models/hf_internlm/hf_internlm_chat_7b_v1_1.py |
| internlm_7b | configs/models/internlm/internlm_7b.py |
| ms_internlm_chat_7b_8k | configs/models/ms_internlm/ms_internlm_chat_7b_8k.py |
+----------------------------------------+----------------------------------------------------------------------+
+--------------------------------+-------------------------------------------------------------------+
| Dataset | Config Path |
|--------------------------------+-------------------------------------------------------------------|
| ceval_clean_ppl | configs/datasets/ceval/ceval_clean_ppl.py |
| ceval_contamination_ppl_810ec6 | configs/datasets/contamination/ceval_contamination_ppl_810ec6.py |
| ceval_gen | configs/datasets/ceval/ceval_gen.py |
| ceval_gen_2daf24 | configs/datasets/ceval/ceval_gen_2daf24.py |
| ceval_gen_5f30c7 | configs/datasets/ceval/ceval_gen_5f30c7.py |
| ceval_ppl | configs/datasets/ceval/ceval_ppl.py |
| ceval_ppl_1cd8bf | configs/datasets/ceval/ceval_ppl_1cd8bf.py |
| ceval_ppl_578f8d | configs/datasets/ceval/ceval_ppl_578f8d.py |
| ceval_ppl_93e5ce | configs/datasets/ceval/ceval_ppl_93e5ce.py |
| ceval_zero_shot_gen_bd40ef | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py |
| configuration_internlm | configs/datasets/cdme/internlm2-chat-7b/configuration_internlm.py |
| modeling_internlm2 | configs/datasets/cdme/internlm2-chat-7b/modeling_internlm2.py |
| tokenization_internlm | configs/datasets/cdme/internlm2-chat-7b/tokenization_internlm.py |
+--------------------------------+-------------------------------------------------------------------+
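The listing above is essentially a keyword search over the `configs/` tree. The sketch below approximates that idea with plain substring matching on file names; it is not the code of `tools/list_configs.py`, and `find_configs` is a hypothetical helper.

```python
from pathlib import Path


def find_configs(config_root, keywords):
    """Return (name, path) pairs for .py files whose file stem contains any keyword."""
    root = Path(config_root)
    matches = []
    for path in sorted(root.rglob("*.py")):
        if any(keyword in path.stem for keyword in keywords):
            matches.append((path.stem, path.as_posix()))
    return matches
```

Calling `find_configs("configs", ["internlm", "ceval"])` from the repository root would return names such as `hf_internlm2_chat_1_8b` and `ceval_gen`, which are the values later passed to `--models` and `--datasets`.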

4. Running the evaluation

4.1 Evaluating with command-line arguments

In the opencompass folder, open configs/models/hf_internlm/hf_internlm2_chat_1_8b.py and paste in the following code:

from opencompass.models import HuggingFaceCausalLM


models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='internlm2-1.8b-hf',
        path="/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
        tokenizer_path='/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        model_kwargs=dict(
            trust_remote_code=True,
            device_map='auto',
        ),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
            trust_remote_code=True,
        ),
        max_out_len=100,
        min_out_len=1,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
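Because `path` and `tokenizer_path` point at local directories here, a typo in either only surfaces once the model starts loading. A quick pre-flight check can catch that earlier. This is a minimal sketch: `missing_paths` is a hypothetical helper, and it assumes both values are local directories rather than model hub identifiers.

```python
import os


def missing_paths(model_cfgs):
    """Return (abbr, key, path) for each 'path'/'tokenizer_path' that is not an existing directory."""
    problems = []
    for cfg in model_cfgs:
        for key in ("path", "tokenizer_path"):
            value = cfg.get(key)
            if value is not None and not os.path.isdir(value):
                problems.append((cfg.get("abbr", "<no abbr>"), key, value))
    return problems
```

Passing the `models` list from the config above to `missing_paths` returns an empty list when both directories exist.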

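Once OpenCompass is installed and the datasets are prepared as described above, the following command evaluates the InternLM2-Chat-1.8B model on the C-Eval dataset. Because OpenCompass launches the evaluation in parallel by default, it helps to start the first run in --debug mode and check for problems. In --debug mode, tasks run sequentially and output is printed in real time.

# Environment variable configuration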
export MKL_SERVICE_FORCE_INTEL=1
# or
export MKL_THREADING_LAYER=GNU
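If the evaluation is launched from a Python script rather than an interactive shell, the same variables can be passed through the child process environment. The sketch below only builds that environment; `mkl_env` is a hypothetical helper, and as in the shell snippet, only one of the two variables is set.

```python
import os


def mkl_env(base=None, use_gnu_layer=True):
    """Return a copy of the environment with one of the two MKL settings applied."""
    env = dict(os.environ if base is None else base)
    if use_gnu_layer:
        env["MKL_THREADING_LAYER"] = "GNU"
    else:
        env["MKL_SERVICE_FORCE_INTEL"] = "1"
    return env
```

The result can be handed to `subprocess.run(..., env=mkl_env())` so the setting applies to the evaluation process without changing the current shell.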

Run the evaluation:

python run.py --datasets ceval_gen --models hf_internlm2_chat_1_8b --debug

The run fails with the following error:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/root/opencompass/run.py", line 1, in <module>
    from opencompass.cli.main import main
  File "/root/opencompass/opencompass/cli/main.py", line 9, in <module>
    from opencompass.partitioners import MultimodalNaivePartitioner
  File "/root/opencompass/opencompass/partitioners/__init__.py", line 1, in <module>
    from .mm_naive import *  # noqa: F401, F403
  File "/root/opencompass/opencompass/partitioners/mm_naive.py", line 8, in <module>
    from .base import BasePartitioner
  File "/root/opencompass/opencompass/partitioners/base.py", line 9, in <module>
    from opencompass.utils import (dataset_abbr_from_cfg, get_logger,
  File "/root/opencompass/opencompass/utils/__init__.py", line 4, in <module>
    from .collect_env import *  # noqa
  File "/root/opencompass/opencompass/utils/collect_env.py", line 2, in <module>
    from mmengine.utils.dl_utils import collect_env as collect_base_env
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/mmengine/utils/dl_utils/__init__.py", line 3, in <module>
    from .collect_env import collect_env
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/mmengine/utils/dl_utils/collect_env.py", line 10, in <module>
    import torch
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/torch/__init__.py", line 1382, in <module>
    from .functional import *  # noqa: F403
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/torch/functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/root/.conda/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /opt/conda/conda-bld/pytorch_1702400410390/work/torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
Traceback (most recent call last):
  File "/root/opencompass/run.py", line 1, in <module>
    from opencompass.cli.main import main
  File "/root/opencompass/opencompass/cli/main.py", line 14, in <module>
    from opencompass.utils.run import (exec_mm_infer_runner, fill_eval_cfg,
  File "/root/opencompass/opencompass/utils/run.py", line 7, in <module>
    from opencompass.datasets.custom import make_custom_dataset_config
  File "/root/opencompass/opencompass/datasets/__init__.py", line 1, in <module>
    from .advglue import *  # noqa: F401, F403
  File "/root/opencompass/opencompass/datasets/advglue.py", line 4, in <module>
    from datasets import Dataset, concatenate_datasets
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/datasets/__init__.py", line 18, in <module>
    from .arrow_dataset import Dataset
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 59, in <module>
    import pandas as pd
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject.   

Following a hint from GPT, downgrade NumPy directly:
pip install numpy==1.24.3
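The error message itself points at the cause: modules compiled against NumPy 1.x are being loaded under NumPy 2.0.1. A small check of the installed major version shows whether the downgrade took effect. This is a sketch using only the standard library; `major_version` and `numpy_needs_downgrade` are hypothetical helper names.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '1.24.3'."""
    return int(version.split(".", 1)[0])


def numpy_needs_downgrade():
    """True if NumPy 2.x or newer is installed, False for 1.x, None if NumPy is absent."""
    try:
        return major_version(metadata.version("numpy")) >= 2
    except metadata.PackageNotFoundError:
        return None
```

After `pip install numpy==1.24.3`, `numpy_needs_downgrade()` should return `False` in the `opencompass` environment.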

Evaluation results:
[image: evaluation results]

4.2 Evaluating by editing a configuration file

//todo
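Until this section is written up, here is a rough sketch of what a config-file run might look like. It is an assumption based on the config names listed in section 3.2, not the author's tested setup: the file name `configs/eval_tutorial_demo.py` is hypothetical, and the imported variable names (`ceval_datasets`, `models`) are assumed to match what those config files define.

```python
# configs/eval_tutorial_demo.py  (hypothetical file name)
from mmengine.config import read_base

with read_base():
    # Reuse the dataset and model configs listed in section 3.2.
    from .datasets.ceval.ceval_gen import ceval_datasets
    from .models.hf_internlm.hf_internlm2_chat_1_8b import models as hf_internlm2_chat_1_8b_models

datasets = ceval_datasets
models = hf_internlm2_chat_1_8b_models
```

Under these assumptions, the run would be started with `python run.py configs/eval_tutorial_demo.py --debug` instead of passing `--datasets` and `--models` on the command line.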
