A Custom PReLU Operator in Ascend C

Published by the Huawei Cloud Developer Alliance (华为云开发者联盟) on 2024-04-08

This article is shared from the Huawei Cloud community post "Ascend C 自定義PRelu運算元" (A Custom PReLU Operator in Ascend C), by jackwangcumt.

1 PReLU operator overview

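PReLU, short for Parametric Rectified Linear Unit, was first proposed by Kaiming He's team. It is very similar to LeakyReLU and is an improved version of ReLU: the slope applied to negative inputs is learned rather than fixed. With almost no additional parameters, it improves the model's fitting ability while also reducing the risk of overfitting. For the mathematical expression of PReLU, see the description in the PyTorch documentation (https://pytorch.org/docs/2.1/generated/torch.nn.PReLU.html#prelu):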

PReLU(x) = max(0, x) + a * min(0, x), where a is a learnable parameter.
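As a minimal sketch of the definition in the PyTorch documentation referenced above (positive inputs pass through unchanged, negative inputs are scaled by a slope a), the function can be written in plain Python. The function names are my own, and the default slope of 0.25 follows PyTorch's default initial value; the operator built later in this article uses a fixed alpha attribute instead of a learned parameter.

```python
def prelu(x: float, a: float = 0.25) -> float:
    """PReLU(x) = max(0, x) + a * min(0, x)."""
    return max(0.0, x) + a * min(0.0, x)


def prelu_piecewise(x: float, a: float = 0.25) -> float:
    """Equivalent piecewise form: x if x >= 0, otherwise a * x."""
    return x if x >= 0 else a * x


if __name__ == "__main__":
    # Compare both forms on a negative, a zero and a positive input.
    for value in (-2.0, 0.0, 3.0):
        print(value, prelu(value, a=0.5), prelu_piecewise(value, a=0.5))
```

The max/min form matters for what follows, because it maps directly onto element-wise vector operations without any branching.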

2 Building the custom operator with Ascend C

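Before developing a custom operator with Ascend C, the driver, firmware, and developer toolkit must be installed successfully on the Ascend device. The toolkit version I had installed was too old, and some of the official samples failed to build and run with it, so I reinstalled a newer 8.0 release by running the following commands as root, in order: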

# Uninstall cann-toolkit_7.0.RC1 (run from the directory holding the installers, here /home/kzroot/mysoft)
./Ascend-cann-toolkit_7.0.RC1_linux-aarch64.run --uninstall
# Remove leftover files
rm -rf /usr/local/Ascend/ascend-toolkit/*
# Install cann-toolkit_8.0.RC1.alpha002
./Ascend-cann-toolkit_8.0.RC1.alpha002_linux-aarch64.run --install --install-for-all --quiet
# Install the protobuf dependency
pip3 install protobuf==3.20.0

In a working directory, create the single-operator project description file PReluCustom.json, with content along these lines:

[
    {
        "op": "PReluCustom",
        "language": "cpp",
        "input_desc": [
            {
                "name": "x",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            }
        ],
        "output_desc": [
            {
                "name": "y",
                "param_type": "required",
                "format": [
                    "ND"
                ],
                "type": [
                    "float"
                ]
            }
        ],
        "attr": [
            {
                "name": "alpha",
                "param_type": "optional",
                "type": "float",
                "default_value": "0.002"
            }
        ]
    }
]
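The description file is regular enough that it can also be produced from a script, which helps when several similar operators are needed. The following is only a sketch under my own assumptions: make_descriptor is a hypothetical helper, not part of the toolkit, and it covers just the shape used here (one required float ND input, one required float ND output, one optional float attribute).

```python
import json


def make_descriptor(op_name: str, alpha_default: float) -> list:
    """Build a single-operator description with one float ND input and output."""
    def tensor(name: str) -> dict:
        return {"name": name, "param_type": "required",
                "format": ["ND"], "type": ["float"]}

    return [{
        "op": op_name,
        "language": "cpp",
        "input_desc": [tensor("x")],
        "output_desc": [tensor("y")],
        "attr": [{
            "name": "alpha",
            "param_type": "optional",
            "type": "float",
            # The file above stores the default as a string, so do the same.
            "default_value": str(alpha_default),
        }],
    }]


if __name__ == "__main__":
    print(json.dumps(make_descriptor("PReluCustom", 0.002), indent=4))
```

Running the script prints JSON with the same fields as PReluCustom.json; redirecting the output to a file gives something that can be passed to the generator in the next step.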

Use msopgen, the operator project generator bundled with the toolkit, to generate the single-operator project directory from the description file:

/usr/local/Ascend/ascend-toolkit/8.0.RC1.alpha002/python/site-packages/bin/msopgen gen -i ./PReluCustom.json \
    -c ai_core-Ascend310P3 -lan cpp -out ./PReluCustom

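On success, this generates a C++ single-operator project directory named PReluCustom. The CMakePresets.json file in it contains several important settings. In particular, ASCEND_CANN_PACKAGE_PATH, the installation path of the toolkit, must be adjusted to the local environment, otherwise the build fails; on my machine it is /usr/local/Ascend/ascend-toolkit/latest. The file with my changes looks like this: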

{
    "version": 1,
    "cmakeMinimumRequired": {
        "major": 3,
        "minor": 19,
        "patch": 0
    },
    "configurePresets": [
        {
            "name": "default",
            "displayName": "Default Config",
            "description": "Default build using Unix Makefiles generator",
            "generator": "Unix Makefiles",
            "binaryDir": "${sourceDir}/build_out",
            "cacheVariables": {
                "CMAKE_BUILD_TYPE": {
                    "type": "STRING",
                    "value": "Release"
                },
                "ENABLE_SOURCE_PACKAGE": {
                    "type": "BOOL",
                    "value": "True"
                },
                "ENABLE_BINARY_PACKAGE": {
                    "type": "BOOL",
                    "value": "True"
                },
                "ASCEND_COMPUTE_UNIT": {
                    "type": "STRING",
                    "value": "ascend310p"
                },
                "ENABLE_TEST": {
                    "type": "BOOL",
                    "value": "True"
                },
                "vendor_name": {
                    "type": "STRING",
                    "value": "customize"
                },
                "ASCEND_CANN_PACKAGE_PATH": {
                    "type": "PATH",
                    "value": "/usr/local/Ascend/ascend-toolkit/latest"
                },
                "ASCEND_PYTHON_EXECUTABLE": {
                    "type": "STRING",
                    "value": "python3"
                },
                "CMAKE_INSTALL_PREFIX": {
                    "type": "PATH",
                    "value": "${sourceDir}/build_out"
                },
                "ENABLE_CROSS_COMPILE": {
                    "type": "BOOL",
                    "value": "False"
                },
                "CMAKE_CROSS_PLATFORM_COMPILER": {
                    "type": "PATH",
                    "value": "/usr/bin/aarch64-linux-gnu-g++"
                }
            }
        }
    ]
}
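Because the project directory is regenerated whenever msopgen is run again, it can be convenient to apply these two settings from a script rather than by hand. This is a minimal sketch, assuming the file keeps the structure shown above; set_cache_value and patch_presets_file are hypothetical helper names of my own.

```python
import json


def set_cache_value(presets: dict, key: str, value: str) -> dict:
    """Set cacheVariables[key].value in every configure preset (edits in place)."""
    for preset in presets["configurePresets"]:
        preset["cacheVariables"][key]["value"] = value
    return presets


def patch_presets_file(path: str, cann_path: str, vendor: str) -> None:
    """Rewrite CMakePresets.json with a local toolkit path and vendor name."""
    with open(path, encoding="utf-8") as f:
        presets = json.load(f)
    set_cache_value(presets, "ASCEND_CANN_PACKAGE_PATH", cann_path)
    set_cache_value(presets, "vendor_name", vendor)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(presets, f, indent=4)
```

For example, patch_presets_file("PReluCustom/CMakePresets.json", "/usr/local/Ascend/ascend-toolkit/latest", "customize") would reproduce the values used in this article.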

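The vendor_name value can be changed to suit your own setup. By default the deployed operator is placed under the customize directory; this could be changed to, say, jackwangcumt. Also note that each deployment of a single-operator project overwrites the previous one, so some care is needed here. In the generated kernel file p_relu_custom.cpp, the core of the operator computation is: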

    __aicore__ inline void Compute(int32_t progress)
    {
        // dequeue the input tensor from the VECIN queue
        LocalTensor<float> xLocal = inQueueX.DeQue<float>();
        LocalTensor<float> yLocal = outQueueY.AllocTensor<float>();
        LocalTensor<float> tmpTensor1 = tmpBuffer1.Get<float>();
        float inputVal = 0.0;
        // positive part: tmpTensor1 = max(x, 0), i.e. x where x >= 0, else 0
        Maxs(tmpTensor1, xLocal, inputVal, this->tileLength);
        // negative part, computed in place: xLocal = min(x, 0) * alpha
        Mins(xLocal, xLocal, inputVal, this->tileLength);
        Muls(xLocal, xLocal, this->alpha, this->tileLength);
        // y = alpha * min(x, 0) + max(x, 0)
        Add(yLocal, xLocal, tmpTensor1, this->tileLength);
        // enqueue the result for copy-out
        outQueueY.EnQue<float>(yLocal);
        // free the input tensor for reuse
        inQueueX.FreeTensor(xLocal);
    }
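To convince myself that this decomposition really computes PReLU, the same four steps can be emulated on the host. The sketch below is plain Python, not Ascend C: maxs, mins, muls and add are stand-ins of my own that mimic only the element-wise behaviour of the calls in Compute, and prelu_reference is the piecewise definition.

```python
def maxs(src, scalar):
    """Element-wise max(src[i], scalar)."""
    return [max(v, scalar) for v in src]


def mins(src, scalar):
    """Element-wise min(src[i], scalar)."""
    return [min(v, scalar) for v in src]


def muls(src, scalar):
    """Element-wise src[i] * scalar."""
    return [v * scalar for v in src]


def add(a, b):
    """Element-wise a[i] + b[i]."""
    return [x + y for x, y in zip(a, b)]


def prelu_tile(x, alpha):
    """Same order of operations as Compute, applied to one tile."""
    positive = maxs(x, 0.0)           # plays the role of tmpTensor1
    negative = mins(x, 0.0)           # plays the role of xLocal after Mins
    negative = muls(negative, alpha)  # scale only the negative part
    return add(negative, positive)    # plays the role of yLocal


def prelu_reference(x, alpha):
    """Piecewise definition: v if v >= 0, otherwise v * alpha."""
    return [v if v >= 0 else v * alpha for v in x]


if __name__ == "__main__":
    tile = [-4.0, -0.5, 0.0, 2.0]
    print(prelu_tile(tile, 0.5))
    print(prelu_reference(tile, 0.5))
```

For every element one of the two parts is exactly zero, so the final addition never mixes contributions and the two functions agree.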

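Here the built-in primitive operations handle the x < 0 and x >= 0 parts of the input separately, and Add then merges the two parts into the final result. The p_relu_custom_tiling.h file in the op_host directory is as follows: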

#include "register/tilingdata_base.h"

namespace optiling {
BEGIN_TILING_DATA_DEF(TilingData)
  TILING_DATA_FIELD_DEF(uint32_t, totalLength);
  TILING_DATA_FIELD_DEF(uint32_t, tileNum);
  TILING_DATA_FIELD_DEF(float, alpha);
END_TILING_DATA_DEF;

REGISTER_TILING_DATA_CLASS(PReluCustom, TilingData)
}

The core code of the host-side p_relu_custom.cpp is as follows:

#include "p_relu_custom_tiling.h"
#include "register/op_def_registry.h"
namespace optiling {

const uint32_t BLOCK_DIM = 8;
const uint32_t TILE_NUM = 16; // this value may affect whether the test passes

static ge::graphStatus TilingFunc(gert::TilingContext* context)
{

        TilingData tiling;
        uint32_t totalLength = context->GetInputTensor(0)->GetShapeSize();
        const gert::RuntimeAttrs *attrs = context->GetAttrs();
        const float *alpha = attrs->GetAttrPointer<float>(0);

        context->SetBlockDim(BLOCK_DIM);
        tiling.set_totalLength(totalLength);
        tiling.set_tileNum(TILE_NUM);
        tiling.set_alpha(*alpha);

        tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity());
        context->GetRawTilingData()->SetDataSize(tiling.GetDataSize());

        size_t *currentWorkspace = context->GetWorkspaceSizes(1);
        currentWorkspace[0] = 0;

        return ge::GRAPH_SUCCESS;
}
}
namespace ge {
static ge::graphStatus InferShape(gert::InferShapeContext* context)
{
    const gert::Shape* x1_shape = context->GetInputShape(0);
    gert::Shape* y_shape = context->GetOutputShape(0);
    *y_shape = *x1_shape;
    return GRAPH_SUCCESS;
}
}
namespace ops {
class PReluCustom : public OpDef {
public:
    explicit PReluCustom(const char* name) : OpDef(name)
    {
        this->Input("x")
            .ParamType(REQUIRED)
            .DataType({ge::DT_FLOAT})
            .Format({ge::FORMAT_ND})
            .UnknownShapeFormat({ge::FORMAT_ND});
        this->Output("y")
            .ParamType(REQUIRED)
            .DataType({ge::DT_FLOAT})
            .Format({ge::FORMAT_ND})
            .UnknownShapeFormat({ge::FORMAT_ND});
        this->Attr("alpha").AttrType(OPTIONAL).Float(0.002);

        this->SetInferShape(ge::InferShape);

        this->AICore()
            .SetTiling(optiling::TilingFunc);
        this->AICore().AddConfig("ascend310p");

    }
};

OP_ADD(PReluCustom);
}
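One plausible reason why TILE_NUM can affect whether the test passes is that the input has to split evenly across cores and tiles. The kernel's Init function is not shown in this article, so the sketch below rests on assumptions: that each core handles totalLength / BLOCK_DIM elements, that each tile is further halved by double buffering (BUFFER_NUM = 2, as in the official add_custom sample), and that every tile must be 32-byte aligned. All three are labelled in the code.

```python
BLOCK_DIM = 8      # from the host code above
TILE_NUM = 16      # from the host code above
BUFFER_NUM = 2     # assumption: double buffering, not shown in the article
FLOAT_BYTES = 4    # size of one float32 element
ALIGN_BYTES = 32   # assumption: each tile must be 32-byte aligned


def tile_length(total_length: int) -> int:
    """Elements per tile, assuming an even split over cores, tiles and buffers."""
    parts = BLOCK_DIM * TILE_NUM * BUFFER_NUM
    if total_length % parts != 0:
        raise ValueError(
            f"{total_length} elements do not split evenly into {parts} parts")
    length = total_length // parts
    if (length * FLOAT_BYTES) % ALIGN_BYTES != 0:
        raise ValueError(
            f"a tile of {length} floats is not {ALIGN_BYTES}-byte aligned")
    return length


if __name__ == "__main__":
    # Shape of the test input generated by gen_data.py: [8, 200, 1024]
    print(tile_length(8 * 200 * 1024))
```

Under these assumptions the [8, 200, 1024] test input gives 6400 floats (25600 bytes) per tile. A TILE_NUM that does not divide the per-core length evenly would leave a remainder that a simple kernel loop never processes.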

Run the following command to build the operator project:

root@atlas500ai:/home/kzroot/mysoft/myAscendC/PReluSample/PReluCustom# bash build.sh 

The message Self-extractable archive "custom_opp_ubuntu_aarch64.run" successfully created. indicates that the build succeeded. Run the following command to deploy the operator:

PReluCustom# ./build_out/custom_opp_ubuntu_aarch64.run

3 Verifying the Ascend C custom operator

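A custom operator written in Ascend C needs to be verified for correctness. For this, create an AclNNInvocation directory (the corresponding part of the official samples can serve as a reference) with the following directory structure: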

[Screenshot: AclNNInvocation directory structure]

Here gen_data.py generates the test input and the expected (golden) output, and verify_result.py checks the precision. The content of gen_data.py is as follows:

import numpy as np
import os

def gen_golden_data_simple():
    alpha = np.array(0.002, dtype=np.float32)
    input_x = np.random.uniform(-100, 100, [8, 200, 1024]).astype(np.float32)
    golden = np.where(input_x >= 0, input_x, input_x * alpha).astype(np.float32)
    os.makedirs("input", exist_ok=True)
    os.makedirs("output", exist_ok=True)
    input_x.tofile("./input/input_x.bin")
    golden.tofile("./output/golden.bin")

if __name__ == "__main__":
    gen_golden_data_simple()
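The precision check itself is not listed in this article. As a rough sketch of what such a check might look like, the code below reads raw float32 files like those written by tofile() and compares them element by element. The function names and the 1e-4 tolerances are my own assumptions, not the official script, and the output file name in the usage note is hypothetical.

```python
import math
from array import array


def read_float32(path: str) -> array:
    """Load a raw float32 file such as those written by ndarray.tofile()."""
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return values


def matches(output, golden, rtol: float = 1e-4, atol: float = 1e-4) -> bool:
    """True if both sequences have equal length and every pair is within tolerance."""
    if len(output) != len(golden):
        return False
    return all(math.isclose(o, g, rel_tol=rtol, abs_tol=atol)
               for o, g in zip(output, golden))
```

A call such as matches(read_float32("./output/output_y.bin"), read_float32("./output/golden.bin")) would then report whether the operator's output agrees with the golden data, assuming the test program writes its result to output_y.bin.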

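The CMakeLists.txt under the src directory has one variable that may need changing: set(CUST_PKG_PATH "${INC_PATH}/opp/vendors/customize/op_api"). With the default vendor_name no change is needed, but the customize component of this path must match vendor_name. Run the following command to run the test: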

PReluSample/AclNNInvocation# bash run.sh

[Screenshot: output of run.sh]