Bring DNNL to TVM: JSON Codegen/Runtime
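We now implement the DNNL codegen that serializes a Relay graph to a JSON representation, and then implement the DNNL JSON runtime that deserializes and executes that graph. Note that if you are trying to implement a codegen that generates C-compatible programs, you may want to skip directly to the next section.
For the DNNL JSON codegen/runtime in TVM to work on this example, make sure DNNL is available on your machine and build TVM with set(USE_DNNL_CODEGEN ON) in config.cmake.
The DNNL codegen lives in src/relay/backend/contrib/dnnl/codegen.cc. Because the DNNL codegen is implemented in both forms in this file, you can focus on the part covered by the USE_JSON_RUNTIME macro when tracing the code.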
runtime::Module DNNLCompiler(const ObjectRef& ref) {
  // "ref" should be the partitioned Relay function with kCompiler=dnnl.
  auto func = Downcast<Function>(ref);

  // Get the function name as the symbol to match in runtime.
  auto func_name = GetExtSymbol(func);

  // Serialize the function to a JSON string (introduced later).
  DNNLJSONSerializer serializer(func_name, func);
  std::string graph_json = serializer.GetJSON();

  // The constant tensor names that have been bound to the module.
  // All constant tensors will be serialized along with the JSON graph
  // when export_library is invoked.
  auto params = serializer.GetParams();

  // The function to create DNNL JSON runtime (introduced later).
  const auto* pf = runtime::Registry::Get("runtime.DNNLJSONRuntimeCreate");
  CHECK(pf != nullptr) << "Cannot find JSON runtime module to create";

  // Create a DNNL runtime module that can run the serialized function.
  auto mod = (*pf)(func_name, graph_json, params);
  return mod;
}
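Note that each runtime module is responsible for only one Relay function, which means you may have multiple DNNL runtime modules in a single .so file.
We derive the DNNL JSON serializer from the BYOC JSON codegen (src/relay/backend/contrib/codegen_json/codegen_json.h). The special handling in the DNNL JSON serializer serializes a composite function call into a JSON node that the DNNL JSON runtime can interpret. Suppose we have a composite function that matches the pattern dnnl.conv2d_relu; the BYOC JSON codegen then generates the following JSON node: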
{
  op: "kernel",
  name: "dnnl.conv2d_relu",
  inputs: [[0, 0, 0], [1, 0, 0]],
  attrs: {
    PartitionedFromPattern: ["nn.conv2d_nn.relu_"],
    shape: [1, 32, 14, 14]
  }
}
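The problem is that the runtime still needs the Conv2D attributes, such as padding and strides, but the BYOC JSON serializer attaches only the attributes of the composite function, not those of the ops in its body. The customized DNNL JSON serializer, on the other hand, attaches the attributes of the first and only Conv2D in the composite function and generates the following JSON node: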
{
  op: "kernel",
  name: "dnnl.conv2d_relu",
  inputs: [[0, 0, 0], [1, 0, 0]],
  attrs: {
    shape: [1, 32, 14, 14],
    data_layout: ["NCHW"],
    kernel_layout: ["OIHW"],
    strides: [1, 1],
    padding: [1, 1, 1, 1]
  }
}
As the DNNL JSON serializer shows, you can customize the serializer to emit JSON in any form you like, as long as your JSON runtime can interpret it.
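The difference between the two nodes above can be illustrated with a small Python sketch. It is not the TVM serializer: `generic_node`, `customized_node`, and the `body_ops` list are hypothetical names invented here, and plain dictionaries stand in for Relay calls. The sketch only shows the idea of copying the attributes of the single conv2d found in the composite body into the emitted node.

```python
import json


def generic_node(name, inputs, composite_attrs):
    """Node as a generic serializer might emit it: composite attributes only."""
    return {
        "op": "kernel",
        "name": name,
        "inputs": inputs,
        "attrs": dict(composite_attrs),
    }


def customized_node(name, inputs, composite_attrs, body_ops):
    """Same node, plus the attributes of the one conv2d in the composite body."""
    convs = [op for op in body_ops if op["name"] == "nn.conv2d"]
    if len(convs) != 1:
        raise ValueError("expected exactly one nn.conv2d in the composite body")
    node = generic_node(name, inputs, composite_attrs)
    node["attrs"].update(convs[0]["attrs"])
    return node


# Hypothetical stand-in for the body of the dnnl.conv2d_relu composite function.
body_ops = [
    {
        "name": "nn.conv2d",
        "attrs": {
            "data_layout": ["NCHW"],
            "kernel_layout": ["OIHW"],
            "strides": [1, 1],
            "padding": [1, 1, 1, 1],
        },
    },
    {"name": "nn.relu", "attrs": {}},
]

inputs = [[0, 0, 0], [1, 0, 0]]
composite_attrs = {"shape": [1, 32, 14, 14]}
plain = generic_node("dnnl.conv2d_relu", inputs, composite_attrs)
node = customized_node("dnnl.conv2d_relu", inputs, composite_attrs, body_ops)
print(json.dumps(node, indent=2))
```

In this sketch the generic node carries no strides or padding, while the customized node does, which is the information a runtime would need to configure the convolution.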
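Next we implement a DNNL JSON runtime that interprets and executes the serialized JSON graph. It lives in src/runtime/contrib/dnnl/dnnl_json_runtime.cc.
As before, we first register two APIs for creating the runtime so that they can be used from anywhere. runtime.DNNLJSONRuntimeCreate is the one used in the previous part after serialization, and runtime.module.loadbinary_dnnl_json is used when loading the .so back.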
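// Create a DNNL JSON runtime to interpret and execute the given JSON graph.
runtime::Module DNNLJSONRuntimeCreate(String symbol_name, String graph_json,
                                      const Array<String>& const_names) {
  auto n = make_object<DNNLJSONRuntime>(symbol_name, graph_json, const_names);
  return runtime::Module(n);
}
TVM_REGISTER_GLOBAL("runtime.DNNLJSONRuntimeCreate")
    .set_body_typed(DNNLJSONRuntimeCreate);
TVM_REGISTER_GLOBAL("runtime.module.loadbinary_dnnl_json")
    .set_body_typed(JSONRuntimeBase::LoadFromBinary<DNNLJSONRuntime>);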
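The register-then-look-up-by-name pattern used for the runtime creator can be sketched in plain Python. This is a toy registry, not TVM's: `register_global`, `get_global`, and the dictionary returned by `create_runtime` are assumptions made for illustration. Only the registered name and the error message are taken from the text above.

```python
# Toy global function registry keyed by string name (illustrative only).
_REGISTRY = {}


def register_global(name):
    """Decorator that stores a function under a global string name."""
    def decorator(fn):
        _REGISTRY[name] = fn
        return fn
    return decorator


def get_global(name):
    """Return the registered function, or None when the name is unknown."""
    return _REGISTRY.get(name)


@register_global("runtime.DNNLJSONRuntimeCreate")
def create_runtime(symbol_name, graph_json, const_names):
    # A dictionary stands in for the runtime module object.
    return {
        "symbol": symbol_name,
        "graph_json": graph_json,
        "const_names": list(const_names),
    }


# The codegen side only knows the name, not the function itself.
pf = get_global("runtime.DNNLJSONRuntimeCreate")
if pf is None:
    raise RuntimeError("Cannot find JSON runtime module to create")
mod = pf("dnnl_0", "{}", ["dnnl_0_const_0"])
print(mod["symbol"])
```

Looking the creator up by name keeps the codegen and the runtime decoupled: the codegen can be compiled without linking against the runtime implementation, and a missing runtime is detected with a null check at the call site.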
Next we look at the DNNL JSON runtime implementation. The basic class structure is:
class DNNLJSONRuntime : public JSONRuntimeBase {
 public:
  const char* type_key() const { return "dnnl_json"; }

  void Init(const Array<NDArray>& consts) override {
    // Initialize the DNNL graph engine.
    BuildEngine();

    // Setup constants entries for weights.
    CHECK_EQ(consts.size(), const_idx_.size())
        << "The number of input constants must match the number of required.";
    SetupConstants(consts);
  }

  void Run() override {
    // 1. Fill in the input buffers.
    // 2. Invoke the engine through interpreting the stream.
    // 3. Read and fill output buffers.
  }
};
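The Init function is responsible for building the DNNL engine by interpreting the JSON graph string (see L93 for BuildEngine) and for filling the constant weights into the corresponding data entry buffers (SetupConstants is implemented in the JSON runtime base class, so you only need to call it in Init).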
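The Init/Run split described above can be illustrated with a toy interpreter in Python. This is a minimal sketch under stated assumptions, not the DNNL runtime: `ToyJSONRuntime`, the `toy.add` and `toy.relu` kernels, and the graph layout with `nodes` and `heads` keys are all invented for this example, and Python lists stand in for tensors.

```python
import json


class ToyJSONRuntime:
    """Interprets a small JSON graph; a stand-in for a JSON runtime."""

    KERNELS = {
        "toy.add": lambda a, b: [x + y for x, y in zip(a, b)],
        "toy.relu": lambda a: [max(x, 0.0) for x in a],
    }

    def __init__(self, symbol_name, graph_json, const_names):
        self.symbol_name = symbol_name
        self.graph = json.loads(graph_json)
        self.const_names = list(const_names)
        self.entries = {}  # node id -> data buffer
        self.engine = []   # list of (node id, kernel, input node ids)

    def _node_id(self, name):
        for nid, node in enumerate(self.graph["nodes"]):
            if node["op"] in ("input", "const") and node["name"] == name:
                return nid
        raise KeyError(name)

    def init(self, consts):
        # Build the execution plan once, then bind the constant weights.
        self.engine = [
            (nid, self.KERNELS[node["name"]], [i[0] for i in node["inputs"]])
            for nid, node in enumerate(self.graph["nodes"])
            if node["op"] == "kernel"
        ]
        if len(consts) != len(self.const_names):
            raise ValueError("number of constants must match the number required")
        for name, value in zip(self.const_names, consts):
            self.entries[self._node_id(name)] = value

    def run(self, inputs):
        # 1. Fill in the input buffers.
        for name, value in inputs.items():
            self.entries[self._node_id(name)] = value
        # 2. Execute the kernels in graph order.
        for nid, kernel, in_ids in self.engine:
            self.entries[nid] = kernel(*[self.entries[i] for i in in_ids])
        # 3. Read the output buffers.
        return [self.entries[head[0]] for head in self.graph["heads"]]


graph_json = json.dumps({
    "nodes": [
        {"op": "input", "name": "x"},
        {"op": "const", "name": "w"},
        {"op": "kernel", "name": "toy.add", "inputs": [[0, 0, 0], [1, 0, 0]]},
        {"op": "kernel", "name": "toy.relu", "inputs": [[2, 0, 0]]},
    ],
    "heads": [[3, 0, 0]],
})
rt = ToyJSONRuntime("toy_0", graph_json, ["w"])
rt.init([[1.0, 1.0]])
outputs = rt.run({"x": [-2.0, 1.0]})
print(outputs)
```

The point of the split is that the plan is built once in `init` and reused by every `run` call, so repeated inference only pays for filling inputs and executing kernels.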
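The rest of the DNNL JSON runtime implementation is too DNNL-specific to cover in detail in this post, so we stop here. We want to emphasize that while the DNNL JSON runtime is a good reference to start from, your JSON runtime can be fully customized to fit your requirements.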
Bring DNNL to TVM: C Source Codegen
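Now let's implement the DNNL codegen that generates C source code, which invokes DNNL APIs to execute the Relay graph. Note that if you are trying to implement a codegen that generates another graph representation, such as JSON, you may want to read Bring DNNL to TVM: JSON Codegen/Runtime and skip this section.
For the DNNL C source codegen in TVM to work on this example, make sure DNNL is available on your machine and build TVM with set(USE_DNNL_CODEGEN C_SRC) in config.cmake.
The DNNL codegen lives in src/relay/backend/contrib/dnnl/codegen.cc. Because the DNNL codegen is implemented in both forms in this file, you can focus on the part NOT covered by the USE_JSON_RUNTIME macro when tracing the code.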
runtime::Module DNNLCompiler(const ObjectRef& ref) {
  DNNLModuleCodegen dnnl;
  return dnnl.CreateCSourceModule(ref);
}
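Note that each runtime module is responsible for only one Relay function, which means you may have multiple DNNL runtime modules in a single .so file. The CreateCSourceModule method of DNNLModuleCodegen is implemented as follows: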
runtime::Module CreateCSourceModule(const ObjectRef& ref) override {
  // Include headers
  // ...skip...
  code_stream_ << "#include <dnnl/dnnl_kernel.h>\n";
  // ...skip...

  // "ref" should be the partitioned Relay function with kCompiler=dnnl.
  auto res = GenDNNLFunc(Downcast<Function>(ref));

  // "code" is the generated C code with DNNL APIs.
  std::string code = code_stream_.str();

  // "res" is a tuple of constant weights (symbols, values).
  // All constant tensors will be serialized along with the generated C code
  // when export_library is invoked.
  String sym = std::get<0>(res);
  Array<String> variables = std::get<1>(res);

  // Create a CSource module with all above artifacts.
  const auto* pf = runtime::Registry::Get("runtime.CSourceModuleCreate");
  CHECK(pf != nullptr) << "Cannot find csource module to create the external runtime module";
  return (*pf)(code, "c", sym, variables);
}
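The pattern in CreateCSourceModule, accumulating generated source text in a stream and then handing the text, its format, the symbol, and the constant names to a module creator, can be sketched in Python as follows. `create_c_source_module` and the returned dictionary are hypothetical stand-ins; the real module is created through the registered runtime.CSourceModuleCreate function.

```python
import io


def create_c_source_module(sym, const_names, body_lines):
    """Collect generated C source in a stream and bundle it with its metadata."""
    code_stream = io.StringIO()
    # Headers are written first, then the generated function bodies.
    code_stream.write("#include <dnnl/dnnl_kernel.h>\n")
    for line in body_lines:
        code_stream.write(line + "\n")
    code = code_stream.getvalue()
    # A dictionary stands in for the C source runtime module.
    return {
        "code": code,
        "format": "c",
        "symbol": sym,
        "const_vars": list(const_names),
    }


module = create_c_source_module(
    "dnnl_0",
    ["dnnl_0_const_0"],
    ['extern "C" void dnnl_0_(float* in0, float* out0);'],
)
print(module["code"])
```

At this stage the module holds only text. Nothing has been compiled yet, which is why a separate compilation step is needed before the code can run.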
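Next, we implement GenDNNLFunc (L365) to generate compilable C code that uses DNNL APIs, as follows. See the embedded comments for an explanation of the function interfaces compatible with the TVM C source runtime module.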
// The example Relay graph: conv2d -> add -> relu.
#include <cstdlib>
#include <cstring>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/container.h>
#include <tvm/runtime/packed_func.h>
#include <dlpack/dlpack.h>
#include <dnnl/dnnl_kernel.h>
using namespace tvm::runtime;
using namespace tvm::runtime::contrib;

// Execute the conv2d->add->relu graph with DNNL.
extern "C" void dnnl_0_(float* dnnl_0_i0, float* dnnl_0_i1,
                        float* dnnl_0_i2, float* out0) {
  // Allocate intermediate buffers (4608 floats of 4 bytes each).
  float* buf_0 = (float*)std::malloc(4 * 4608);
  float* buf_1 = (float*)std::malloc(4 * 4608);
  float* buf_2 = (float*)std::malloc(4 * 4608);

  // Pre-implemented op-based DNNL functions.
  dnnl_conv2d(dnnl_0_i0, dnnl_0_i1, buf_0, 1, 32, 14, 14, 32, 1, 0, 0, 3, 3, 1, 1);
  dnnl_add(buf_0, dnnl_0_i2, buf_1, 1, 32, 12, 12);
  dnnl_relu(buf_1, buf_2, 1, 32, 12, 12);

  // Copy the final output to the corresponding buffer.
  std::memcpy(out0, buf_2, 4 * 4608);

  // Release the intermediate buffers.
  std::free(buf_0);
  std::free(buf_1);
  std::free(buf_2);
}

// The wrapper function with all arguments in DLTensor type.
extern "C" int dnnl_0_wrapper_(DLTensor* arg0,
                               DLTensor* arg1,
                               DLTensor* arg2,
                               DLTensor* out0) {
  // Cast all DLTensor to primitive type buffers and invoke the above
  // execution function.
  dnnl_0_(static_cast<float*>(arg0->data),
          static_cast<float*>(arg1->data),
          static_cast<float*>(arg2->data),
          static_cast<float*>(out0->data));
  return 0;
}

// The TVM macro to generate TVM runtime compatible function "dnnl_0"
// from our generated "dnnl_0_wrapper_".
TVM_DLL_EXPORT_TYPED_FUNC(dnnl_0, dnnl_0_wrapper_);
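The buffer size 4 * 4608 and the 12 x 12 spatial shape passed to dnnl_add and dnnl_relu follow from the convolution parameters. The sketch below redoes that arithmetic in Python. It assumes that the trailing integer arguments of dnnl_conv2d are ordered N, C, H, W, O, G, Ph, Pw, Kh, Kw, Sh, Sw and that padding is applied symmetrically; both are assumptions here rather than something the text above states. `conv2d_out_dim` is a helper name introduced for illustration.

```python
def conv2d_out_dim(size, kernel, pad, stride):
    """Output extent of a convolution along one spatial axis (symmetric padding)."""
    return (size + 2 * pad - kernel) // stride + 1


# Assumed meaning of the integer arguments in the dnnl_conv2d call.
N, C, H, W, O, G, Ph, Pw, Kh, Kw, Sh, Sw = 1, 32, 14, 14, 32, 1, 0, 0, 3, 3, 1, 1

out_h = conv2d_out_dim(H, Kh, Ph, Sh)
out_w = conv2d_out_dim(W, Kw, Pw, Sw)
num_elements = N * O * out_h * out_w
num_bytes = 4 * num_elements  # 4 bytes per float32 element

print(out_h, out_w, num_elements, num_bytes)
```

Under these assumptions a 14 x 14 input with a 3 x 3 kernel, no padding, and stride 1 gives a 12 x 12 output, and 1 * 32 * 12 * 12 = 4608 elements, which matches the sizes used in the generated code.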
C Source Compilation
# Imports assume a TVM release that still provides tvm.contrib.util
# and tvm.contrib.graph_runtime.
import os
import tvm
from tvm import relay, runtime
from tvm.contrib import graph_runtime, util

def update_lib(lib):
    # Include the path of src/runtime/contrib/dnnl/dnnl.cc
    test_dir = os.path.dirname(os.path.realpath(os.path.expanduser(__file__)))
    source_dir = os.path.join(test_dir, "..", "..", "..")
    contrib_path = os.path.join(source_dir, "src", "runtime", "contrib")

    # Setup the gcc flag to compile DNNL code.
    kwargs = {}
    kwargs["options"] = ["-O2", "-std=c++14", "-I" + contrib_path]
    tmp_path = util.tempdir()
    lib_name = 'lib.so'
    lib_path = tmp_path.relpath(lib_name)

    # The generated C code with DNNL APIs is compiled to a binary lib.so.
    lib.export_library(lib_path, fcompile=False, **kwargs)

    # Load the lib.so back to a runtime module.
    lib = runtime.load_module(lib_path)
    return lib

# mod, params, target, and ctx are expected to be defined by the caller.
with tvm.transform.PassContext(opt_level=3):
    json, lib, param = relay.build(mod, target=target, params=params)
lib = update_lib(lib)
rt_mod = tvm.contrib.graph_runtime.create(json, lib, ctx)
Bring DNNL to TVM: Build TVM with DNNL Codegen/Runtime
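Finally, we create cmake/modules/contrib/DNNL.cmake to include the DNNL codegen when building TVM. For demonstration purposes, the DNNL codegen has two implementations in the same cmake file. You only need to focus on the one that fits your needs.
With the cmake file in place, users can enable the DNNL codegen by specifying set(USE_DNNL_CODEGEN ON) in their build/config.cmake.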
