[Source Code Analysis] How PyTorch Implements Forward Propagation (1) --- Base Classes (Part 1)

Posted by 羅西的思考 on 2021-10-18

Original URL: https://www.cnblogs.com/rossiXYZ/p/15421453.html

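0x00 Abstract

This series uses roughly ten articles to analyze how PyTorch implements automatic differentiation. This article is the first one on forward propagation and introduces some of the PyTorch base classes involved in automatic differentiation (gradient computation). Because the text is long (about 12,000 characters), it is split into two parts.

Earlier articles in this series:

Automatic Differentiation, a Power Tool for Deep Learning (1)

Automatic Differentiation, a Power Tool for Deep Learning (2)

Automatic Differentiation, a Power Tool for Deep Learning (3) --- Walking Through an Example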

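0x01 Overall Logic

For completeness, we restate the overall logic from the end of the previous article.

From the computation graph perspective, forward computation consists of building the graph and executing it. "Building the graph" describes the relationships between node operations. "Executing the graph" runs those operations in a session, which is the process of tensors propagating forward through the computation graph.

Forward computation relies on several base classes, so before analyzing forward propagation itself we first look at how these base classes relate to one another. Viewed as a DAG, the logic of the PyTorch system is as follows.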

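A graph represents a computation task. PyTorch treats every computation as a directed acyclic graph (a computation graph), but this graph is virtual: no single data structure in the code holds it.
A computation graph consists of nodes (Node) and edges (Edge).
A node (Node) represents an operation.
- A node receives zero or more Tensors through its edges and produces zero or more Tensors after running its computation.
- The node member next_functions is a list of tuples that records which other Functions this node outputs to. The length of the list is the number of Edges of this grad_fn, and each tuple holds the information of one Edge: (Edge.function, Edge.input_nr).
An edge (Edge) is the flow relationship between operations.
- Edge.function: the Function this Edge outputs to.
- Edge.input_nr: which input of that Function this Edge feeds.
Tensors (Tensor) represent the data that flows between nodes; without data, the computation graph is meaningless.

See the figure below: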

+---------------------+              +----------------------+
| SubBackward0        |              | PowBackward0         |
|                     |      Edge    |                      |  Edge
|   next_functions  +-----+--------> |     next_functions +----------> ...
|                     |   |          |                      |
+---------------------+   |          +----------------------+
                          |
                          |
                          |          +----------------------+
                          |  Edge    | MulBackward0         |
                          +--------> |                      |  Edge
                                     |     next_functions +----------> ...
                                     |                      |
                                     +----------------------+
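
To make the Node / Edge / next_functions relationship concrete, here is a minimal pure-Python sketch. It is a toy model, not PyTorch code: ToyNode, walk, and the AccumulateGrad leaves attached below are hypothetical names chosen for illustration. Each entry of next_functions plays the role of one Edge, stored as (function, input_nr).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyNode:
    """Hypothetical stand-in for an autograd node such as SubBackward0."""
    name: str
    # Each entry mirrors one Edge: (Edge.function, Edge.input_nr).
    next_functions: List[Tuple[Optional["ToyNode"], int]] = field(default_factory=list)


def walk(root: ToyNode) -> List[str]:
    """Depth-first walk along the edges, returning node names in visit order."""
    order, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node.name)
        # Push in reverse so that the first edge is visited first.
        for fn, _input_nr in reversed(node.next_functions):
            if fn is not None:
                stack.append(fn)
    return order


# Same shape as the figure: SubBackward0 feeds MulBackward0 and PowBackward0.
# The two leaves are an assumption added so the walk has somewhere to end.
leaf_a = ToyNode("AccumulateGrad(a)")
leaf_b = ToyNode("AccumulateGrad(b)")
mul = ToyNode("MulBackward0", [(leaf_a, 0)])
pow_ = ToyNode("PowBackward0", [(leaf_b, 0)])
sub = ToyNode("SubBackward0", [(mul, 0), (pow_, 0)])

print(walk(sub))
```

The number of entries in sub.next_functions equals the number of outgoing edges, which is the property the list above describes.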

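0x02 Deprecated Classes

We start with a few deprecated classes. Although deprecated, they are still used widely in the code and in many online articles, so they are worth studying first. We may use the old and new names interchangeably in this article; please bear with us.

2.1 Variable

In early versions, two data structures stored data: Tensor and Variable. Tensor handled only multi-dimensional array operations, while Variable was responsible for automatic differentiation. A Variable carried the autograd-related attributes and could be either a leaf node of the computation graph or an intermediate variable produced during computation.

Starting with version 0.4.0, Tensor and Variable were merged, which made autograd simpler to use. Today a Variable is simply a Tensor; the name is kept only for backward compatibility.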

Variable (deprecated)
^^^^^^^^^^^^^^^^^^^^^

.. warning::
    The Variable API has been deprecated: Variables are no longer necessary to
    use autograd with tensors. Autograd automatically supports Tensors with
    ``requires_grad`` set to ``True``. Below please find a quick guide on what
    has changed:

    - ``Variable(tensor)`` and ``Variable(tensor, requires_grad)`` still work as expected,
      but they return Tensors instead of Variables.
    - ``var.data`` is the same thing as ``tensor.data``.

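Variable is defined in torch/csrc/autograd/variable.h. The "Gradient Edges" part of its comments shows that a Variable has the notion of a gradient_edge: the edge of the autograd graph that, during the backward pass, connects the variable to a particular input of a gradient function.

More precisely, this gradient function is one of two things:

grad_fn, if the variable is in the interior of the graph. This is the gradient of the function that produced the variable.
grad_accumulator, if the variable is a leaf. It accumulates a scalar gradient value into the variable's grad.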

namespace torch { namespace autograd {

/// `Variable` is exactly the same as `Tensor` (i.e. we have `using Variable = at::Tensor`).
/// This means you can perform all the usual mathematical and other
/// operations you can perform on `Tensor`s also on `Variable`s.
///
/// The only reason we are keeping the `Variable` class is backward compatibility
/// with external user's legacy C++ frontend code. Our intention is to eliminate
/// the `Variable` class in the near future.
using Variable = at::Tensor;

} // namespace autograd
} // namespace torch

///                              Gradient Edges
///~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Furthermore, `Variable`s have the notion of a `gradient_edge`, which is the
/// edge in the autograd graph that connects the variable to a particular input
/// of the gradient function that will be invoked with the variable during the
/// backward pass. More precisely, this gradient function can be one of two
/// things:
/// 1. A `grad_fn`, if the variable is in the interior of the graph. This is the
///    gradient of the function that produced the variable.
/// 2. A `grad_accumulator`, if the variable is a leaf, which accumulates a
///    scalar gradient value into its `grad` variable.
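
The two cases in the comment above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the C++ implementation: ToyEdge, ToyVariable, and the placeholder accumulator object are hypothetical. The only rule modeled is the one the comment states: interior variables point at their grad_fn, leaves point at their grad_accumulator.

```python
from typing import NamedTuple, Optional


class ToyEdge(NamedTuple):
    function: object   # the gradient function the edge points at
    input_nr: int      # which input of that function the edge feeds


class ToyVariable:
    """Hypothetical variable holding only the fields this sketch needs."""

    def __init__(self, grad_fn: Optional[object] = None, output_nr: int = 0):
        self.grad_fn = grad_fn
        self.output_nr = output_nr
        self._grad_accumulator = None

    def grad_accumulator(self) -> object:
        # Created lazily, once per leaf variable (placeholder object).
        if self._grad_accumulator is None:
            self._grad_accumulator = ("AccumulateGrad", id(self))
        return self._grad_accumulator


def gradient_edge(var: ToyVariable) -> ToyEdge:
    if var.grad_fn is not None:
        # Interior variable: point at the function that produced it.
        return ToyEdge(var.grad_fn, var.output_nr)
    # Leaf variable: point at its gradient accumulator, input 0.
    return ToyEdge(var.grad_accumulator(), 0)


leaf = ToyVariable()
interior = ToyVariable(grad_fn="MulBackward0", output_nr=0)
print(gradient_edge(leaf).function[0])   # the accumulator placeholder
print(gradient_edge(interior).function)  # the producing function
```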

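2.2 Function

Building on the Variable concept, a Function is the operation performed at a node of the computation graph, such as addition, subtraction, multiplication, division, or convolution. Whenever an operation is applied to a Tensor, a Function object is created; it records the inputs of the operation, records that the operation happened, and produces the result. A Tensor uses its .grad_fn attribute to record this entry point into the computation graph.

A Function has two methods, forward() and backward(), used for forward and backward propagation respectively. During backward propagation, the autograd engine calls each Function's backward in reverse order to compute the gradients.

In the latest code, Function has been replaced by the Node class to better express the concept of a node. Because older code still uses Function, we may use the two names interchangeably.

The Function template for custom autograd operations is defined as follows: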

/// To use custom autograd operations, implement a Function subclass with
/// static forward and backward functions:
///
/// `forward` can take as many arguments as you want and should return either a
/// variable list or a Variable. Use of any direct Variable arguments will be
/// registered in the graph but no vectors/sets or any other data structures
/// will be traversed. You can use c10::optional<Tensor> as one of the arguments
/// and it will be registered as a variable in the graph if the argument has a
/// value. It should take a pointer to `torch::autograd::AutogradContext` as the
/// first argument. Variables can be saved in the `ctx` using
/// `ctx->save_for_backward`
/// (see `torch::autograd::AutogradContext::save_for_backward`) and other data
/// can be saved in the `ctx->saved_data` map
/// (see `torch::autograd::AutogradContext::saved_data`)
/// in the form of `<std::string, at::IValue>` pairs.
///
/// `backward` should take a pointer to `torch::autograd::AutogradContext`
/// and a variable list containing as many Variables as there were outputs from
/// `forward` as arguments. It should return as many Variables as there were
/// inputs with each of them containing the gradient w.r.t. its corresponding
/// input. Variables saved in `forward` can be accessed with
/// `ctx->get_saved_variables` (see
/// `torch::autograd::AutogradContext::get_saved_variables`) and other saved
/// data can be accessed from `ctx->saved_data`.

template <class T>
struct TORCH_API Function {
  // We need to use a different template parameter than T here because T will
  // inherit from Function, and when Function<T> is instantiated, T::forward
  // is not declared yet.
  // The enable_if check is to ensure that the user doesn't explicitly provide
  // the parameter X.
  template<typename X=T, typename... Args>
  static auto apply(Args&&... args) -> std::enable_if_t<std::is_same<X,T>::value, forward_t<X,Args...>>;
};
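
The forward / backward contract described in the comment block can be illustrated with a small pure-Python toy. Everything here (ToyContext, ToyFunction, Square) is a hypothetical sketch that only mimics the shape of the API: forward saves what it needs into a context, and backward returns one gradient per input. It does not reproduce how apply is wired into the real graph.

```python
class ToyContext:
    """Hypothetical stand-in for AutogradContext."""

    def __init__(self):
        self.saved = ()
        self.saved_data = {}

    def save_for_backward(self, *values):
        self.saved = values


class ToyFunction:
    """Base class: apply() runs forward and hands back the context."""

    @classmethod
    def apply(cls, *args):
        ctx = ToyContext()
        out = cls.forward(ctx, *args)
        return out, ctx


class Square(ToyFunction):
    @staticmethod
    def forward(ctx, x):
        # Keep the input; backward needs it to compute the derivative.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved
        # d(x^2)/dx = 2x, chained with the incoming gradient.
        return grad_output * 2 * x


y, ctx = Square.apply(3.0)
grad_x = Square.backward(ctx, 1.0)
print(y, grad_x)
```

In this toy the caller keeps the context explicitly; in the real system the context travels with the node that the engine later invokes.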

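0x03 Tensor

As mentioned, the computation graph is the structural basis of forward and backward propagation, and Tensor is one of the building blocks of that graph in PyTorch.

Tensor is the key PyTorch data structure for multi-dimensional array computation and automatic differentiation.

A Tensor is similar to numpy's ndarray and supports a wide range of mathematical operations;
when .requires_grad = True is set, operations on the Tensor are recorded for later gradient computation.

3.1 Definition in Python

Here is the Tensor from the first example at run time, showing its member variables.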

Q = {Tensor} 
 data = {Tensor} tensor(-12.)
 device = {device} cpu
 dtype = {dtype} torch.float32
 grad = {NoneType} None
 grad_fn = {SubBackward0} 
  metadata = {dict: 0} {}
  next_functions = {tuple: 2} 
   0 = {tuple: 2} (<MulBackward0 object at 0x000001F9547A5848>, 0)
   1 = {tuple: 2} (<PowBackward0 object at 0x000001F9547A53C8>, 0)
   __len__ = {int} 2
  requires_grad = {bool} True
 is_cuda = {bool} False
 is_leaf = {bool} False
 is_meta = {bool} False
 is_mkldnn = {bool} False
 is_mlc = {bool} False
 is_quantized = {bool} False
 is_sparse = {bool} False
 is_sparse_csr = {bool} False
 is_vulkan = {bool} False
 is_xpu = {bool} False
 layout = {layout} torch.strided
 name = {NoneType} None
 names = {tuple: 0} ()
 ndim = {int} 0
 output_nr = {int} 0
 requires_grad = {bool} True
 shape = {Size: 0} torch.Size([])

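Let's look at some of these member variables:

data: the data of the tensor.
dtype: the data type of the tensor.
device: the type of device the tensor is stored on, such as CPU or GPU.
grad: holds the gradient corresponding to data, with the same shape as data.
- PyTorch tracks and records the operations applied to tensors that require gradients; calling .backward() after the forward computation computes the gradients and stores the results in grad.
- When requires_grad = False, grad is None.
- Gradients are not cleared automatically. Reset them to zero before each backward pass, otherwise they keep accumulating.
grad_fn: points to a Function object.
- This Function object computes the gradients of the inputs during backward propagation.
- If the tensor is a non-leaf node, the Function is the backward function of the operation that produced it, pointing toward the leaves; in the example, the function for node O is MulBackward, the backward function of multiplication;
- If the tensor is a leaf node with requires_grad set to True, grad_fn is None.
- grad_fn has an attribute next_functions, a tuple of tuples of the form ((function1, int1), (function2, int2), ..., (functionN, intN)). We explain it in detail later.
is_leaf: whether the tensor is a leaf node.
- Tensors explicitly created by the user are leaf nodes.
- By convention, all tensors with requires_grad=False are also leaf nodes.
- is_leaf matters only when gradients are needed. For any tensor, tensor.is_leaf tells whether it is a leaf tensor. During backward propagation, by default only tensors that require gradients and have is_leaf=True keep their gradient in grad.
- A leaf node's grad_fn is always None; a non-leaf node was produced by some operation, so its grad_fn is not None.
requires_grad: True means the Tensor needs gradients; it decides whether the tensor is tracked and has gradients computed.
- requires_grad defaults to False, so Tensors do not require gradients by default.
- If a node has requires_grad=True, every node that depends on it also has requires_grad=True. Conversely, if none of the nodes a node depends on requires gradients, its requires_grad is False as well, and the subgraph containing it is excluded from backward propagation.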
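
The rules in the list above (requires_grad propagation, is_leaf, and gradient accumulation) can be seen together in a toy scalar type. This is a hypothetical pure-Python sketch, not the PyTorch Tensor: ToyScalar supports only multiplication and stores grad_fn as a plain tuple.

```python
class ToyScalar:
    """Hypothetical scalar 'tensor' with just enough state for the sketch."""

    def __init__(self, value, requires_grad=False, grad_fn=None):
        self.value = value
        self.requires_grad = requires_grad
        self.grad_fn = grad_fn      # None for leaves
        self.grad = None

    @property
    def is_leaf(self):
        return self.grad_fn is None

    def __mul__(self, other):
        needs_grad = self.requires_grad or other.requires_grad
        # Record the operation only if some input needs a gradient.
        grad_fn = ("MulBackward", self, other) if needs_grad else None
        return ToyScalar(self.value * other.value, needs_grad, grad_fn)

    def backward(self, grad=1.0):
        if self.is_leaf:
            if self.requires_grad:
                # Accumulate: gradients add up until reset explicitly.
                self.grad = grad if self.grad is None else self.grad + grad
            return
        _, lhs, rhs = self.grad_fn
        lhs.backward(grad * rhs.value)
        rhs.backward(grad * lhs.value)


a = ToyScalar(2.0, requires_grad=True)
b = ToyScalar(5.0)            # requires_grad defaults to False
c = a * b                     # depends on a, so it requires grad
d = b * b                     # no input requires grad: stays a leaf

c.backward()
c.backward()                  # second pass adds to a.grad instead of replacing it
print(a.grad, b.grad, c.is_leaf, d.is_leaf)
```

Running backward twice without resetting a.grad doubles the stored value, which is the accumulation behavior the list warns about.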

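The Python definition is only a mapping of the definition in the C++ world, so next we look at how it is defined in C++.

3.2 Locating the Definition

We trace the definition of Tensor step by step.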

First we arrive at torch/_C/_VariableFunctions.pyi

def tensor(data: Any, dtype: Optional[_dtype]=None, device: Union[_device, str, None]=None, requires_grad: _bool=False) -> Tensor: ...

Then we arrive at torch/_tensor.py

3.2.1 Tensor

We can see that the base class of Tensor is torch._C._TensorBase.

class Tensor(torch._C._TensorBase):

3.2.2 _TensorBase

_TensorBase is generated dynamically; the code is in, for example, python_stubs\xxx\torch\_C\_TensorBase.py

class _TensorBase(object):

In torch/_C/__init__.pyi.in we can see that torch._C._TensorBase is actually defined in the C++ world and exported to the Python world.

# Defined in torch/csrc/autograd/python_variable.cpp
class _TensorBase(metaclass=_TensorMeta):
    requires_grad: _bool
    shape: Size
    data: Tensor
    names: List[str]
    device: _device
    dtype: _dtype
    layout: _layout
    real: Tensor
    imag: Tensor
    T: Tensor
    ndim: _int
    output_nr: _int
    _version: _int
    _base: Optional[Tensor]
    _cdata: _int
    grad_fn: Any
    _grad_fn: Any
    _grad: Optional[Tensor]
    _backward_hooks: Optional[Dict[_int, Callable[[Tensor], Optional[Tensor]]]]
    ${tensor_method_hints}

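3.3 Conversion

This article only takes a brief look at how the C++ world is mapped into the Python world and does not go deep.

3.3.1 Python Import

Code brings in PyTorch with import torch. On import torch, following Python's rules, the logic in torch/__init__.py is executed. The key part of torch/__init__.py is torch._C, as follows: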

from torch._C import *

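torch._C is a shared library compiled from C++, for example a .so file on Linux.

The Tensor class inherits from torch._C._TensorBase. Importing torch._C imports torch._C._TensorBase, which gives torch.Tensor its base class. The flow is as follows: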

+---------------------------+
|      import torch         |
+------------+--------------+
             |
             |
             v
+------------+--------------+
| torch/__init__.py         |
|                           |
|    from torch._C import * |
|                           |
+------------+--------------+
             |
             |
             v
+------------+--------------+
|  torch._C._TensorBase     |
+---------------------------+

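So next we look at how torch._C is exported from the C++ world to Python.

3.3.2 C++ Export & Initialization

Next we look at how the C++ world exports _TensorBase.

For import torch._C to work in Python, the symbol must be exported following Python's extension conventions.

3.3.2.1 Shared Library Entry Point

For a Python module, the shared library must implement the symbol PyInit_modulename as the entry point used at import time. For PyTorch, modulename is _C. torch/csrc/stub.cpp implements the function PyInit__C.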

#include <Python.h>
extern PyObject* initModule();
PyMODINIT_FUNC PyInit__C()
{
  return initModule();
}
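
The naming rule behind PyInit__C can be checked with a short Python helper. This is only an illustration: init_symbol is a hypothetical function, and it assumes an ASCII module name, for which CPython derives the init symbol from the last component of the dotted name.

```python
import importlib.machinery


def init_symbol(module_name: str) -> str:
    """Name of the init function looked up for an extension module.

    Only the last component of a dotted name is used; this sketch
    assumes an ASCII module name.
    """
    return "PyInit_" + module_name.rpartition(".")[2]


print(init_symbol("torch._C"))
# File suffixes the running interpreter accepts for extension modules;
# the exact list depends on platform and Python version.
print(importlib.machinery.EXTENSION_SUFFIXES)
```

The double underscore in PyInit__C is therefore not a typo: one underscore belongs to the PyInit_ prefix and one to the module name _C.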

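If the torch::deploy interpreter is used, we look directly at torch/csrc/deploy/interpreter/interpreter_impl.cpp; much of the code is omitted here.

struct ConcreteInterpreterImpl : public torch::deploy::InterpreterImpl {
  ConcreteInterpreterImpl() {
    PyImport_AppendInittab("torch._C", initModule);
    // ... (remaining code omitted)
  }
};

This is the interpreter's code, which registers initModule as the init function for torch._C.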

3.3.2.2 initModule

initModule initializes the torch module in the Python environment. It is defined in torch/csrc/Module.cpp; much of the code is omitted here.

PyObject* initModule() {
  THPSize_init(module);
  THPDtype_init(module);
  THPDTypeInfo_init(module);
  THPLayout_init(module);
  THPMemoryFormat_init(module);
  THPQScheme_init(module);
  THPDevice_init(module);
  THPStream_init(module);
  ASSERT_TRUE(THPVariable_initModule(module)); // analyzed next; this is where _TensorBase is set up
  ASSERT_TRUE(THPFunction_initModule(module));
  ASSERT_TRUE(THPEngine_initModule(module));
}

initModule calls THPVariable_initModule, located in torch/csrc/autograd/python_variable.cpp, which sets up _TensorBase.

bool THPVariable_initModule(PyObject *module)
{
  THPVariableMetaType.tp_base = &PyType_Type;
  if (PyType_Ready(&THPVariableMetaType) < 0)
    return false;
  Py_INCREF(&THPVariableMetaType);
  PyModule_AddObject(module, "_TensorMeta",   (PyObject *)&THPVariableMetaType);

  static std::vector<PyMethodDef> methods;
  THPUtils_addPyMethodDefs(methods, torch::autograd::variable_methods);
  THPUtils_addPyMethodDefs(methods, extra_methods);
  THPVariableType.tp_methods = methods.data();
  if (PyType_Ready(&THPVariableType) < 0)
    return false;
  Py_INCREF(&THPVariableType);
    
  // Set up _TensorBase
  PyModule_AddObject(module, "_TensorBase",   (PyObject *)&THPVariableType);
  torch::autograd::initTorchFunctions(module);
  torch::autograd::initTensorImplConversion(module);
  return true;
}

3.3.2.3 Registering _TensorBase

When THPVariable_initModule runs, the following code registers THPVariableType as torch._C._TensorBase. So torch._C._TensorBase is THPVariableType in C++.

PyModule_AddObject(module, "_TensorBase",   (PyObject *)&THPVariableType);

Let's look at THPVariableType, which fills in many type slots.

PyTypeObject THPVariableType = {
  PyVarObject_HEAD_INIT(&THPVariableMetaType, 0)
  "torch._C._TensorBase",                      /* tp_name */
  sizeof(THPVariable),                         /* tp_basicsize */
  0,                                           /* tp_itemsize */
  (destructor)THPVariable_dealloc,             /* tp_dealloc */
  // omitted ......
  nullptr,                                     /* tp_methods */
  nullptr,                                     /* tp_members */

  THPVariable_properties,                      /* tp_getset */  // the key part: the property table is registered here

  // omitted ......
  THPVariable_pynew,                           /* tp_new */
};

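Now that the Python class torch._C._TensorBase is registered, the next step is to register accessors on it.

tp_getset is the slot in Python's type machinery that holds a table of attribute getters and setters; here it is THPVariable_properties. Below is the table for _TensorBase, where we can see two familiar faces: grad_fn and grad.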

static struct PyGetSetDef THPVariable_properties[] = {
  {"T", (getter)THPVariable_get_T, nullptr, nullptr, nullptr},
  {"_cdata", (getter)THPVariable_get_cdata, nullptr, nullptr, nullptr},
  {"_version", (getter)THPVariable_get_version, nullptr, nullptr, nullptr},
  {"grad_fn", (getter)THPVariable_get_grad_fn, nullptr, nullptr, nullptr},
  {"_grad_fn", (getter)THPVariable_get_grad_fn, (setter)THPVariable_set_grad_fn, nullptr, nullptr},
  {"is_leaf", (getter)THPVariable_is_leaf, nullptr, nullptr, nullptr},
  {"data", (getter)THPVariable_get_data, (setter)THPVariable_set_data, nullptr, nullptr},
  {"_grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr}, // Allows the python class to override .grad
  {"grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr},
  {"_base", (getter)THPVariable_get_base, nullptr, nullptr, nullptr},
  {"volatile", (getter)THPVariable_get_volatile, (setter)THPVariable_set_volatile, nullptr, nullptr},
  {"output_nr", (getter)THPVariable_get_output_nr, nullptr, nullptr, nullptr},
  {"requires_grad", (getter)THPVariable_get_requires_grad, (setter)THPVariable_set_requires_grad, nullptr, nullptr},
  {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
  {"name", (getter)THPVariable_get_name, nullptr, nullptr, nullptr},
  {"shape", (getter)THPVariable_get_shape, nullptr, nullptr, nullptr},
  {"is_cuda", (getter)THPVariable_is_cuda, nullptr, nullptr, nullptr},
  {"is_xpu", (getter)THPVariable_is_xpu, nullptr, nullptr, nullptr},
  {"is_sparse", (getter)THPVariable_is_sparse, nullptr, nullptr, nullptr},
  {"is_sparse_csr", (getter)THPVariable_is_sparse_csr, nullptr, nullptr, nullptr},
  {"is_mkldnn", (getter)THPVariable_is_mkldnn, nullptr, nullptr, nullptr},
  {"is_mlc", (getter)THPVariable_is_mlc, nullptr, nullptr, nullptr},
  {"is_vulkan", (getter)THPVariable_is_vulkan, nullptr, nullptr, nullptr},
  {"is_complex", (getter)THPVariable_is_complex, nullptr, nullptr, nullptr},
  {"is_quantized", (getter)THPVariable_is_quantized, nullptr, nullptr, nullptr},
  {"is_meta", (getter)THPVariable_is_meta, nullptr, nullptr, nullptr},
  {"dtype", (getter)THPVariable_dtype, nullptr, nullptr, nullptr},
  {"layout", (getter)THPVariable_layout, nullptr, nullptr, nullptr},
  {"device", (getter)THPVariable_device, nullptr, nullptr, nullptr},
  {"ndim", (getter)THPVariable_get_ndim, nullptr, nullptr, nullptr},
  {"names", (getter)THPVariable_get_names, (setter)THPVariable_set_names, nullptr, nullptr},
  {"real", (getter)THPVariable_get_real, (setter)THPVariable_set_real, nullptr, nullptr},
  {"imag", (getter)THPVariable_get_imag, (setter)THPVariable_set_imag, nullptr, nullptr},
  {nullptr}
};
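
A rough Python analogue may help in reading the table above: each entry names an attribute, a getter, and optionally a setter, and an entry whose setter is nullptr is read-only. The class below is a hypothetical sketch using Python's built-in property, not how _TensorBase is actually built.

```python
class ToyTensorBase:
    """Hypothetical Python analogue of a tp_getset table."""

    def __init__(self):
        self._grad_fn_value = None
        self._requires_grad = False

    # {"grad_fn", getter, nullptr, ...}: getter only, so read-only.
    @property
    def grad_fn(self):
        return self._grad_fn_value

    # {"_grad_fn", getter, setter, ...}: same getter plus a setter.
    @property
    def _grad_fn(self):
        return self._grad_fn_value

    @_grad_fn.setter
    def _grad_fn(self, fn):
        self._grad_fn_value = fn

    # {"requires_grad", getter, setter, ...}: readable and writable.
    @property
    def requires_grad(self):
        return self._requires_grad

    @requires_grad.setter
    def requires_grad(self, flag):
        self._requires_grad = bool(flag)


t = ToyTensorBase()
t._grad_fn = "MulBackward0"      # allowed: this entry has a setter
print(t.grad_fn)                 # both names read the same value
try:
    t.grad_fn = None             # rejected: this entry has no setter
except AttributeError as exc:
    print("read-only:", type(exc).__name__)
```

This matches the shape of the table: grad_fn and _grad_fn share one getter, but only _grad_fn is paired with a setter.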

The initialization and mapping logic is as follows:

                       Python     +     C++                    +---------------+
                                  |                            |               |
+---------------------------+     |                            |   PyInit__C   |
|      import torch         |     |                            |               |
+------------+--------------+     |                            +-------+-------+
             |                    |                                    |
             |                    |                                    |
             v                    |                                    |
+------------+--------------+     |                                    v
| torch/__init__.py         |     |                            +-------+-------+
|                           |     |                            |  initModule   |
|    from torch._C import * |     |                            +-------+-------+
|                           |     |                                    |
+------------+--------------+     |                                    |
             |                    |                                    |
             |                    |                                    v
             |                    |                     +--------------+----------------+
             |                    |                     |                               |
             |                    |                     | THPVariable_initModule(module)|
             |                    |                     |                               |
             |                    |                     +--------------+----------------+
             |                    |                                    |
             |                    |                                    |
             |                    |                                    |
             |                    |                                    v
             |                    |    +-------------------------------+---------------------------------------+
             |                    |    |                                                                       |
             |                    |    | PyModule_AddObject(module, "_TensorBase",(PyObject *)&THPVariableType)|
             |                    |    |                                                                       |
             |                    |    +-------------------------------+---------------------------------------+
             |                    |                                    |
             |                    |                                    |
             |                    |                                    |
             |                    |                                    v
             |                    |                        +-----------+--------------+    +------------------------------------------------------+
             |                    |                        | THPVariableType          |    | THPVariable_properties+                              |
             v                    |                        |                          |    |                                                      |
+------------+--------------+     |                        |                          |    |                                                      |
|  torch._C._TensorBase     | <----------------------->    |              tp_getset -----> |  { grad, grad_fn, T, _cdata, is_leaf, output_nr ...} |
+---------------------------+     |                        |                          |    |                                                      |
                                  |                        +--------------------------+    +------------------------------------------------------+
                                  +

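3.4 Setting next_functions

next_functions is the essence here, and it is set up inside autograd, so we first need to look at how autograd is initialized. Only then can we see how next_functions is set.

3.5 Initializing autograd

We use AccumulateGrad as the example of how initialization works.

First, the definition of AccumulateGrad, with some member functions omitted. The constructor shows that an AccumulateGrad instance must be built from a Variable, which is stored in the member Variable variable. apply takes a variable_list, which relates to the Variable's grad_accumulator_.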

struct TORCH_API AccumulateGrad : public Node {
  explicit AccumulateGrad(Variable variable_);
  variable_list apply(variable_list&& grads) override;
  Variable variable;
};

In older versions, the definition was:

struct AccumulateGrad : public Function {
  explicit AccumulateGrad(Variable variable_);
  variable_list apply(variable_list&& grads) override;
  Variable variable;
};
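
The role of such a node can be sketched in pure Python. ToyLeaf and ToyAccumulateGrad are hypothetical names; the sketch models only what the struct suggests: the node is constructed from a variable, and apply receives a list of gradients and accumulates into that variable. The real apply does more work than this.

```python
class ToyLeaf:
    """Hypothetical leaf variable holding a value and its accumulated grad."""

    def __init__(self, value):
        self.value = value
        self.grad = None


class ToyAccumulateGrad:
    """Toy node that must be built from a variable, like AccumulateGrad."""

    def __init__(self, variable):
        self.variable = variable

    def apply(self, grads):
        # Exactly one incoming gradient is expected for a leaf.
        (grad,) = grads
        if self.variable.grad is None:
            self.variable.grad = grad
        else:
            self.variable.grad = self.variable.grad + grad
        # A leaf has nothing downstream, so no gradients are passed on.
        return []


w = ToyLeaf(1.0)
acc = ToyAccumulateGrad(w)
acc.apply([0.5])
acc.apply([0.25])
print(w.grad)
```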

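Next we look at how AccumulateGrad is initialized.

3.5.1 Extension

After initModule() finishes, the initialization for import torch is not over yet. The Python initialization script still has many modules to process; for example, torch/__init__.py contains: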

# Check to see if we can load C extensions, and if not provide some guidance
# on what the problem might be.
try:
    # _initExtension is chosen (arbitrarily) as a sentinel.
    from torch._C import _initExtension

_initExtension ends up calling _C._initExtension(manager_path()). _C._initExtension corresponds to THPModule_initExtension.

static PyMethodDef TorchMethods[] = {
  {"_initExtension",  THPModule_initExtension, METH_O, nullptr}, 
  // ....
};

THPModule_initExtension calls THPAutograd_initFunctions, which initializes the automatic differentiation system.

// Callback for python part. Used for additional initialization of python classes
static PyObject * THPModule_initExtension(PyObject *_unused, PyObject *shm_manager_path)
{
  // code omitted
    
  THPQInt8Storage_postInit(module);
  THPQInt32Storage_postInit(module);
  THPBFloat16Storage_postInit(module);
  THPComplexDoubleStorage_postInit(module);
  THPComplexFloatStorage_postInit(module);
    
  THPAutograd_initFunctions();  // called here; initializes the autograd system

  // code omitted
}

THPAutograd_initFunctions creates the torch._C._functions module and adds new types, each with its own property table, to it. It calls addClass, which ties AccumulateGrad to accumulate_grad_properties.

void THPAutograd_initFunctions()
{
  THPObjectPtr module(PyModule_New("torch._C._functions"));
  if (!module) throw python_error();

  static PyTypeObject AccumulateGradClass;
  addClass<AccumulateGrad, NoCtor>(module, AccumulateGradClass, "AccumulateGrad", accumulate_grad_properties); // AccumulateGrad-related
    
  static PyTypeObject CopyBackwardsClass;
  addClass<CopyBackwards, NoCtor>(module, CopyBackwardsClass, "CopyBackwards");   
      
  // others omitted
}

3.5.2 addClass

addClass calls registerCppFunction to register type (together with function_properties). In our case the argument function_properties is accumulate_grad_properties and type is AccumulateGradClass.

template<typename C, typename T>
static void addClass(PyObject* module, PyTypeObject& type, const char* name,
  PyGetSetDef* function_properties=nullptr, PyMethodDef* function_methods=nullptr)
{
  // accumulate_grad_properties is set here
  createForwardFunctionPyTypeObject<T>(type, name, function_properties, function_methods); 
  Py_INCREF(&type);
  PyModule_AddObject(module, name, (PyObject*)&type);
  // the type is registered here
  registerCppFunction(typeid(C), &type); 
}

Two operations happen here: createForwardFunctionPyTypeObject and registerCppFunction. We look at them one by one, first registerCppFunction and then createForwardFunctionPyTypeObject.

3.5.2.1 accumulate_grad_properties

As mentioned, addClass ties AccumulateGrad to accumulate_grad_properties. Specifically, createForwardFunctionPyTypeObject is what attaches accumulate_grad_properties.

accumulate_grad_properties is defined in torch/csrc/autograd/functions/init.cpp

static struct PyGetSetDef accumulate_grad_properties[] = {
  THP_FUNCTION_DEFAULT_PROPERTIES,
  {(char*)"variable", accumulateGradVar, nullptr, nullptr, nullptr},
  {nullptr}
};

THP_FUNCTION_DEFAULT_PROPERTIES is defined in torch/csrc/autograd/python_cpp_function.h

#define THP_FUNCTION_DEFAULT_PROPERTIES \
  {(char*)"next_functions", (getter)THPCppFunction_next_functions, nullptr, nullptr, nullptr}, \
  {(char*)"requires_grad", (getter)THPCppFunction_requires_grad, nullptr, nullptr, nullptr}, \
  {(char*)"metadata", (getter)THPCppFunction_metadata, nullptr, nullptr, nullptr}

PyObject* THPCppFunction_next_functions(THPCppFunction* self, PyObject* hook);
PyObject* THPCppFunction_metadata(THPCppFunction *self, void *_unused);
PyObject* THPCppFunction_requires_grad(THPCppFunction* self, void *_unused);

So accumulate_grad_properties is THP_FUNCTION_DEFAULT_PROPERTIES extended with accumulateGradVar. With the macro expanded, it reads:

static struct PyGetSetDef accumulate_grad_properties[] = {
  // the entries we care about
  {(char*)"next_functions", (getter)THPCppFunction_next_functions, nullptr, nullptr, nullptr},
  {(char*)"requires_grad", (getter)THPCppFunction_requires_grad, nullptr, nullptr, nullptr},
  {(char*)"metadata", (getter)THPCppFunction_metadata, nullptr, nullptr, nullptr},
  {(char*)"variable", accumulateGradVar, nullptr, nullptr, nullptr},
  {nullptr}
};

The structure is as follows; note that it contains THPCppFunction_next_functions:

+-----------------------------------------------------------------------+
|accumulate_grad_properties                                             |
|                                                                       |
|                                                                       |
|                                                                       |
|              "variable", accumulateGradVar                            |
|                                                                       |
|                                                                       |
|              "next_functions", (getter)THPCppFunction_next_functions  |
|                                                                       |
|                                                                       |
|              "requires_grad", (getter)THPCppFunction_requires_grad    |
|                                                                       |
|                                                                       |
|              "metadata", (getter)THPCppFunction_metadata              |
|                                                                       |
+-----------------------------------------------------------------------+

3.5.2.3 createForwardFunctionPyTypeObject

createForwardFunctionPyTypeObject sets accumulate_grad_properties on the type. The function is:

template<typename Ctor>
PyTypeObject* createForwardFunctionPyTypeObject(PyTypeObject& type, const char* name,
  PyGetSetDef* function_properties=nullptr, PyMethodDef* function_methods=nullptr)
{
  type.tp_new = &CppFunction_pynew<Ctor>;
  return _initFunctionPyTypeObject(type, name, function_properties, function_methods);
}

_initFunctionPyTypeObject assigns function_properties to tp_getset.

PyTypeObject* _initFunctionPyTypeObject(PyTypeObject& type, const char* name,
  PyGetSetDef* function_properties, PyMethodDef* function_methods)
{
  type.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC;
  type.tp_name = name;
  type.tp_basicsize = sizeof(THPCppFunction);
  type.tp_call = THPCppFunction_call;
  type.tp_methods = function_methods ? function_methods : default_methods;
  // function_properties is assigned to tp_getset here
  type.tp_getset = function_properties ? function_properties : default_properties;
  type.tp_dealloc = THPCppFunction_dealloc;
  type.tp_traverse = THPCppFunction_traverse;
  type.tp_clear = THPCppFunction_clear;
  if (PyType_Ready(&type) < 0) {
    auto msg = std::string("Unable to instantiate PyTypeObject for ") + name;
    throw std::runtime_error(msg);
  }
  return &type;
}
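
The "use the given property table, else fall back to the defaults" step can be mirrored in Python by building a class from a dict of property objects. This is a hypothetical sketch: init_function_type, DEFAULT_PROPERTIES, and the getter bodies are placeholders invented for illustration, and only the selection logic follows the C++ above.

```python
DEFAULT_PROPERTIES = {
    # Placeholder getters; the real ones are C functions.
    "next_functions": property(lambda self: tuple(self._edges)),
    "requires_grad": property(lambda self: True),
    "metadata": property(lambda self: self._metadata),
}


def init_function_type(name, function_properties=None):
    # Mirrors: function_properties ? function_properties : default_properties
    props = function_properties if function_properties else DEFAULT_PROPERTIES

    def __init__(self, edges=()):
        self._edges = list(edges)
        self._metadata = {}

    namespace = {"__init__": __init__, **props}
    return type(name, (object,), namespace)


# accumulate_grad_properties = the defaults plus one extra "variable" getter.
accumulate_grad_properties = {
    **DEFAULT_PROPERTIES,
    "variable": property(lambda self: getattr(self, "_variable", None)),
}

AccumulateGradClass = init_function_type("AccumulateGrad", accumulate_grad_properties)
CopyBackwardsClass = init_function_type("CopyBackwards")  # falls back to defaults

node = AccumulateGradClass()
print(node.next_functions, hasattr(CopyBackwardsClass, "variable"))
```

Both toy classes expose next_functions, but only the one built from accumulate_grad_properties also exposes variable, matching the difference between the two addClass calls shown earlier.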

This attaches THPCppFunction_next_functions to the next_functions attribute of AccumulateGradClass. In other words, AccumulateGradClass has a property table in which next_functions maps to THPCppFunction_next_functions.

+---------------------+
| AccumulateGradClass |
|                     |
|       tp_getset     |
|           +         |
|           |         |
+---------------------+
            |
            |
            v
+-----------+-----------------------------------------------------------+
|accumulate_grad_properties                                             |
|                                                                       |
|                                                                       |
|                                                                       |
|              "variable", accumulateGradVar                            |
|                                                                       |
|                                                                       |
|              "next_functions", (getter)THPCppFunction_next_functions  |
|                                                                       |
|                                                                       |
|              "requires_grad", (getter)THPCppFunction_requires_grad    |
|                                                                       |
|                                                                       |
|              "metadata", (getter)THPCppFunction_metadata              |
|                                                                       |
+-----------------------------------------------------------------------+

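For comparison, recall _TensorBase from earlier:

tp_getset is the slot in Python's type machinery that holds a table of attribute getters and setters; for _TensorBase it is THPVariable_properties.

AccumulateGradClass is given accumulate_grad_properties.
_TensorBase is given THPVariable_properties.

Below is the property table of _TensorBase (much of it omitted).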

static struct PyGetSetDef THPVariable_properties[] = {
  {"grad_fn", (getter)THPVariable_get_grad_fn, nullptr, nullptr, nullptr},
  {"_grad_fn", (getter)THPVariable_get_grad_fn, (setter)THPVariable_set_grad_fn, nullptr, nullptr},
  {"is_leaf", (getter)THPVariable_is_leaf, nullptr, nullptr, nullptr},
  {"data", (getter)THPVariable_get_data, (setter)THPVariable_set_data, nullptr, nullptr},
  {"_grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr}, // Allows the python class to override .grad
  {"grad", (getter)THPVariable_get_grad, (setter)THPVariable_set_grad, nullptr, nullptr},
  {"_base", (getter)THPVariable_get_base, nullptr, nullptr, nullptr},
  {"output_nr", (getter)THPVariable_get_output_nr, nullptr, nullptr, nullptr},
  {"requires_grad", (getter)THPVariable_get_requires_grad, (setter)THPVariable_set_requires_grad, nullptr, nullptr},
  {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
  // ......
};

The logic so far is as follows:

                                Python    +   C++
                                          |
+--------------------------------------+  |   +---------------------------+
| torch/__init__.py                    |  |   |                           |
|                                      |  |   |  THPModule_initExtension  |
|  from torch._C import _initExtension |  |   |                           |
|                                      |  |   +--------------+------------+
+-------------------+------------------+  |                  |
                    |                     |                  |
                    |                     |                  v
                    |                     |  +---------------+--------------+
                    |                     |  |                              |
                    |                     |  |  THPAutograd_initFunctions() |
                    |                     |  |                              |
                    |                     |  +---------------+--------------+
                    |                     |                  |
                    |                     |                  |
                    |                     |                  v
                    |                     |  +---------------+-------------------------------------------+
                    |                     |  |                                                           |
                    |                     |  | addClass<AccumulateGrad, NoCtor>(module,                  |
                    |  import             |  |                               AccumulateGradClass,        |
                    |                     |  |                               "AccumulateGrad",           |
                    |                     |  |                               accumulate_grad_properties) |
                    |                     |  |                                                           |
                    |                     |  +--------------+--------------------------------------------+
                    |                     |                 |
                    |                     |                 |  register
                    v                     |                 v
                                          |                                                               +----------------------------------------------------------+
        +----------------------+          |     +--------------------+       +---------------------+      |accumulate_grad_properties                                |
        |                      |          |     |                    |       | AccumulateGradClass |      |                                                          |
        |   AccumulateGrad     | <------------> |   AccumulateGrad   +-----> |                     |      |  "variable", accumulateGradVar                           |
        |                      |          |     |                    |       |       tp_getset +------->  |                                                          |
        |                      |          |     |                    |       |                     |      |  "next_functions", (getter)THPCppFunction_next_functions |
        +----------------------+          |     +--------------------+       |                     |      |                                                          |
                                          |                                  +---------------------+      |  "requires_grad", (getter)THPCppFunction_requires_grad   |
                                          |                                                               |                                                          |
                                          |                                                               |  "metadata", (getter)THPCppFunction_metadata             |
                                          |                                                               |                                                          |
                                          |                                                               +----------------------------------------------------------+


3.5.2.4 next_functions

THPCppFunction_next_functions is defined in torch/csrc/autograd/python_cpp_function.cpp. It iterates over next_edges_, builds a tuple whose elements are themselves tuples of the form (Edge.function, Edge.input_nr), and returns that as next_functions.

PyObject* THPCppFunction_next_functions(THPCppFunction* self, PyObject* hook)
{
  const auto num_next = self->cdata->num_outputs();
  THPObjectPtr py_functions(PyTuple_New(num_next));
  if (!py_functions) return nullptr;
  for (size_t i = 0; i < num_next; ++i) { // iterate over the edges
    auto& c_tuple = self->cdata->next_edge(i); // get the i-th Edge
    THPObjectPtr tuple(PyTuple_New(2));
    if (!tuple) return nullptr;
    PyObject *py_fn = functionToPyObject(c_tuple.function); // py_fn is Edge.function
    if (!py_fn) return nullptr;
    PyTuple_SET_ITEM(tuple.get(), 0, py_fn);
    PyObject *py_idx = THPUtils_packUInt32(c_tuple.input_nr); // py_idx is Edge.input_nr
    if (!py_idx) return nullptr;
    PyTuple_SET_ITEM(tuple.get(), 1, py_idx);
    // tuple is (py_fn, py_idx), i.e. (Edge.function, Edge.input_nr)
    PyTuple_SET_ITEM(py_functions.get(), i, tuple.release()); // store it as the i-th item of py_functions
  }
  return py_functions.release(); // return the tuple of tuples
}
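
To make the shape of the returned value concrete, here is a minimal pure-Python sketch of the same loop. The classes Node and Edge below are hypothetical stand-ins written for this illustration; they are not PyTorch's real classes, and only the (function, input_nr) layout is taken from the C++ code above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for the C++ Node; not PyTorch's real class."""
    name: str
    next_edges: List["Edge"] = field(default_factory=list)

    @property
    def next_functions(self) -> Tuple[Tuple[Optional["Node"], int], ...]:
        # Same shape as the C++ loop: one (function, input_nr) pair per edge.
        return tuple((edge.function, edge.input_nr) for edge in self.next_edges)


@dataclass(eq=False)
class Edge:
    """Hypothetical stand-in for the C++ Edge."""
    function: Optional[Node]
    input_nr: int


# A small graph: SubBackward0 feeds PowBackward0 and MulBackward0.
acc = Node("AccumulateGrad")
pow_node = Node("PowBackward0", [Edge(acc, 0)])
mul_node = Node("MulBackward0", [Edge(acc, 0)])
sub_node = Node("SubBackward0", [Edge(pow_node, 0), Edge(mul_node, 0)])

for fn, input_nr in sub_node.next_functions:
    print(fn.name, input_nr)
```

The property returns a fresh tuple on every access, just as the C++ getter allocates a new PyTuple on every call, so the Python-side result is a snapshot of next_edges_ rather than a live view.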

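next_edge is defined in torch/csrc/autograd/function.h as a member function of Node. It returns the single Edge stored at the given index of next_edges_ (the edge list). AccumulateGrad is a class derived from Node.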

struct TORCH_API Node : std::enable_shared_from_this<Node> {
    
  const Edge& next_edge(size_t index) const noexcept {
    return next_edges_[index];
  }
    
  edge_list next_edges_;   // the edges associated with this operator in the forward pass (its forward-pass input variables)
};

Edge is defined as follows:

struct Edge {
  /// The function this `Edge` points to.
  std::shared_ptr<Node> function; // points to the target Node

  /// The identifier of a particular input to the function.
  uint32_t input_nr; // which input of `function` this Edge feeds
};
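
The role of input_nr is easiest to see when the target function has more than one input. The following is a simplified sketch under stated assumptions: TargetNode, its input_buffer, and the send helper are all hypothetical names invented for this illustration, and the accumulate-on-collision behaviour is an assumption of the sketch, not a description of PyTorch's engine.

```python
from dataclasses import dataclass
from typing import List, Optional


class TargetNode:
    """Hypothetical node that expects `num_inputs` incoming gradients."""

    def __init__(self, name: str, num_inputs: int) -> None:
        self.name = name
        # One slot per input; None means "nothing received yet".
        self.input_buffer: List[Optional[float]] = [None] * num_inputs


@dataclass
class Edge:
    """Hypothetical stand-in: where a gradient goes, and into which slot."""
    function: TargetNode
    input_nr: int


def send(edge: Edge, grad: float) -> None:
    """Deliver `grad` to slot `input_nr` of the target (summing if the slot is filled)."""
    buf = edge.function.input_buffer
    current = buf[edge.input_nr]
    buf[edge.input_nr] = grad if current is None else current + grad


target = TargetNode("TwoInputBackward", num_inputs=2)
send(Edge(target, 0), 1.5)   # goes to input 0
send(Edge(target, 1), 2.0)   # goes to input 1
send(Edge(target, 1), 0.5)   # second edge into input 1, summed in this sketch
print(target.input_buffer)
```

Two edges can share the same function and differ only in input_nr, which is why an Edge needs both fields to identify its destination.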

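3.5.3 Properties of next_functions

Using AccumulateGrad as the example, we can summarize as follows.

grad_fn has an attribute next_functions, which is a nested tuple of the form ((function 1, int 1), (function 2, int 2), ..., (function N, int N)).
next_functions is a tuple of tuples. Its length equals the number of Edges of this grad_fn, and each inner tuple describes one Edge as (Edge.function, Edge.input_nr). It is built by THPCppFunction_next_functions.
The next_functions of AccumulateGrad refers to such a tuple of tuples (label 2 in the figure below), which is produced by the getter registered through AccumulateGradClass (label 1 in the figure below). During backpropagation, gradients are computed step by step by following next_functions.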
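
The idea of following next_functions step by step can be sketched as a depth-first walk. This is only an illustration: FakeFn is a hypothetical stand-in class so that the example runs without PyTorch, and describe is a helper name invented here. The walk relies only on two things, a next_functions attribute shaped as described above and a name() method.

```python
class FakeFn:
    """Hypothetical stand-in for a grad_fn object (not a PyTorch class)."""

    def __init__(self, name, next_functions=()):
        self._name = name
        self.next_functions = tuple(next_functions)

    def name(self):
        return self._name


def describe(fn, input_nr=None, depth=0, out=None):
    """Depth-first walk that records one indented line per visited function."""
    out = [] if out is None else out
    label = "None" if fn is None else fn.name()
    prefix = "" if input_nr is None else f"(input_nr={input_nr}) "
    out.append("  " * depth + prefix + label)
    if fn is not None:
        for next_fn, next_input_nr in fn.next_functions:
            describe(next_fn, next_input_nr, depth + 1, out)
    return out


# Mirror the graph drawn in the figure below.
acc_a = FakeFn("AccumulateGrad")
acc_b = FakeFn("AccumulateGrad")
pow_fn = FakeFn("PowBackward0", [(acc_a, 0)])
permute_fn = FakeFn("PermuteBackward", [(acc_b, 0)])
mul_fn = FakeFn("MulBackward0", [(permute_fn, 0)])
sub_fn = FakeFn("SubBackward0", [(pow_fn, 0), (mul_fn, 0)])

lines = describe(sub_fn)
print("\n".join(lines))
```

Because the helper is duck-typed, the same function should also work when passed the grad_fn of a real tensor, although that use is not exercised here.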

Roughly as follows:

+-----------------+   +-----------------------+        +----------------------+    +---------------------+
|  Tensor         |   | SubBackward0          |        | PowBackward0         |    | AccumulateGrad      |
|                 |   |                       |        |                      |    |                     |
|       grad_fn +---->+     next_functions  +-----+--> |     next_functions +----> |    next_functions +----> {}
|                 |   |                       |   |    |                      |    |                     |
+-----------------+   +-----------------------+   |    +----------------------+    +---------------------+
                                                  |
                                                  |
                                                  |    +----------------------+    +----------------------+    +---------------------+
                                                  |    | MulBackward0         |    | PermuteBackward      |    | AccumulateGrad      |
                                                  +--> |                      |    |                      |    |                     |
                                                       |     next_functions +----> |     next_functions +----> |    next_functions +-----+
                                                       |                      |    |                      |    |                     |   |
+---------------------+                                +----------------------+    +----------------------+    +---------------------+   |
| AccumulateGradClass |                                                                                                                  |
|                     |                                                                                                                  |
|       tp_getset     |                                                                                                 2. point to the tuple list
|           +         |                                                                                                                  |
|           |         |                                                                                                                  |
+---------------------+                                                                                                                  |
            |                                                                                                                            v
            |
            v                                                            +-----> { (function 1, int 1), (function 2, int 2) ... (function n, int n) }
+-----------+-----------------------------------------------------+      |
|accumulate_grad_properties                                       |      |
|                                                                 |      |
|       "variable", accumulateGradVar                             |      |
|                                                                 |      |
|       "next_functions", (getter)THPCppFunction_next_functions +--------+
|                                                                 |  1. generate the tuple list
|       "requires_grad", (getter)THPCppFunction_requires_grad     |
|                                                                 |
|       "metadata", (getter)THPCppFunction_metadata               |
|                                                                 |
+-----------------------------------------------------------------+
