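Sunny.Xia's Deep Learning (4): A Hands-On Walkthrough of the MMOE Multi-Task Learning Model

Posted by 若之辰 on 2020-12-20

Original URL: https://blog.csdn.net/sunny_xsc1994/article/details/111445126

Articles in this column are published both on this blog and on the Zhihu column "Sunny.Xia's Deep Learning" (Sunny.Xia的深度學習). If I am slow to reply to a comment, feel free to send me a private message on Zhihu. Please do not repost without my permission. Thank you.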

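1. What is MMOE?

The three figures show different multi-task model structures. For a detailed introduction, see the article "The MMOE Model for Multi-Task Learning" (多工學習之MMOE模型), which also includes a simple demo to help readers understand. It is thorough, so I will not repeat it here.

Paper: https://github.com/ruozhichen/deep_learning_papers/tree/master/pctr

Model (c) is the MMOE model this article covers. The three Experts in the figure can be thought of as three mutually independent sub-networks, and Tower A and Tower B correspond to the two tasks. The Expert outputs are combined by a weighted sum to form each Tower's input, with the weights supplied by a Gate. The difference from figure (b) is that here every task has its own Gate, which outputs a weight for each Expert. The whole model can be written as: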

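$y_{k}=h^{k}(f^{k}(x)), \quad \text{where}\ f^{k}(x)=\sum_{i=1}^{n}g^{k}(x)_{i}f_{i}(x)$

Here k indexes the tasks and $h^{k}$ is the k-th Tower network; n is the number of Experts and $f_{i}(x)$ is the output of the i-th Expert; $g^{k}(x)_{i}$ is the weight that the k-th gate assigns to the i-th Expert.

$g^{k}(x)=\mathrm{softmax}(W_{gk}x)$

As you can see, the Gate network's output is a softmax, which means that for each task the weights of the n Experts sum to 1.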
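The two formulas above can be checked on a single sample with a small NumPy sketch. This is an illustration only, not the article's TensorFlow code: the sizes, the random weights, and the choice of a single linear layer per Expert are all assumptions made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
feature_dim, n_experts, expert_units, n_tasks = 4, 3, 5, 2  # illustrative sizes

x = rng.normal(size=feature_dim)  # one sample

# Hypothetical experts: each f_i is a single linear layer here.
expert_w = rng.normal(size=(n_experts, feature_dim, expert_units))
expert_out = np.stack([x @ expert_w[i] for i in range(n_experts)])  # (n_experts, expert_units)

# One gate matrix W_gk per task; g^k(x) = softmax(W_gk x).
gate_w = rng.normal(size=(n_tasks, n_experts, feature_dim))
gates = softmax(gate_w @ x, axis=-1)  # (n_tasks, n_experts)

# f^k(x) = sum_i g^k(x)_i * f_i(x): the input to tower k.
tower_inputs = gates @ expert_out  # (n_tasks, expert_units)

print(gates.sum(axis=1))   # each task's expert weights sum to 1
print(tower_inputs.shape)
```

Each row of gates is one task's weighting over the Experts, so two tasks can mix the same shared Experts in different proportions.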

2. Hands-on walkthrough

Code for this article: https://github.com/ruozhichen/deep_learning/blob/master/model/MMOE.py (master branch)

Environment: tensorflow-gpu 1.14, Python 3.7. Run it directly with python MMOE.py.

2.1 Parameter initialization

    # experts
    # feature_dim * experts_units * experts_num
    experts_weight = tf.get_variable(name='experts_weight',
                                     dtype=tf.float32,
                                     shape=(input_layer.get_shape()[1], experts_units, experts_num),
                                     initializer=tf.contrib.layers.xavier_initializer())
    experts_bias = tf.get_variable(name='expert_bias',
                                   dtype=tf.float32,
                                   shape=(experts_units, experts_num),
                                   initializer=tf.contrib.layers.xavier_initializer())

    # gates
    # tasks_num * feature_dim * experts_num
    gate_weights = [tf.get_variable(name='gate%d_weight' % i,
                                    dtype=tf.float32,
                                    shape=(input_layer.get_shape()[1], experts_num),
                                    initializer=tf.contrib.layers.xavier_initializer())
                    for i in range(FLAGS.tasks_num)]
    # tasks_num * experts_num
    gate_biases = [tf.get_variable(name='gate%d_bias' % i,
                                   dtype=tf.float32,
                                   shape=(experts_num,),
                                   initializer=tf.contrib.layers.xavier_initializer())
                   for i in range(FLAGS.tasks_num)]

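input_layer is the feature matrix, of size N * feature_dim, where N is the number of samples.

experts_units is the hidden-layer dimension; in effect, the input to each Task ends up with dimension experts_units.

experts_num is the number of experts.

Also note that gate_weights and gate_biases are lists: a separate set of gate parameters is initialized for each task.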

2.2 expert_outputs

    with tf.variable_scope("MMOE-part"):
        # axes=k contracts the last k axes of a with the first k axes of b
        experts_output = tf.tensordot(a=input_layer, b=experts_weight, axes=1)  # N * experts_units * experts_num
        if use_experts_bias:
            experts_output = tf.add(experts_output, experts_bias)

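input_layer has size N * feature_dim, and experts_weight has size feature_dim * experts_units * experts_num.

Each element of experts_output is a weighted sum over the feature_dim values of one sample, and this weighted sum is computed experts_units * experts_num times per sample.

With N samples, the final size is therefore N * experts_units * experts_num.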
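The shape reasoning above can be verified with NumPy, whose np.tensordot follows the same axes convention as tf.tensordot. The sketch below uses made-up sizes and random values (assumptions for illustration) and also shows that the result matches one ordinary matrix product per Expert, and that a bias of size experts_units * experts_num broadcasts over the N samples.

```python
import numpy as np

rng = np.random.default_rng(0)
N, feature_dim, experts_units, experts_num = 6, 4, 5, 3  # illustrative sizes

input_layer = rng.normal(size=(N, feature_dim))
experts_weight = rng.normal(size=(feature_dim, experts_units, experts_num))
experts_bias = rng.normal(size=(experts_units, experts_num))

# axes=1 contracts the last axis of a with the first axis of b.
experts_output = np.tensordot(input_layer, experts_weight, axes=1)

# Equivalent view: one ordinary matmul per expert, stacked on the last axis.
per_expert = np.stack(
    [input_layer @ experts_weight[:, :, e] for e in range(experts_num)],
    axis=2,
)

# The bias has no sample axis, so it is broadcast across all N samples.
with_bias = experts_output + experts_bias

print(experts_output.shape)
print(np.allclose(experts_output, per_expert))
```

Packing all Experts into one 3-D weight tensor lets a single contraction replace a Python loop over Experts.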

2.3 gate_outputs

        gates_output = []
        for i in range(FLAGS.tasks_num):
            # N * experts_num
            res = tf.matmul(input_layer, gate_weights[i])
            if use_gate_bias:
                res = tf.add(res, gate_biases[i])
            gates_output.append(res)
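        # Stack the per-task logits into tasks_num * N * experts_num, then
        # normalize over the last (experts_num) axis.
        gate_outputs = tf.nn.softmax(tf.stack(gates_output, axis=0), axis=-1)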

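gate_outputs gives, for each task, the weight of every Expert for every sample. Note that gate_outputs is a single tensor of size tasks_num * N * experts_num rather than a Python list: the per-task logits collected in gates_output are stacked into one tensor before the softmax, which normalizes over the last (experts_num) axis. Indexing gate_outputs[i] yields the N * experts_num weights for task i.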
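Which axis the softmax normalizes over matters here, so a NumPy sketch of the gate step may help. The sizes and random weights are assumptions for illustration, and the softmax helper is written out by hand rather than taken from the article's code.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
tasks_num, N, feature_dim, experts_num = 2, 6, 4, 3  # illustrative sizes

input_layer = rng.normal(size=(N, feature_dim))
gate_weights = [rng.normal(size=(feature_dim, experts_num)) for _ in range(tasks_num)]

# Stack the per-task logits: tasks_num * N * experts_num.
gates_logits = np.stack([input_layer @ w for w in gate_weights], axis=0)

# Correct: normalize over the experts axis, so each (task, sample) row sums to 1.
gate_outputs = softmax(gates_logits, axis=-1)
expert_weight_sums = gate_outputs.sum(axis=-1)

# For contrast: axis=0 would normalize across tasks, which is not what MMOE wants.
wrong_axis = softmax(gates_logits, axis=0)

print(expert_weight_sums)
print(wrong_axis.sum(axis=-1))
```

The first printout is all ones, while the second is not, which is a quick way to catch an axis mistake when the gate logits are stacked.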

2.4 final_results

        final_outputs = []
        for i in range(FLAGS.tasks_num):
            # N * 1 * experts_num
            expanded_gate_output = tf.expand_dims(gate_outputs[i], axis=1)
            # N * experts_units * experts_num
            weighted_expert_output = tf.multiply(experts_output, expanded_gate_output)
            # N * experts_units
            task_output = tf.reduce_sum(weighted_expert_output, axis=2)
            final_outputs.append(task_output)
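        # The data provided with this project does not support multi-task learning, so the
        # per-task outputs are concatenated and used as the input to a DNN.
        deep_inputs = tf.concat(final_outputs, axis=1)  # N * (tasks_num * experts_units)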

The weighted_expert_output and task_output steps together compute the weighted sum of the Experts' outputs. The result has size N * experts_units and serves as the input for each task.
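The expand, multiply, and reduce sequence can be reproduced in NumPy to confirm that it is the weighted sum from the model formula. The sizes and random inputs below are assumptions for illustration; the einsum line is only an equivalent way of writing the same contraction, not something the article's code uses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, experts_units, experts_num = 6, 5, 3  # illustrative sizes

experts_output = rng.normal(size=(N, experts_units, experts_num))

# Stand-in for one task's gate output: non-negative rows that sum to 1.
gate = rng.random(size=(N, experts_num))
gate = gate / gate.sum(axis=1, keepdims=True)

# Same three steps as the TensorFlow code.
expanded_gate_output = gate[:, np.newaxis, :]                    # N * 1 * experts_num
weighted_expert_output = experts_output * expanded_gate_output   # broadcast over experts_units
task_output = weighted_expert_output.sum(axis=2)                 # N * experts_units

# The same weighted sum written as a single contraction over the experts axis.
task_output_einsum = np.einsum('nue,ne->nu', experts_output, gate)

print(task_output.shape)
print(np.allclose(task_output, task_output_einsum))
```

Inserting the size-1 axis is what lets one gate weight per Expert scale all experts_units values of that Expert at once.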

Because the test samples used by this code are not suited to multi-task learning, the task_output tensors are actually concatenated and then fed into a DNN.
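To make the contrast concrete, here is a NumPy sketch of both options: the concatenation used in this article, and what feeding each task_output into its own Tower, as in $y_{k}=h^{k}(f^{k}(x))$, might look like. The Tower here is a hypothetical single linear layer followed by a sigmoid, with random weights; it is an assumption for illustration and does not appear in MMOE.py.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N, experts_units, tasks_num = 6, 5, 2  # illustrative sizes

# Stand-ins for the per-task weighted sums produced by the gating step.
final_outputs = [rng.normal(size=(N, experts_units)) for _ in range(tasks_num)]

# What this article's code does: concatenate along the feature axis.
deep_inputs = np.concatenate(final_outputs, axis=1)  # N * (tasks_num * experts_units)

# Hypothetical multi-task alternative: one small tower h^k per task.
tower_weights = [rng.normal(size=(experts_units, 1)) for _ in range(tasks_num)]
task_predictions = [
    sigmoid(f_k @ w_k) for f_k, w_k in zip(final_outputs, tower_weights)
]  # tasks_num arrays, each N * 1

print(deep_inputs.shape)
print([p.shape for p in task_predictions])
```

With per-task labels available, each entry of task_predictions would get its own loss, and the losses would be combined for training.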

For how the test samples are constructed, see Sunny.Xia's Deep Learning (1): DeepFM Explained with Hands-On Code.
