CTR Study Notes & Code Implementation 5 - Deep CTR Models: DeepCrossing -> DCN

Posted by 風雨中的小七 on 2020-05-15

Original URL: https://www.cnblogs.com/gogoSandy/p/12892973.html

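The previous posts covered PNN, NFM, and AFM, which all model feature interactions through pairwise vector products. This post looks at feature interaction from a different angle. DeepCrossing was an early pioneer in bringing ResNet into CTR models. DCN builds further on the ResNet idea, offering a new way to model high-order feature interactions and supporting feature crosses of arbitrary order.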

The code below uses dense input, which makes the model structure easier to follow. The sparse-input version and the complete code are at:
https://github.com/DSXiangLi/CTR

Deep Crossing

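Deep Crossing has a fairly simple structure. Compared with the original Embedding+MLP model, the only difference is that the embedding layer is followed by residual layers instead of fully connected layers. The model structure is shown below.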

[Figure: Deep Crossing model structure]

A quick word on residual networks. The basic building block is shown below.

[Figure: basic residual unit]

\[a^{l} = a^{l-1} + F(a^{l-1}, w^l) \]

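What problem do residual networks solve, and why do they work? This blog post explains it clearly. The core issue is network degradation: as depth increases, performance first improves until it saturates and then drops quickly. This drop is not overfitting. In theory, if a 20-layer network is optimal, a 30-layer network contains it, and the extra 10 layers only need to learn the identity mapping \(a^{l} = a^{l-1}\). The suspicion is therefore that a plain MLP has trouble fitting the identity mapping. The residual unit above has the identity mapping built in: when \(F(a^{l-1}, w^l)=0\), it simply passes the previous layer's output through unchanged.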
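
To make the identity-mapping argument concrete, here is a minimal sketch of one residual unit in plain Python (lists instead of tensors, so it runs without TensorFlow). The helper names `dense`, `relu`, and `residual_unit` and the all-zero weights are illustrative assumptions, not part of the original code. The sketch follows the formula above literally, so it leaves out the final ReLU that the TensorFlow implementation further down applies.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_unit(x, w1, b1, w2, b2):
    """a^l = a^{l-1} + F(a^{l-1}, w^l), with F = dense -> relu -> dense."""
    hidden = relu(dense(x, w1, b1))
    f_x = dense(hidden, w2, b2)
    return [a + f for a, f in zip(x, f_x)]


x = [0.5, -1.0, 2.0]
# Hypothetical weights: all zeros, so F(x) = 0 and the unit is an identity map.
zero_w1 = [[0.0] * 3 for _ in range(2)]  # 3 -> 2
zero_w2 = [[0.0] * 2 for _ in range(3)]  # 2 -> 3
out = residual_unit(x, zero_w1, [0.0, 0.0], zero_w2, [0.0, 0.0, 0.0])
print(out)  # [0.5, -1.0, 2.0]
```

With zero weights the unit returns its input unchanged, which is the behavior a plain MLP layer would have to learn explicitly.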

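So what particular advantage does ResNet bring to a CTR model? Honestly, it feels like the authors simply borrowed the strongest framework of the time... One possible explanation is that an MLP learns high-order, generalized features, while ResNet's identity mapping preserves more of the original low-order feature information. This is somewhat like Wide&Deep but not quite, because the input is already an embedding rather than the raw discrete features. Admittedly, that is a rather forced explanation...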

Code implementation

import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.estimator)


def residual_layer(x0, unit, dropout_rate, batch_norm, mode):
    # f(x): input_size -> unit -> input_size
    # output = relu(f(x) + x)
    is_training = (mode == tf.estimator.ModeKeys.TRAIN)
    input_size = x0.get_shape().as_list()[-1]

    # input_size -> unit
    x1 = tf.layers.dense(x0, units=unit, activation=tf.nn.relu)
    if batch_norm:
        x1 = tf.layers.batch_normalization(x1, center=True, scale=True,
                                           trainable=True, training=is_training)
    if dropout_rate > 0:
        x1 = tf.layers.dropout(x1, rate=dropout_rate, training=is_training)
    # unit -> input_size, so the result can be added to the input
    x2 = tf.layers.dense(x1, units=input_size)
    # add the original input (skip connection) and apply relu
    output = tf.nn.relu(tf.add(x2, x0))

    return output


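# tf_estimator_model, build_features and add_layer_summary are not TensorFlow APIs;
# they are helpers from the repository linked above.
@tf_estimator_model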
def model_fn(features, labels, mode, params):
    dense_feature = build_features()
    dense = tf.feature_column.input_layer(features, dense_feature)

    # stacked residual layer
    with tf.variable_scope('Residual_layers'):
        for i, unit in enumerate(params['hidden_units']):
            dense = residual_layer( dense, unit,
                                    dropout_rate = params['dropout_rate'],
                                    batch_norm = params['batch_norm'], mode = mode)
            add_layer_summary('residual_layer{}'.format(i), dense)

    with tf.variable_scope('output'):
        y = tf.layers.dense(dense, units=1)
        add_layer_summary( 'output', y )

    return y

Deep&Cross

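Deep&Cross follows the Wide&Deep style. It keeps the fully connected Deep part and, borrowing the ResNet idea above, introduces an explicit way to model high-order feature interactions. Earlier models either relied on fully connected layers to capture high-order interactions, as DeepFM does, or, like PNN, NFM, and AFM, first learned second-order interactions from pairwise inner, outer, or element-wise products and then relied on fully connected layers for higher-order information. Pairwise approaches are hard to extend to higher orders because the number of interaction terms explodes.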
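
As a rough illustration of that explosion, the sketch below counts how many distinct field combinations an explicit interaction scheme would need at each order. The function name `n_interactions` and the choice of 30 fields are assumptions made for this example only.

```python
import math


def n_interactions(n_fields, order):
    """Number of distinct `order`-way combinations among `n_fields` fields."""
    return math.comb(n_fields, order)


n_fields = 30  # hypothetical number of feature fields
for order in (2, 3, 4):
    print(order, n_interactions(n_fields, order))
# 2 435
# 3 4060
# 4 27405
```

Each combination would need its own product term (and, for vector products, its own block of outputs), which is why enumerating pairs does not extend well beyond second order.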

Model details

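DCN takes a dense input formed by concatenating the embeddings with the continuous features. Because it does not need pairwise inner products as PNN and AFM do, the embedding dimensions of different features are not required to match. The Cross part and the Deep part share this input and are trained jointly, and their outputs are concatenated to predict CTR. The model structure is shown below.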

[Figure: DCN model structure]

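There is not much to say about the Deep part: as in DeepFM and Wide&Deep, it is a stack of fully connected layers that learns generalized features. The Cross part is a stack of cross layers. Suppose the input has N features and, for simplicity, every embedding has the same dimension K. Each cross layer computes the following.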

[Figure: cross layer computation]

\[\begin{align} x_{l+1} &= x_0x_l^Tw_l + b_l + x_l \\ (, N*K) &= (,N*K,1) * (,1,N*K) * (, N*K) + (, N*K) + (, N*K) \\ \end{align} \]
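
The formula can be traced on a single sample with a small sketch in plain Python. It follows the written order (outer product first, then the weight vector), and the helper names `outer`, `matvec`, `cross_layer_step` as well as all numeric values are assumptions chosen for illustration.

```python
def outer(u, v):
    """Outer product u v^T as a nested list of shape (len(u), len(v))."""
    return [[ui * vj for vj in v] for ui in u]


def matvec(m, v):
    """Matrix-vector product m v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def cross_layer_step(x0, xl, w, b):
    """x_{l+1} = x0 xl^T w + b + xl for one sample."""
    x0_xl_t = outer(x0, xl)            # (NK, NK)
    interaction = matvec(x0_xl_t, w)   # (NK,)
    return [i + bi + xi for i, bi, xi in zip(interaction, b, xl)]


x0 = [1.0, 2.0, 3.0]   # flattened input, NK = 3 (hypothetical values)
w0 = [0.5, 0.0, -1.0]  # hypothetical layer weight
b0 = [0.0, 0.0, 0.0]   # hypothetical layer bias
x1 = cross_layer_step(x0, x0, w0, b0)  # first layer: xl = x0
print(x1)  # [-1.5, -3.0, -4.5]
```

The output keeps the input dimension NK, so cross layers can be stacked without the representation growing.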

1. Parameter sharing: controlling complexity
Parameter sharing ensures that each additional cross layer adds only \(O(NK)\) parameters.

\[\begin{align} &\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} * \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} * \begin{pmatrix} w_1^{(1)} \\ w_2^{(1)} \\ w_3^{(1)} \end{pmatrix} \tag{3}\\ & = x_1*w_1^{(1)} * \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + x_2*w_2^{(1)} * \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} +x_3*w_3^{(1)} * \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \tag{4}\\ &=\begin{pmatrix} x_1x_1 & x_1x_2 & x_1x_3 \\ x_2x_1 & x_2x_2 & x_2x_3 \\ x_3x_1 & x_3x_2 & x_3x_3 \\ \end{pmatrix} * \begin{pmatrix} w_1^{(1)} \\ w_2^{(1)} \\ w_3^{(1)} \end{pmatrix} \tag{5} \end{align} \]

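FM view (Eq. 4): In FM, each discrete feature has one latent vector v, and the weight of an interaction is the inner product of the two latent vectors, but this is limited to pairwise interactions. In Cross, each element of the embedding shares a single weight w across its interactions with all other elements. (It seems Cross could also be applied directly to the raw one-hot input; using embeddings just lowers the complexity further.)
OPNN view (Eq. 5): OPNN takes pairwise outer products of the embedding vectors and gets \(N^2\) outer-product matrices of size \(K \times K\). Pieced together, they are exactly the single large outer-product matrix that Cross computes without distinguishing fields. Unlike OPNN, which handles the dimension explosion with a crude sum pooling, Cross reduces the dimension by sharing one weight per column of that matrix (each \(w_j\) in Eq. 5 multiplies column j). This keeps more information while ensuring the cross layer's complexity does not grow with depth: every layer's output has the original dimension \(NK\), and the complexity is also \(O(NK)\).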
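
To put the \(O(NK)\) claim in numbers, the sketch below compares the parameters added by one cross layer (a weight vector and a bias vector) with those of one fully connected layer that maps \(NK\) inputs to \(NK\) outputs. The function names, the \(NK \to NK\) dense layer used as the baseline, and the values N = 10, K = 8 are assumptions for illustration.

```python
def cross_layer_params(n_fields, emb_dim):
    """One cross layer: weight w_l and bias b_l, each of length N*K."""
    d = n_fields * emb_dim
    return 2 * d


def dense_layer_params(n_fields, emb_dim):
    """One dense layer N*K -> N*K: a (N*K, N*K) kernel plus a bias."""
    d = n_fields * emb_dim
    return d * d + d


N, K = 10, 8  # hypothetical number of fields and embedding size
print(cross_layer_params(N, K))  # 160
print(dense_layer_params(N, K))  # 6480
```

The cross layer grows linearly in NK while the dense layer grows quadratically, so stacking many cross layers stays cheap.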

2. Polynomial kernel: feature interactions of arbitrary order
For simplicity, ignore the bias term for now and look at two cross layers.

\[\begin{align} x_1 &= x_0^2 *w_0 + x_0 \\ x_2 &= x_0 * x_1 *w_1 + x_1 \\ &=x_0 *(x_0 * x_0 *w_0 + x_0 ) *w_1 + x_1\\ &=x_0^3 *w_0 *w_1 + x_0^2 *w_1 + x_1 \end{align} \]

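Combining the residual connection with the cross operation amounts to computing a polynomial of the input vector. Each additional cross layer captures feature interactions one order higher, so capturing high-order interactions no longer depends solely on the MLP and becomes explicitly controllable. The residual connection also keeps the model from over-generalizing as the Cross part gets deeper, because the original input features are always retained.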
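
The scalar derivation above can be checked mechanically by tracking polynomial coefficients in \(x_0\). This is a sketch for the bias-free scalar case only; the function name `cross_poly` and the weights 0.5 and 0.25 are assumptions for illustration.

```python
def cross_poly(coeffs, w):
    """Apply one bias-free scalar cross layer, x_{l+1} = x0 * x_l * w + x_l.

    `coeffs[i]` is the coefficient of x0**i in the polynomial for x_l.
    """
    shifted = [0.0] + [c * w for c in coeffs]  # x0 * x_l * w raises every power by 1
    padded = coeffs + [0.0]                    # residual term x_l
    return [s + p for s, p in zip(shifted, padded)]


p = [0.0, 1.0]  # x_0 itself: 0 + 1 * x0
for w in (0.5, 0.25):  # hypothetical w_0 and w_1
    p = cross_poly(p, w)

print(p)           # [0.0, 1.0, 0.75, 0.125]
print(len(p) - 1)  # 3, the polynomial degree after two layers
```

The result \(0.125x_0^3 + 0.75x_0^2 + x_0\) matches \(x_0^3 w_0 w_1 + x_0^2 w_1 + x_1\) with \(x_1 = x_0^2 w_0 + x_0\), and each extra layer raises the degree by exactly one.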

DCN is already very good. The only criticism I can think of:

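Its ability to memorize may be limited. Despite the residual connection, the input is already an embedding, which is to some extent a generalized representation. I wonder whether adding a Wide part would bring an improvement.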

Code implementation

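The two views from the parameter-sharing discussion above map directly onto two ways of computing a cross layer. Following the original order, taking the outer product of the embeddings first and then the weighted sum (the OPNN view), requires storing a huge temporary matrix. The code is as follows.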

def cross_op_raw(xl, x0, weight, feature_size):
    # (x0 * xl^T) * w
    # outer product: (batch, feature_size, 1) x (batch, 1, feature_size)
    #             -> (batch, feature_size, feature_size)
    outer_product = tf.matmul(tf.reshape(x0, [-1, feature_size, 1]),
                              tf.reshape(xl, [-1, 1, feature_size]))
    # (batch, feature_size, feature_size) . (feature_size,) -> (batch, feature_size)
    interaction = tf.tensordot(outer_product, weight, axes=1)
    return interaction

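By reordering the products, \((x_0 * x_l) *w \to x_0 * (x_l * w)\), we can avoid computing the outer-product matrix (the FM view). This is what the paper means by exploiting the fact that \(x_0x_l^T\) is a rank-1 matrix.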

def cross_op_better(xl, x0, weight, feature_size):
    # x0 * (xl * w)
    # (batch, 1, feature_size) * (feature_size) -> (batch,1)
    transform = tf.tensordot( tf.reshape( xl, [-1, 1, feature_size] ), weight, axes=1 )
    # (batch, feature_size) * (batch, 1) -> (batch, feature_size)
    interaction = tf.multiply( x0, transform )
    return interaction
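
The equivalence of the two orderings can be checked on one sample without TensorFlow. The sketch below is a plain-Python analogue, not the author's code: `cross_raw` and `cross_better` are hypothetical names, and the input values are made up.

```python
import math


def cross_raw(x0, xl, w):
    """(x0 xl^T) w: builds the full d x d outer product first."""
    outer = [[a * b for b in xl] for a in x0]  # d * d temporaries
    return [sum(m * wi for m, wi in zip(row, w)) for row in outer]


def cross_better(x0, xl, w):
    """x0 (xl^T w): reduces xl and w to one scalar, then scales x0."""
    scale = sum(b * wi for b, wi in zip(xl, w))  # a single temporary
    return [a * scale for a in x0]


x0 = [1.0, 2.0, 3.0]
xl = [-1.5, -3.0, -4.5]
w = [2.0, 0.0, 1.0]

raw = cross_raw(x0, xl, w)
better = cross_better(x0, xl, w)
print(raw)     # [-7.5, -15.0, -22.5]
print(better)  # [-7.5, -15.0, -22.5]
print(all(math.isclose(r, b) for r, b in zip(raw, better)))  # True
```

For feature size d, the first version materializes d * d intermediate values per sample and the second only one, which is where the memory saving comes from. With general floating-point inputs the two results may differ in the last bits, hence the `math.isclose` comparison.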

The complete code is as follows.

def cross_layer(x0, cross_layers, cross_op = 'better'):
    xl = x0
    if cross_op == 'better':
        cross_func = cross_op_better
    else:
        cross_func = cross_op_raw

    with tf.variable_scope( 'cross_layer' ):
        feature_size = x0.get_shape().as_list()[-1]  # feature_size = n_feature * embedding_size
        for i in range( cross_layers):
            weight = tf.get_variable( shape=[feature_size],
                                      initializer=tf.truncated_normal_initializer(), name='cross_weight{}'.format( i ) )
            bias = tf.get_variable( shape=[feature_size],
                                    initializer=tf.truncated_normal_initializer(), name='cross_bias{}'.format( i ) )

            interaction = cross_func(xl, x0, weight, feature_size)

            xl = interaction + bias + xl  # add back original input -> (batch, feature_size)
            add_layer_summary( 'cross_{}'.format( i ), xl )
    return xl


@tf_estimator_model
def model_fn_dense(features, labels, mode, params):
    dense_feature = build_features()
    dense_input = tf.feature_column.input_layer(features, dense_feature)

    # deep part
    dense = stack_dense_layer(dense_input, params['hidden_units'],
                              params['dropout_rate'], params['batch_norm'],
                              mode, add_summary = True)

    # cross part
    xl = cross_layer(dense_input, params['cross_layers'], params['cross_op'])

    with tf.variable_scope('stack'):
        x_stack = tf.concat( [dense, xl], axis=1 )

    with tf.variable_scope('output'):
        y = tf.layers.dense(x_stack, units =1)
        add_layer_summary( 'output', y )

    return y

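Other posts in the CTR Study Notes & Code Implementation series:
CTR Study Notes & Code Implementation 1 - Prelude to Deep Learning: LR->FFM
CTR Study Notes & Code Implementation 2 - Deep CTR Models: MLP->Wide&Deep
CTR Study Notes & Code Implementation 3 - Deep CTR Models: FNN->PNN->DeepFM
CTR Study Notes & Code Implementation 4 - Deep CTR Models: NFM/AFM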

References

Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang, 2017, Deep & Cross Network for Ad Click Predictions
Ying Shan, T. Ryan Hoens, et al., 2016, Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features
https://blog.csdn.net/Dby_freedom/article/details/86502623
https://zhuanlan.zhihu.com/p/80226180
