Recommender Systems in Practice 0x0c: The FM Family

Posted by NoMornings on 2020-12-07

Original URL: https://www.cnblogs.com/nomornings/p/14098476.html

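Logistic Regression (LR)

Before introducing the FM family, I want to briefly cover logistic regression. Generally speaking, a logistic regression model can combine many kinds of information, such as user, item, and context features, to produce more comprehensive results. Logistic regression also treats recommendation as a classification problem: items are ranked by the predicted probability of a positive sample, where a positive sample might be a user watching a video, clicking a product, or playing a song. In this way, logistic regression turns the recommendation problem into a CTR (click-through rate) prediction problem.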

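Steps

In general, recommendation with a logistic regression model proceeds in the following steps:

Convert user information (age, gender, and so on), item information (name, attributes, and so on), and context information into a numeric feature vector.
Take the logistic regression objective as the optimization target and train the model on sample data to adjust its internal parameters.
At serving time, feed the feature vector into the model to obtain the probability of positive feedback such as a click.
Rank the items by that probability to produce the recommendation list.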
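
The steps above can be sketched in a few lines of PyTorch. This is only a minimal illustration under stated assumptions: the four one-hot features, the labels, and the hyperparameters are all made up for the example and do not come from any real dataset.

```python
import torch

torch.manual_seed(0)

# Step 1: hypothetical one-hot features
# [user_is_young, user_is_old, item_is_book, item_is_game]
X = torch.tensor([
    [1., 0., 1., 0.],
    [1., 0., 0., 1.],
    [0., 1., 1., 0.],
    [0., 1., 0., 1.],
])
# Made-up labels: 1 = clicked, 0 = not clicked (here only games get clicked).
y = torch.tensor([0., 1., 0., 1.])

# Step 2: logistic regression = linear layer + sigmoid, trained by gradient descent
# on the binary cross-entropy loss.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = torch.nn.BCEWithLogitsLoss()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    optimizer.step()

# Step 3: at serving time, predict the click probability of each candidate item.
candidates = torch.tensor([
    [1., 0., 1., 0.],  # young user, book
    [1., 0., 0., 1.],  # young user, game
])
with torch.no_grad():
    probs = torch.sigmoid(model(candidates).squeeze(1))

# Step 4: rank candidates by predicted probability, highest first.
ranking = torch.argsort(probs, descending=True).tolist()
print(ranking)
```

On this toy data the game candidate should be ranked ahead of the book, since only games were clicked in training.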

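Logistic regression here is also trained with gradient descent. For the mathematics behind logistic regression, I recommend the article listed in the references; interested readers can continue there. One point worth stressing: despite its name, logistic regression is a classification model, not a regression model.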

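Advantages

It rests on a concrete mathematical meaning. Whether a user clicks is a Bernoulli-distributed outcome, so using logistic regression as the CTR model is statistically well founded.
It is highly interpretable: the weights show how much each factor contributes, so a result can be explained.
It meets practical engineering needs. Because it is easy to parallelize, simple, and cheap to train, logistic regression has been widely adopted.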

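Limitations

Its expressive power is limited: on its own it cannot perform feature crossing, feature selection, or similar operations.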

POLY2

POLY2 is the simplest feature-crossing algorithm: it brute-force combines features, as its mathematical form makes clear:

\[\mathrm{POLY2}(w,x)=\sum_{j_1=1}^{n-1}\sum_{j_2=j_1+1}^{n}w_{h(j_1,j_2)}x_{j_1}x_{j_2} \]

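It crosses every pair of features and assigns a weight to each resulting combination. POLY2 is still a linear model (linear in its weights), and it is trained the same way as logistic regression.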
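
As a rough sketch of the POLY2 formula, the pairwise term can be written with plain Python. The helper name `poly2_term`, the sample, and the weight values are hypothetical; a dictionary keyed by the feature pair stands in for the index function \(h(j_1,j_2)\).

```python
from itertools import combinations


def poly2_term(x, w):
    """Sum of w[(j1, j2)] * x[j1] * x[j2] over all pairs j1 < j2."""
    return sum(w[(j1, j2)] * x[j1] * x[j2]
               for j1, j2 in combinations(range(len(x)), 2))


# Hypothetical one-hot sample with n = 4 features; features 0 and 2 are active.
x = [1.0, 0.0, 1.0, 0.0]
# One independent weight per feature pair (made-up values).
w = {pair: 0.1 * (i + 1) for i, pair in enumerate(combinations(range(4), 2))}

num_weights = len(w)      # n * (n - 1) / 2 = 6 pairs for n = 4
score = poly2_term(x, w)  # only the (0, 2) pair is non-zero
print(num_weights, score)
```

Only pairs whose two features are both active contribute, so on sparse one-hot data most of these weights rarely receive a gradient.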

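Limitations

Internet data is often one-hot encoded, and indiscriminate feature crossing makes the already sparse feature vectors even sparser, so most cross-feature weights lack enough data to be trained effectively and training may even fail to converge.
The number of weight parameters grows from order \(n\) to order \(n^2\), which makes the computation hard to accept.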

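Factorization Machines (FM)

To address the limitations of POLY2, the FM model replaces the single weight coefficient with the inner product of two vectors. FM learns a latent weight vector for each feature and, when crossing two features, uses the inner product of their latent vectors as the weight of the cross feature, as in the following formula: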

\[\mathrm{FM}(w,x)=\sum_{j_1=1}^{n-1}\sum_{j_2=j_1+1}^{n}(w_{j_1}\cdot w_{j_2})x_{j_1}x_{j_2} \]

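The latent feature vectors FM introduces play much the same role as the latent vectors in matrix factorization. By introducing them, FM reduces POLY2's \(n^2\)-level weight parameters to \(nk\), where \(k\) is the dimension of the latent vectors, which greatly reduces training cost.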
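
To make the difference concrete, here is a small back-of-the-envelope calculation. The sizes \(n = 10000\) and \(k = 16\) are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
def poly2_param_count(n):
    # One weight per unordered feature pair: n * (n - 1) / 2, i.e. O(n^2).
    return n * (n - 1) // 2


def fm_param_count(n, k):
    # One k-dimensional latent vector per feature: n * k.
    return n * k


n, k = 10_000, 16  # hypothetical feature count and latent dimension
print(poly2_param_count(n))  # 49995000
print(fm_param_count(n, k))  # 160000
```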

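In addition, the latent vectors let the model compute weights for feature combinations it has never observed. Take a sample (table, tomato) drawn from the two features furniture and vegetable: table and tomato do not need to appear together in a training sample for the model to learn this combination, because each latent vector is updated by every sample that contains its feature. Likewise, when a new sample arrives, the already learned latent vectors can be used for online serving.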
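
A tiny sketch of this idea: the numbers below are invented latent vectors with \(k = 3\), standing in for vectors that were learned from other co-occurrences. The cross weight for the unseen pair is simply their inner product.

```python
# Hypothetical latent vectors (k = 3), as if learned from other samples
# such as (table, cucumber) and (chair, tomato).
v = {
    "table": [0.5, -0.2, 0.1],
    "tomato": [0.3, 0.4, -0.6],
}


def cross_weight(a, b):
    # Inner product of the two latent vectors.
    return sum(ai * bi for ai, bi in zip(v[a], v[b]))


# (table, tomato) never has to co-occur in training for this weight to exist.
w_table_tomato = cross_weight("table", "tomato")
print(w_table_tomato)
```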

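Similarly, FM can be trained with gradient descent, so it keeps its real-time capability and flexibility. Let's look at how a PyTorch version of FM is implemented.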

import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F


class FeaturesLinear(nn.Module):

    def __init__(self, field_dims, output_dim=1):
        super(FeaturesLinear, self).__init__()
        print("field_dims: ", field_dims)
        self.fc = nn.Embedding(sum(field_dims), output_dim)
        self.bias = nn.Parameter(torch.zeros((output_dim,)))
        # Start index of each field in the shared embedding table: a cumulative sum
        # shifted by one, e.g. field_dims [1, 3, 4, 7] ==> offsets [0, 1, 4, 8]
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

    def forward(self, x):
        """
        Convert per-field category ids into global ids in the shared embedding table.
        For example, x = [2, 4] means category_1's id is 2 and category_2's id is 4.
        With field_dims = [3, 8] (category_1 has 3 ids, category_2 has 8 ids), offsets = [0, 3],
        so x becomes [2 + 0, 4 + 3] ==> [2, 7].
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return torch.sum(self.fc(x), dim=1)+self.bias


class FeaturesEmbedding(nn.Module):

    def __init__(self, field_dims, embed_dim):
        super(FeaturesEmbedding, self).__init__()
        self.embedding = nn.Embedding(sum(field_dims), embed_dim)
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        nn.init.xavier_uniform_(self.embedding.weight.data)

    def forward(self, x):
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return self.embedding(x)

class FactorizationMachine(nn.Module):
    def __init__(self, reduce_sum=True):
        super(FactorizationMachine, self).__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
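        r"""
        Pairwise interaction term in its linear-time form:
             $\frac{1}{2}\sum_{k=1}^{K}[(\sum_{i=1}^{n}v_{ik}x_i)^2-\sum_{i=1}^{n}v_{ik}^2x_i^2]$
        :param x: float tensor of size (batch_size, num_fields, embed_dim)
        :return: float tensor of size (batch_size, 1) if reduce_sum, else (batch_size, embed_dim)
        """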
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix
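
The offset trick used by FeaturesLinear and FeaturesEmbedding can be checked by hand. The sketch below reproduces the docstring example in plain Python, using itertools.accumulate in place of np.cumsum; it is an illustration of the index arithmetic only, not part of the original layers.

```python
from itertools import accumulate

field_dims = [3, 8]  # category_1 has 3 ids, category_2 has 8 ids
# Start index of each field in the shared embedding table.
offsets = [0, *accumulate(field_dims)][:-1]
print(offsets)  # [0, 3]

x = [2, 4]  # per-field ids of one sample
global_ids = [xi + off for xi, off in zip(x, offsets)]
print(global_ids)  # [2, 7]
```

Every field therefore gets its own disjoint slice of one embedding table with sum(field_dims) rows.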

import torch.nn.functional as F
from base import BaseModel
import torch
import torch.nn as nn

from model.layers import *


class FM(BaseModel):

    def __init__(self, field_dims=None, embed_dim=None):
        super().__init__()
        self.linear = FeaturesLinear(field_dims)
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.fm = FactorizationMachine(reduce_sum=True)

    def forward(self, x):
        x = self.linear(x) + self.fm(self.embedding(x))
        x = torch.sigmoid(x.squeeze(1))
        return x
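
The FactorizationMachine layer relies on the identity that the sum of all pairwise inner products equals half of "square of sum minus sum of squares". A quick numerical check, assuming random embeddings as a stand-in for the output of FeaturesEmbedding (the shapes are arbitrary):

```python
import torch

torch.manual_seed(0)
batch_size, num_fields, embed_dim = 2, 4, 3
# Stand-in for FeaturesEmbedding output: one latent vector per active feature.
emb = torch.randn(batch_size, num_fields, embed_dim)

# Naive form: inner product of every pair of latent vectors, O(num_fields^2 * k).
naive = torch.zeros(batch_size, 1)
for i in range(num_fields - 1):
    for j in range(i + 1, num_fields):
        naive += (emb[:, i] * emb[:, j]).sum(dim=1, keepdim=True)

# Linear-time form, the same arithmetic as FactorizationMachine.forward.
square_of_sum = emb.sum(dim=1) ** 2
sum_of_square = (emb ** 2).sum(dim=1)
fast = 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)

print(torch.allclose(naive, fast, atol=1e-5))
```

With one-hot inputs every active \(x_i\) equals 1, which is why the layer can work directly on the looked-up embeddings.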

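Field-aware Factorization Machine (FFM)

To further address the problem of sparse feature data, FFM improves on FM by introducing the concept of a field. Features belonging to the same field are one-hot encoded together. In FFM, each feature therefore learns a separate latent vector for every field of the other features, so a latent vector depends not only on the feature itself but also on the field it is crossed with.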

\[\mathrm{FFM}(w,x)=\sum_{j_1=1}^{n-1}\sum_{j_2=j_1+1}^{n}(w_{j_1,f_2}\cdot w_{j_2,f_1})x_{j_1}x_{j_2} \]

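As I understand it, the point of introducing fields is to let each feature use a more suitable weight for each kind of feature it is crossed with, that is, to learn the weight distribution between fields and encode it in the latent vectors. At the same time, the computational complexity rises from \(nk\) to \(n^2k\), so in practice you have to weigh the gain in quality against the engineering cost.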
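
A naive, loop-based reading of the FFM formula may help here. In this sketch the tensor `W`, the list `field_of`, and the function `ffm_term` are hypothetical names introduced for illustration: `W[j, f]` is the latent vector that feature `j` uses when it is crossed with a feature from field `f`, and the values are random rather than learned.

```python
import torch

torch.manual_seed(0)
num_features, num_fields, k = 5, 2, 3
# field_of[j] is the field of feature j (hypothetical layout:
# features 0-2 belong to field 0, features 3-4 to field 1).
field_of = [0, 0, 0, 1, 1]
# One latent vector per (feature, field) pair: num_features * num_fields * k parameters.
W = torch.randn(num_features, num_fields, k)


def ffm_term(x):
    """Sum over j1 < j2 of (W[j1, f2] . W[j2, f1]) * x[j1] * x[j2]."""
    total = torch.tensor(0.0)
    n = len(x)
    for j1 in range(n - 1):
        for j2 in range(j1 + 1, n):
            weight = torch.dot(W[j1, field_of[j2]], W[j2, field_of[j1]])
            total = total + weight * x[j1] * x[j2]
    return total


# One-hot sample: feature 1 (field 0) and feature 4 (field 1) are active.
x = [0.0, 1.0, 0.0, 0.0, 1.0]
score = ffm_term(x)
# Only the (1, 4) pair survives: feature 1 uses its field-1 vector, feature 4 its field-0 vector.
expected = torch.dot(W[1, 1], W[4, 0])
print(torch.allclose(score, expected))
```

Compared with FM, each feature now stores num_fields latent vectors instead of one, which is where the extra parameters and computation come from.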

Let's look at the relevant code:

class FieldAwareFactorizationMachine(nn.Module):
    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.num_fields = len(field_dims)
        self.embeddings = nn.ModuleList([
            nn.Embedding(sum(field_dims), embed_dim) for _ in range(self.num_fields)
        ])
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        for embedding in self.embeddings:
            nn.init.xavier_uniform_(embedding.weight.data)

    def forward(self, x):
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        xs = [self.embeddings[i](x) for i in range(self.num_fields)]
        ix = list()
        for i in range(self.num_fields-1):
            for j in range(i+1, self.num_fields):
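                # xs[j][:, i]: field i's feature, embedded for crossing with field j;
                # xs[i][:, j]: field j's feature, embedded for crossing with field i.
                ix.append(xs[j][:, i] * xs[i][:, j])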
        ix = torch.stack(ix, dim=1)
        return ix

from model.layers import *


class FFM(nn.Module):

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.linear = FeaturesLinear(field_dims)
        self.ffm = FieldAwareFactorizationMachine(field_dims, embed_dim)

    def forward(self, x):
        ffm_term = torch.sum(torch.sum(self.ffm(x), dim=1), dim=1, keepdim=True)
        x = self.linear(x) + ffm_term
        # Unlike FM.forward above, this returns the raw score without a sigmoid.
        return x.squeeze(1)

References

[Machine Learning] Logistic Regression (Very Detailed)
GitHub: ottsion/deeplite
