FM Algorithm: A Python Implementation

Posted by xujing123qwe on 2019-03-26

Original URL: https://blog.csdn.net/weixin_43228814/article/details/88825027

Tags: Algorithms, Python

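Four points need attention:

1. Initialize the parameters: the bias weight w0, the first-order weights w, and the auxiliary vectors v for the interaction terms;

2. Define the FM model;

3. Define the gradient of the loss function;

4. Update the parameters with gradient descent.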
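As a rough illustration of point 2, the FM prediction can be computed either by summing over every feature pair or with the sum-of-squares identity that reduces the cost to O(k * n). The sketch below is not the article's code: the function names `fm_score_naive` and `fm_score_fast` are made up for illustration, and w is assumed to be a 1-D array rather than a column matrix.

```python
import numpy as np


def fm_score_naive(x, w0, w, V):
    """FM score using the explicit sum over all feature pairs i < j."""
    n = x.shape[0]
    score = w0 + float(np.dot(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            # <v_i, v_j> * x_i * x_j
            score += float(np.dot(V[i], V[j])) * x[i] * x[j]
    return score


def fm_score_fast(x, w0, w, V):
    """Same score via 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    xv = x @ V                    # shape (k,)
    x2v2 = (x ** 2) @ (V ** 2)    # shape (k,)
    interaction = 0.5 * float(np.sum(xv ** 2 - x2v2))
    return w0 + float(np.dot(w, x)) + interaction


# Hypothetical toy values, chosen only to compare the two forms.
rng = np.random.default_rng(0)
n_features, k = 5, 3
x = rng.random(n_features)
w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(scale=0.2, size=(n_features, k))

naive = fm_score_naive(x, w0, w, V)
fast = fm_score_fast(x, w0, w, V)
print(abs(naive - fast))
```

The fast form is the one the listing in this post uses (inter_1 and inter_2); the naive form is shown only to make clear what it is equal to.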

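The code snippet below covers these four points. Note that the variable loss in it is not the binary-classification loss itself, but one factor of that loss's gradient:

loss = self.sigmoid(classLabels[x] * p[0, 0]) - 1

With labels y in {-1, +1} and the logistic loss -ln(sigmoid(y * p)), the gradient of the binary-classification loss can be written as:

gradient = (self.sigmoid(classLabels[x] * p[0, 0]) - 1) * classLabels[x] * p_derivative

where p_derivative is the derivative of the prediction p with respect to the parameter being updated: the bias term, a first-order weight, or an interaction factor.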
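One way to convince yourself of the gradient expression above is to compare it against a finite-difference estimate. The following is a minimal sketch under stated assumptions: the loss is taken to be -ln(sigmoid(y * p)) with y in {-1, +1}, the helper names (`fm_score`, `log_loss`, `fm_gradients`) are illustrative only, and w is a 1-D array.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fm_score(x, w0, w, V):
    """FM prediction p for one sample x."""
    xv = x @ V
    return w0 + w @ x + 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))


def log_loss(y, p):
    """Logistic loss for a label y in {-1, +1}."""
    return -np.log(sigmoid(y * p))


def fm_gradients(x, y, w0, w, V):
    """Analytic gradients of log_loss with respect to w0, w and V."""
    p = fm_score(x, w0, w, V)
    common = (sigmoid(y * p) - 1.0) * y   # the `loss * classLabels[x]` factor
    xv = x @ V
    grad_w0 = common                       # dp/dw0 = 1
    grad_w = common * x                    # dp/dw_i = x_i
    # dp/dv_if = x_i * sum_j(v_jf * x_j) - v_if * x_i^2
    grad_V = common * (np.outer(x, xv) - V * (x ** 2)[:, None])
    return grad_w0, grad_w, grad_V


# Hypothetical toy values for the check.
rng = np.random.default_rng(1)
x = rng.random(4)
y = -1.0
w0 = 0.3
w = rng.normal(size=4)
V = rng.normal(scale=0.2, size=(4, 3))

g_w0, g_w, g_V = fm_gradients(x, y, w0, w, V)

# Central finite difference on a single entry of V.
eps = 1e-6
V_plus, V_minus = V.copy(), V.copy()
V_plus[1, 2] += eps
V_minus[1, 2] -= eps
numeric = (log_loss(y, fm_score(x, w0, w, V_plus))
           - log_loss(y, fm_score(x, w0, w, V_minus))) / (2 * eps)
print(numeric, g_V[1, 2])
```

The two printed numbers should agree closely, which is what the gradient expression predicts for the interaction factors.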

# -*- coding: utf-8 -*-

from __future__ import division
from math import exp
from numpy import *
from random import normalvariate  # normal distribution
from sklearn import preprocessing
import numpy as np

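'''
    data : path to the data file
    feature_potential : number of latent factor dimensions
    alpha : learning rate
    iter : number of iterations
    _w, _w_0, _v : weights of the factorized model (first-order weights, bias, latent factor matrix)
    with_col : whether the file has a header row of column names
    first_col : index of the first column that holds a useful feature
'''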


class fm(object):
    def __init__(self):
        self.data = None
        self.feature_potential = None
        self.alpha = None
        self.iter = None
        self._w = None
        self._w_0 = None
        self._v = None
        self.with_col = None
        self.first_col = None

    def min_max(self, data):
        self.data = data
        min_max_scaler = preprocessing.MinMaxScaler()
        return min_max_scaler.fit_transform(self.data)

    def loadDataSet(self, data, with_col=True, first_col=2):
        # pd.read_table() could read this file directly; parsing by hand only makes the code longer, and for small data a direct read would do.
        self.first_col = first_col
        dataMat = []
        labelMat = []
        fr = open(data)
        self.with_col = with_col
        if self.with_col:
            N = 0
            for line in fr.readlines():
                # Skip the header row (the first line, where N == 0)
                if N > 0:
                    currLine = line.strip().split()
                    lineArr = []
                    featureNum = len(currLine)
                    for i in range(self.first_col, featureNum):
                        lineArr.append(float(currLine[i]))
                    dataMat.append(lineArr)
                    labelMat.append(float(currLine[1]) * 2 - 1)
                N = N + 1
        else:
            for line in fr.readlines():
                currLine = line.strip().split()
                lineArr = []
                featureNum = len(currLine)
                for i in range(self.first_col, featureNum):
                    lineArr.append(float(currLine[i]))
                dataMat.append(lineArr)
                labelMat.append(float(currLine[1]) * 2 - 1)
        return mat(self.min_max(dataMat)), labelMat

    def sigmoid(self, inx):
        # return 1.0/(1+exp(min(max(-inx,-10),10)))
        return 1.0 / (1 + exp(-inx))

    # Learn the weights for the features
    def fit(self, data, feature_potential=8, alpha=0.01, iter=100):
        # alpha is the learning rate
        self.alpha = alpha
        self.feature_potential = feature_potential
        self.iter = iter
        # dataMatrix is a numpy matrix (mat); classLabels is a list
        dataMatrix, classLabels = self.loadDataSet(data)
        print('dataMatrix:',dataMatrix.shape)
        print('classLabels:',classLabels)
        k = self.feature_potential
        m, n = shape(dataMatrix)
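        # Initialize the parameters
        w = zeros((n, 1))  # n is the number of features
        w_0 = 0.
        # Draw each entry independently; a single shared random scalar would keep all k factor columns identical
        v = np.random.normal(0, 0.2, size=(n, k))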
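        for it in range(self.iter):  # number of iterations
            # Optimize on each sample in turn
            for x in range(m):
                # Math note: a dot product involves a sum, while an element-wise product does not; see the rules of matrix arithmetic. The computation here follows: http://blog.csdn.net/google19890102/article/details/45532745
                # x·v: matrix product of the sample row and v, shape (1, k)
                inter_1 = dataMatrix[x] * v
                # (x squared element-wise) times (v squared element-wise), shape (1, k)
                inter_2 = multiply(dataMatrix[x], dataMatrix[x]) * multiply(v, v)  # multiply is element-wise
                # Interaction term: ((x·v)^2 - x^2·v^2) summed over the k factors, then halved
                interaction = sum(multiply(inter_1, inter_1) - inter_2) / 2.
                # Compute the predicted output
                p = w_0 + dataMatrix[x] * w + interaction
                print('classLabels[x]:',classLabels[x])
                print('predicted output p:', p)
                # Compute sigmoid(y * pred_y) - 1
                loss = self.sigmoid(classLabels[x] * p[0, 0]) - 1
                # For any finite p, loss lies in [-1, 0], so the first branch is always taken
                if loss >= -1:
                    loss_res = 'positive direction'
                else:
                    loss_res = 'negative direction'
                # Update the parameters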
                w_0 = w_0 - self.alpha * loss * classLabels[x]
                for i in range(n):
                    if dataMatrix[x, i] != 0:
                        w[i, 0] = w[i, 0] - self.alpha * loss * classLabels[x] * dataMatrix[x, i]
                        for j in range(k):
                            v[i, j] = v[i, j] - self.alpha * loss * classLabels[x] * (
                                    dataMatrix[x, i] * inter_1[0, j] - v[i, j] * dataMatrix[x, i] * dataMatrix[x, i])
            print('iteration %s finished, loss direction: %s' % (it, loss_res))
        self._w_0, self._w, self._v = w_0, w, v

    def predict(self, X):
        if self._w_0 is None or self._w is None or self._v is None:
            raise NotFittedError("Estimator not fitted, call `fit` first")
        # Type check
        if isinstance(X, np.ndarray):
            pass
        else:
            try:
                X = np.array(X)
            except Exception:
                raise TypeError("numpy.ndarray required for X")
        w_0 = self._w_0
        w = self._w
        v = self._v
        m, n = shape(X)
        result = []
        for x in range(m):
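            inter_1 = mat(X[x]) * v
            inter_2 = mat(multiply(X[x], X[x])) * multiply(v, v)  # multiply is element-wise
            # Interaction term
            interaction = sum(multiply(inter_1, inter_1) - inter_2) / 2.
            # Wrap the row in mat() so that * is a matrix product; a plain ndarray row would broadcast against w instead
            p = w_0 + mat(X[x]) * w + interaction  # compute the predicted output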
            pre = self.sigmoid(p[0, 0])
            result.append(pre)
        return result

    def getAccuracy(self, data):
        dataMatrix, classLabels = self.loadDataSet(data)
        w_0 = self._w_0
        w = self._w
        v = self._v
        m, n = shape(dataMatrix)
        allItem = 0
        error = 0
        result = []
        for x in range(m):
            allItem += 1
            inter_1 = dataMatrix[x] * v
            inter_2 = multiply(dataMatrix[x], dataMatrix[x]) * multiply(v, v)  # multiply is element-wise
            # Interaction term
            interaction = sum(multiply(inter_1, inter_1) - inter_2) / 2.
            p = w_0 + dataMatrix[x] * w + interaction  # compute the predicted output
            pre = self.sigmoid(p[0, 0])
            result.append(pre)
            if pre < 0.5 and classLabels[x] == 1.0:
                error += 1
            elif pre >= 0.5 and classLabels[x] == -1.0:
                error += 1
            else:
                continue
        # print(result)
        value = 1 - float(error) / allItem
        return value


class NotFittedError(Exception):
    """
    Exception class to raise if estimator is used before fitting
    """
    pass


if __name__ == '__main__':
    fm()
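
The per-sample update in fit() walks over every feature i and factor j in Python loops. As a sketch only, the same update can be written with array operations; the comparison below checks that the two forms agree on one sample. The function names (`sgd_step_loops`, `sgd_step_vectorized`) and the toy inputs are hypothetical, and w is assumed to be a 1-D array here rather than the (n, 1) column used in the class.

```python
import numpy as np


def _loss_factor(x, y, w0, w, V):
    """Return sigmoid(y * p) - 1 and x @ V for one sample."""
    xv = x @ V                                    # inter_1 in the listing
    p = w0 + w @ x + 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    loss = 1.0 / (1.0 + np.exp(-y * p)) - 1.0
    return loss, xv


def sgd_step_loops(x, y, w0, w, V, alpha):
    """One per-sample update with explicit loops, mirroring fit()."""
    w, V = w.copy(), V.copy()
    n, k = V.shape
    loss, xv = _loss_factor(x, y, w0, w, V)
    w0 = w0 - alpha * loss * y
    for i in range(n):
        if x[i] != 0:
            w[i] -= alpha * loss * y * x[i]
            for j in range(k):
                V[i, j] -= alpha * loss * y * (x[i] * xv[j] - V[i, j] * x[i] * x[i])
    return w0, w, V


def sgd_step_vectorized(x, y, w0, w, V, alpha):
    """The same update expressed with array operations."""
    loss, xv = _loss_factor(x, y, w0, w, V)
    step = alpha * loss * y
    new_w0 = w0 - step
    new_w = w - step * x
    # Rows with x_i == 0 get a zero update, matching the `!= 0` check above.
    new_V = V - step * (np.outer(x, xv) - V * (x ** 2)[:, None])
    return new_w0, new_w, new_V


# Hypothetical toy sample containing a zero feature.
rng = np.random.default_rng(2)
x = np.array([0.5, 0.0, 1.0, 0.25])
y = 1.0
w0 = 0.0
w = rng.normal(size=4)
V = rng.normal(scale=0.2, size=(4, 3))

loop_result = sgd_step_loops(x, y, w0, w, V, alpha=0.01)
vec_result = sgd_step_vectorized(x, y, w0, w, V, alpha=0.01)
print(np.allclose(loop_result[2], vec_result[2]))
```

Both functions leave their inputs untouched and return new parameters, which makes the side-by-side comparison straightforward; the class in this post updates w and v in place instead.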
