Collaborative Filtering in Python (Finding Similar Users)

Posted by 勿在浮沙築高臺LS on 2017-01-15

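The data consists of people's ratings of different films. We compute the correlation between people's ratings to find people with similar taste, then use those people's ratings to recommend films that a given person is likely to enjoy.
The data is as follows: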



critics={'lisa rose':{'lady in the Water':2.5,'snakes on a plane':3.5,'just my luck':3.0,'superman returns':3.5,
                      'you ,me and dupree':2.5,'the night listener':3.0},
         'gene seymour':{'lady in the Water':3.0,'snakes on a plane':3.5,'just my luck':1.5,'superman returns':5.0,
                         'you ,me and dupree':3.5,'the night listener':3.0},
         'michael phillips':{'lady in the Water':2.5,'snakes on a plane':3.0,'superman returns':3.5,
                         'the night listener':4.0},
         'claudia puig':{'snakes on a plane':3.5,'just my luck':3.0,'superman returns':4.0,
                      'you ,me and dupree':2.5,'the night listener':4.5},
         'mick lasalle':{'lady in the Water':3.0,'snakes on a plane':4.0,'just my luck':2.0,'superman returns':3.0,
                      'you ,me and dupree':2.0,'the night listener':3.0},
         'jack mattews':{'lady in the Water':3.0,'snakes on a plane':4.0,'superman returns':5.0,
                      'you ,me and dupree':3.5,'the night listener':3.0},
         'toby':{'snakes on a plane':4.5,'superman returns':4.0,'you ,me and dupree':1.0}}
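
Because the data is a dictionary of dictionaries, the first step in comparing two people is to find the films they have both rated. A minimal sketch of that step, assuming a two-person excerpt of the data above; the names `critics_excerpt` and `shared_items` are hypothetical and only used for illustration:

```python
# Excerpt of the critics data: two reviewers only.
critics_excerpt = {
    'lisa rose': {'lady in the Water': 2.5, 'snakes on a plane': 3.5,
                  'just my luck': 3.0, 'superman returns': 3.5,
                  'you ,me and dupree': 2.5, 'the night listener': 3.0},
    'toby': {'snakes on a plane': 4.5, 'superman returns': 4.0,
             'you ,me and dupree': 1.0},
}


def shared_items(prefs, person1, person2):
    """Return the set of items rated by both people (hypothetical helper)."""
    return set(prefs[person1]) & set(prefs[person2])


common = shared_items(critics_excerpt, 'lisa rose', 'toby')
print(sorted(common))
# ['snakes on a plane', 'superman returns', 'you ,me and dupree']
```

Only these shared films enter the similarity calculation; films rated by just one of the two people are ignored.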

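Two ways to measure how similar two people are, a score based on Euclidean distance and the Pearson correlation coefficient: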

from math import sqrt

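def sim_distance(prefs,person1,person2):
    # Collect the items both people have rated
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item]=1
    # No items in common: similarity is 0
    if len(si)==0:return 0

    # Sum of squared rating differences over the shared items
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2)
                        for item in si])

    # Identical ratings give 1; the score approaches 0 as the distance grows
    return 1/(1+sqrt(sum_of_squares))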


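def sim_pearson(prefs,p1,p2):
    # Collect the items both people have rated
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item]=1

    n=len(si)

    # No items in common: treat the pair as uncorrelated
    if n==0:return 0

    # Sums of the ratings
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])

    # Sums of the squared ratings
    sum1sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2sq=sum([pow(prefs[p2][it],2) for it in si])

    # Sum of the products of the ratings
    psum=sum([prefs[p1][it]*prefs[p2][it] for it in si])

    # Pearson correlation coefficient
    num=psum-(sum1*sum2/n)
    den=sqrt((sum1sq-pow(sum1,2)/n)*(sum2sq-pow(sum2,2)/n))

    # One person gave every shared item the same rating: correlation is undefined
    if den==0:return 0

    r=num/den

    return r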
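
As a worked check of the two measures, the sketch below computes both scores for 'lisa rose' and 'gene seymour' directly from their ratings. It is an illustration rather than the code above: the variable names are hypothetical, and the Pearson coefficient is written in its mean-centred form, which is algebraically equivalent to the sum-based formula used in sim_pearson.

```python
from math import sqrt

# Ratings copied from the critics data for two reviewers.
lisa = {'lady in the Water': 2.5, 'snakes on a plane': 3.5, 'just my luck': 3.0,
        'superman returns': 3.5, 'you ,me and dupree': 2.5,
        'the night listener': 3.0}
gene = {'lady in the Water': 3.0, 'snakes on a plane': 3.5, 'just my luck': 1.5,
        'superman returns': 5.0, 'you ,me and dupree': 3.5,
        'the night listener': 3.0}

# Films rated by both reviewers (sorted only to fix the order).
shared = sorted(set(lisa) & set(gene))
xs = [lisa[film] for film in shared]
ys = [gene[film] for film in shared]
n = len(shared)

# Distance-based score: 1 / (1 + Euclidean distance).
distance = sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
distance_score = 1 / (1 + distance)

# Pearson coefficient: covariance divided by the product of the spreads.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
var_y = sum((y - mean_y) ** 2 for y in ys)
pearson_score = cov / sqrt(var_x * var_y)

print(round(distance_score, 3), round(pearson_score, 3))
# 0.294 0.396
```

The two scores are on different scales: the distance-based score lies between 0 and 1, while the Pearson coefficient lies between -1 and 1 and is not affected when one reviewer consistently rates higher than the other.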


Test code:

from recommendations import critics
from distance import sim_pearson


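def topMatches(prefs,person,n=5,Similarity=sim_pearson):
    # Score every other person against the given person
    scores=[(Similarity(prefs,person,other),other) for other in prefs if other!=person]

    # Sort so that the highest similarity comes first
    scores.sort()
    scores.reverse()
    return scores[0:n]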


print(topMatches(critics,'toby',n=3))

Result:

[(0.9912407071619299, 'lisa rose'), (0.9244734516419049, 'mick lasalle'), (0.8934051474415647, 'claudia puig')]

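Finding people whose taste matches ours is not enough; what we want is a predicted rating for each film. The closer someone's taste is to ours, the more weight their rating should carry, so we use the similarity score as the weight and compute a weighted average rating for each film.
The code is as follows: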

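def getRecommendations(prefs,person,Similarity=sim_pearson):
    # Per item: weighted sum of ratings, and sum of the similarities used
    totals={}
    simSums={}

    for other in prefs:
        # Do not compare the person with themselves
        if other == person:continue
        sim=Similarity(prefs,person,other)
        # Skip scores of zero or lower so the divisor stays positive
        if sim<=0:continue
        for item in prefs[other]:
            # Only score items the person has not rated yet
            if item not in prefs[person] or prefs[person][item]==0:
                totals.setdefault(item,0)
                totals[item]+=prefs[other][item]*sim
                simSums.setdefault(item,0)
                simSums[item]+=sim

    # Divide by the summed similarities to get a weighted average per item
    rankings=[(total/simSums[item],item) for item,total in totals.items()]

    # Highest predicted rating first
    rankings.sort()
    rankings.reverse()
    return rankings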
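
To make the weighting concrete, here is a small worked sketch of the weighted average for a single film, 'the night listener', which toby has not rated. It assumes only the three neighbours from the result above, with similarities rounded to four decimals, so the number it produces is not what getRecommendations would return on the full data. The names `neighbours`, `predicted` and `plain_mean` are hypothetical.

```python
# (similarity to toby, rating of 'the night listener') for three neighbours.
# Similarities are rounded from the topMatches result; ratings come from critics.
neighbours = {
    'lisa rose': (0.9912, 3.0),
    'mick lasalle': (0.9245, 3.0),
    'claudia puig': (0.8934, 4.5),
}

# Each rating is multiplied by the similarity of the person who gave it.
weighted_total = sum(sim * rating for sim, rating in neighbours.values())

# Dividing by the summed similarities turns the total into a weighted average.
sim_total = sum(sim for sim, _ in neighbours.values())
predicted = weighted_total / sim_total

# Unweighted mean of the same ratings, for comparison.
plain_mean = sum(rating for _, rating in neighbours.values()) / len(neighbours)

print(round(predicted, 2), round(plain_mean, 2))
# 3.48 3.5
```

The weighted score comes out slightly below the plain mean because the highest rating (4.5) belongs to the least similar of the three neighbours.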
