Item-Based Collaborative Filtering Recommendation Algorithms

Posted by 想你时风起 on 2024-04-07

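Preface

Collaborative filtering recommender systems come in several variants, including user-based and item-based ones. Today we read a paper on item-based collaborative filtering.

The paper is titled "Item-Based Collaborative Filtering Recommendation Algorithms".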

Abstract

Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users.

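Recommender systems apply knowledge discovery techniques to live interaction, providing personalized recommendations for information, products, or services. These systems, especially those based on k-nearest-neighbor collaborative filtering, have achieved widespread success on the Web. In recent years, the rapid growth in the amount of available information and in the number of site visitors has posed serious challenges for recommender systems: producing high-quality recommendations, performing many recommendations per second for millions of users and items, and achieving high coverage despite sparse data. In traditional collaborative filtering systems, the workload grows with the number of participants. New recommender technologies must be able to produce high-quality recommendations quickly, even for very large-scale problems. To address these issues, we explored item-based collaborative filtering. Item-based techniques first analyze the user-item matrix to identify relationships between items, and then use those relationships to compute recommendations for users indirectly.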
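The two steps named in the abstract (analyze the user-item matrix for item relationships, then use them to recommend) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy `RATINGS` data, the function names, and the choice of cosine similarity with a similarity-weighted average are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not data from the paper.
RATINGS = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 2},
    "carol": {"B": 1, "C": 5},
}


def item_vectors(ratings):
    """Transpose the user-item matrix into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, row in ratings.items():
        for item, value in row.items():
            items[item][user] = value
    return dict(items)


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse item vectors (missing rating = 0)."""
    dot = sum(vec_a[u] * vec_b[u] for u in vec_a.keys() & vec_b.keys())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def item_similarities(ratings):
    """Step 1: relationships between every ordered pair of distinct items."""
    vectors = item_vectors(ratings)
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i in vectors
        for j in vectors
        if i != j
    }


def recommend(user, ratings, sims):
    """Step 2: score unseen items from the user's own ratings of similar items."""
    seen = ratings[user]
    all_items = {item for row in ratings.values() for item in row}
    scores = {}
    for candidate in all_items - set(seen):
        num = sum(sims[(candidate, i)] * r for i, r in seen.items())
        den = sum(abs(sims[(candidate, i)]) for i in seen)
        if den:
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


sims = item_similarities(RATINGS)
print(recommend("alice", RATINGS, sims))
```

Note that the recommendation for a user is computed only from that user's own ratings and the item-item similarities, which is what "indirectly" refers to in the abstract.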

In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.

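This paper analyzes different item-based algorithms for generating recommendations. We examine different techniques for computing item-item similarity (for example, item-item correlation versus cosine similarity between item vectors) and different techniques for obtaining recommendations from those similarities (for example, weighted sum versus a regression model). Finally, we evaluate the results experimentally and compare them with the basic k-nearest-neighbor approach. The experiments suggest that item-based algorithms deliver markedly better performance than user-based algorithms while also providing better quality than the best available user-based algorithms.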
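The abstract names these techniques without giving formulas. For reference, the usual textbook forms of the two similarity measures and of the weighted-sum prediction are sketched here. The notation is an assumption of this sketch and is not quoted from the text above: $R_{u,i}$ is user $u$'s rating of item $i$, $\bar{R}_i$ is the mean rating of item $i$, $U$ is the set of users who rated both items, $N$ is the set of items similar to $i$ that user $u$ has rated, and $s_{i,n}$ is the similarity between items $i$ and $n$. The regression variant is left out.

```latex
% Cosine similarity between the rating vectors of items i and j
\[
\mathrm{sim}_{\cos}(i,j) =
  \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert_2 \,\lVert\vec{j}\rVert_2}
\]

% Correlation-based (Pearson) similarity over the co-rating users U
\[
\mathrm{sim}_{\mathrm{corr}}(i,j) =
  \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_i)(R_{u,j}-\bar{R}_j)}
       {\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_i)^2}\,
        \sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_j)^2}}
\]

% Weighted-sum prediction of user u's rating for item i
\[
P_{u,i} = \frac{\sum_{n\in N} s_{i,n}\,R_{u,n}}{\sum_{n\in N} \lvert s_{i,n}\rvert}
\]
```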

Sarwar B, Karypis G, Konstan J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proceedings of the 10th international conference on World Wide Web. 2001: 285-295.

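Main points of the abstract

The abstract first describes the weakness of traditional k-nearest-neighbor collaborative filtering: as the amount of information and the number of site visitors grow, the work these systems do grows with the number of users, which puts pressure on recommendation quality, throughput, and coverage. The paper then examines techniques for computing item-item similarity and for deriving recommendations from those similarities. Finally, it evaluates them experimentally against the basic k-nearest-neighbor approach, and the results suggest that item-based algorithms outperform user-based ones.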

Introduction

The amount of information in the world is increasing far more quickly than our ability to process it. All of us have known the feeling of being overwhelmed by the number of new books, journal articles, and conference proceedings coming out each year. Technology has dramatically reduced the barriers to publishing and distributing information. Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.

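The amount of information in the world is growing far faster than our ability to process it. We have all felt overwhelmed by the new books, journal articles, and conference proceedings that appear each year. Technology has greatly lowered the barriers to publishing and distributing information. It is now time to build technologies that help us sift through all the available information and find what is most valuable to us.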

One of the most promising such technologies is collaborative filtering [19,27,14,16]. Collaborative filtering works by building a database of preferences for items by users. A new user, Neo, is matched against the database to discover neighbors, which are other users who have historically had similar taste to Neo. Items that the neighbors like are then recommended to Neo, as he will probably also like them. Collaborative filtering has been very successful in both research and practice, and in both information filtering applications and E-commerce applications. However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.

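One of the most promising of these technologies is collaborative filtering. It works by building a database of users' preferences for items. A new user, Neo, is matched against the database to find neighbors: other users whose tastes have historically been similar to Neo's. Items the neighbors like are then recommended to Neo, since he will probably like them too. Collaborative filtering has been very successful in both research and practice, and in both information filtering and e-commerce applications. Even so, important research questions remain in overcoming two fundamental challenges for collaborative filtering recommender systems.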
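The neighbor-matching idea described for Neo can be sketched as below. This is only an illustration under stated assumptions: the preference database `RATINGS`, the user names other than Neo, the use of cosine similarity between users, and the neighborhood size `k` are all hypothetical choices, not details taken from the paper.

```python
from math import sqrt

# Hypothetical toy preference database: user -> {item: rating}.
RATINGS = {
    "neo": {"A": 5, "B": 4},
    "u1": {"A": 5, "B": 5, "C": 4},
    "u2": {"A": 1, "D": 5},
    "u3": {"A": 4, "B": 4, "C": 5, "D": 2},
}


def user_similarity(row_a, row_b):
    """Cosine similarity between two users' sparse rating rows."""
    dot = sum(row_a[i] * row_b[i] for i in row_a.keys() & row_b.keys())
    norm_a = sqrt(sum(v * v for v in row_a.values()))
    norm_b = sqrt(sum(v * v for v in row_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(target, ratings, k):
    """Scan every other user and keep the k most similar ones."""
    scored = [
        (other, user_similarity(ratings[target], row))
        for other, row in ratings.items()
        if other != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def recommend(target, ratings, k=2):
    """Rank items the target has not rated by the neighbors' weighted ratings."""
    totals, weights = {}, {}
    for other, sim in nearest_neighbors(target, ratings, k):
        for item, rating in ratings[other].items():
            if item not in ratings[target]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = {i: totals[i] / weights[i] for i in totals if weights[i] > 0}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(nearest_neighbors("neo", RATINGS, k=2))
print(recommend("neo", RATINGS, k=2))
```

The scan in `nearest_neighbors` touches every other user for each request, which is the cost that grows with the size of the user base.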

The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual users for whom the site has large amounts of information. For instance, if a site is using browsing patterns as indications of content preference, it may have thousands of data points for its most frequent visitors. These "long user rows" slow down the number of neighbors that can be searched per second, further reducing scalability.

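The first challenge is to improve the scalability of collaborative filtering algorithms. These algorithms can search tens of thousands of potential neighbors in real time, but modern systems need to search tens of millions. In addition, existing algorithms have performance problems with individual users about whom the site holds a large amount of information. For example, if a site uses browsing patterns as an indication of content preference, it may have thousands of data points for its most frequent visitors. These "long user rows" reduce the number of neighbors that can be searched per second, which lowers scalability further.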

The second challenge is to improve the quality of the recommendations for the users. Users need recommendations they can trust to help them find items they will like. Users will "vote with their feet" by refusing to use recommender systems that are not consistently accurate for them.

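The second challenge is to improve the quality of the recommendations given to users. Users need recommendations they can trust to help them find items they will like. Users "vote with their feet": they refuse to use recommender systems that are not consistently accurate for them.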

In some ways these two challenges are in conflict, since the less time an algorithm spends searching for neighbors, the more scalable it will be, and the worse its quality. For this reason, it is important to treat the two challenges simultaneously so the solutions discovered are both useful and practical.

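In some respects the two challenges conflict: the less time an algorithm spends searching for neighbors, the more scalable it becomes, but the worse its quality. It is therefore important to address both challenges at once, so that the solutions found are both useful and practical.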

In this paper, we address these issues of recommender systems by applying a different approach: item-based algorithm. The bottleneck in conventional collaborative filtering algorithms is the search for neighbors among a large user population of potential neighbors [12]. Item-based algorithms avoid this bottleneck by exploring the relationships between items first, rather than the relationships between users. Recommendations for users are computed by finding items that are similar to other items the user has liked. Because the relationships between items are relatively static, item-based algorithms may be able to provide the same quality as the user-based algorithms with less online computation.

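In this paper, we address these problems of recommender systems by applying a different approach: item-based algorithms. The bottleneck in conventional collaborative filtering is the search for neighbors among a large population of potential neighbor users. Item-based algorithms avoid this bottleneck by exploring the relationships between items first, rather than the relationships between users. Recommendations for a user are computed by finding items similar to other items the user has liked. Because the relationships between items are relatively static, item-based algorithms may be able to provide the same quality as user-based algorithms with less online computation.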
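One way to read "less online computation" is as an offline/online split: item relationships are computed ahead of time, and serving a request becomes a lookup. The sketch below illustrates that reading only. The names `build_model` and `predict`, the toy `ITEM_VECTORS`, the cosine measure, and the cutoff `k` are assumptions for the example rather than details given in the text above.

```python
from math import sqrt

# Hypothetical toy data, already arranged per item: item -> {user: rating}.
ITEM_VECTORS = {
    "A": {"u1": 5, "u2": 4},
    "B": {"u1": 4, "u2": 5, "u3": 1},
    "C": {"u2": 2, "u3": 5},
}


def build_model(item_vectors, k):
    """Offline step: keep only the k most similar items for each item."""
    norms = {
        item: sqrt(sum(v * v for v in vec.values()))
        for item, vec in item_vectors.items()
    }
    model = {}
    for i, vec_i in item_vectors.items():
        scored = []
        for j, vec_j in item_vectors.items():
            if i == j:
                continue
            dot = sum(vec_i[u] * vec_j[u] for u in vec_i.keys() & vec_j.keys())
            if dot > 0:
                scored.append((j, dot / (norms[i] * norms[j])))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        model[i] = scored[:k]
    return model


def predict(user_row, item, model):
    """Online step: no search over users, only a lookup in the stored model."""
    num = den = 0.0
    for other, sim in model.get(item, []):
        if other in user_row:
            num += sim * user_row[other]
            den += abs(sim)
    return num / den if den else None


MODEL = build_model(ITEM_VECTORS, k=1)       # recomputed occasionally, offline
print(predict({"A": 5, "B": 4}, "C", MODEL))  # served per request, online
```

Because item relationships change slowly, the stored model can be refreshed on a schedule, while `predict` only touches the target item's short neighbor list and the requesting user's own ratings.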

Closing

That is all for today's reading. Today was mainly about the relevant concepts and background; I will add the details next time.


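2024-01-28 18:05:28, Sunday

I have been a bit busy the past few days and forgot to upload the supplementary content. I have time today, so here it is.

Supplement: for the supplementary content, see "Supplement: Item-Based Collaborative Filtering Recommendation Algorithms".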
