FeedRec: A New Reinforcement Learning Framework from Tsinghua University and JD, Published at KDD 2019

Published by AMiner Academic Headlines on 2019-05-20


KDD 2019 has two tracks: the Research track and the Applied Data Science (ADS) track.

This year's KDD Research track received about 1,200 submissions, of which roughly 110 were accepted as oral papers and 60 as posters, for an acceptance rate of about 14%, nearly 4 percentage points below the 17%–18% of previous years. The Research track numbers for the preceding three years were: 983 submissions, 178 accepted (2018); 748 submissions, 130 accepted (2017); and 784 submissions, 142 accepted (2016).

The ADS track received about 700 submissions this year, with 45 accepted as oral papers and 100 as posters.

Today's recommendation is a KDD 2019 paper from Tsinghua University and JD.

  • Paper Title

    Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems
  • Authors

    Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, Dawei Yin

  • Conference/Year

    KDD 2019

  • Link

    http://export.arxiv.org/abs/1902.05570

  • Abstract

    Recommender systems play a crucial role in our daily lives. The feed streaming mechanism has been widely used in recommender systems, especially in mobile apps. The feed streaming setting provides users an interactive manner of recommendation in never-ending feeds. In such an interactive manner, a good recommender system should pay more attention to user stickiness, which goes far beyond classical instant metrics and is typically measured by long-term user engagement. Directly optimizing long-term user engagement is a non-trivial problem, as the learning target is usually not available for conventional supervised learning methods. Though reinforcement learning (RL) naturally fits the problem of maximizing long-term rewards, applying RL to optimize long-term user engagement still faces challenges: user behaviors are versatile and difficult to model, typically consisting of both instant feedback (e.g., clicks, ordering) and delayed feedback (e.g., dwell time, revisit); in addition, performing effective off-policy learning is still immature, especially when combining bootstrapping and function approximation.

    To address these issues, in this work, we introduce a reinforcement learning framework, FeedRec, to optimize long-term user engagement. FeedRec includes two components: 1) a Q-Network, designed as a hierarchical LSTM, which takes charge of modeling complex user behaviors, and 2) an S-Network, which simulates the environment, assists the Q-Network, and avoids the instability of convergence in policy learning. Extensive experiments on synthetic data and real-world large-scale data show that FeedRec effectively optimizes long-term user engagement and outperforms the state of the art.

    Why We Recommend It

    This paper is joint work by Tsinghua University and JD, published at KDD 2019. To tackle the difficulty of modeling user behavior when applying reinforcement learning to recommendation, the authors propose a new reinforcement learning framework, FeedRec, built from two networks: a Q-Network that models complex user behavior with a hierarchical LSTM, and an S-Network that simulates the environment to assist and stabilize the Q-Network's training. The method is validated on both synthetic and real-world data and achieves state-of-the-art results. A minimal sketch of the two-network structure is given below.
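To make the two-network idea concrete, here is a minimal, hypothetical PyTorch sketch. The layer sizes, the per-feedback-type masking scheme, and all names (`QNetwork`, `SNetwork`, `q_head`, etc.) are illustrative assumptions for exposition only, not the authors' released implementation; the hierarchical LSTM and simulator in the paper are more elaborate.

```python
# Illustrative sketch only: a raw-behavior LSTM feeds per-feedback-type LSTMs
# (e.g. click / purchase / dwell), whose final states are concatenated with a
# candidate item to produce Q(s, a). Dimensions are arbitrary assumptions.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, item_dim=32, hidden_dim=64, n_feedback_types=3):
        super().__init__()
        self.raw_lstm = nn.LSTM(item_dim, hidden_dim, batch_first=True)
        self.type_lstms = nn.ModuleList(
            [nn.LSTM(hidden_dim, hidden_dim, batch_first=True) for _ in range(n_feedback_types)]
        )
        self.q_head = nn.Linear(hidden_dim * (1 + n_feedback_types) + item_dim, 1)

    def forward(self, behavior_seq, type_masks, candidate_item):
        # behavior_seq: (B, T, item_dim); type_masks: list of (B, T, 1) 0/1 masks
        raw_out, _ = self.raw_lstm(behavior_seq)              # (B, T, H)
        states = [raw_out[:, -1]]                             # last raw-layer state
        for lstm, mask in zip(self.type_lstms, type_masks):
            out, _ = lstm(raw_out * mask)                     # upper LSTM sees one feedback type
            states.append(out[:, -1])
        state = torch.cat(states + [candidate_item], dim=-1)
        return self.q_head(state)                             # Q(s, a) for the candidate item

class SNetwork(nn.Module):
    """Environment simulator: given a user state and a recommended item,
    predicts feedback probabilities to generate simulated experience."""
    def __init__(self, state_dim=64, item_dim=32, n_feedback_types=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + item_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_feedback_types),
        )

    def forward(self, user_state, item):
        # Probability of each feedback type (click, purchase, revisit, ...)
        return torch.sigmoid(self.mlp(torch.cat([user_state, item], dim=-1)))

# Example usage with random tensors
B, T = 4, 10
q_net = QNetwork()
behavior = torch.randn(B, T, 32)
masks = [torch.randint(0, 2, (B, T, 1)).float() for _ in range(3)]
item = torch.randn(B, 32)
q_value = q_net(behavior, masks, item)   # shape (B, 1)
```

In this reading, the S-Network supplies simulated user responses so the Q-Network can be trained off-policy without relying solely on logged interactions, which is how the paper describes stabilizing policy learning.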


Paper (PDF):

http://export.arxiv.org/pdf/1902.05570
