NeurIPS 2018 Reinforcement Learning Papers Worth Reading

Published by AMiner学术头条 on 2018-12-13

The papers in this list mainly concern deep reinforcement learning and RL/AI more broadly; we hope you find it useful. The reinforcement learning papers from NeurIPS 2018 are listed below, ordered alphabetically by the first author's surname.

  1. Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter.

    Differentiable MPC for end-to-end planning and control.

  2. Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, and Nando de Freitas.

    Playing hard exploration games by watching YouTube.

  3. Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee.

    Sample-efficient reinforcement learning with stochastic ensemble value expansion.

  4. Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine.

    Data-efficient model-based reinforcement learning with deep probabilistic dynamics models.

  5. Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter.

    End-to-end differentiable physics for learning and control.

  6. Amir-massoud Farahmand.

    Iterative value-aware model learning.

  7. Justin Fu, Sergey Levine, Dibya Ghosh, Larry Yang, and Avi Singh.

    An event-based framework for task specification and control.

  8. Vikash Goel, Jameson Weng, and Pascal Poupart.

    Unsupervised video object segmentation for deep reinforcement learning.

  9. Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine.

    Meta-reinforcement learning of structured exploration strategies.

  10. David Ha and Jürgen Schmidhuber.

    Recurrent world models facilitate policy evolution.

  11. Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, and Daniel Yamins.

    Learning to play with intrinsically-motivated, self-aware agents.

  12. Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, Jonathan Ho, and Pieter Abbeel.

    Evolved policy gradients.

  13. Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric Xing.

    Deep generative models with learnable knowledge constraints.

  14. Jiexi Huang, Fa Wu, Doina Precup, and Yang Cai.

    Learning safe policies with expert guidance.

  15. Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Xiaojin Zhu.

    Adversarial attacks on stochastic bandits.

  16. Raksha Kumaraswamy, Matthew Schlegel, Adam White, and Martha White.

    Context-dependent upper-confidence bounds for directed exploration.

  17. Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, and Finale Doshi-Velez.

    Human-in-the-loop interpretability prior.

  18. Marc Lanctot, Sriram Srinivasan, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, and Michael Bowling.

    Actor-critic policy optimization in partially observable multiagent environments.

  19. Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle.

    Data center cooling using model-predictive control.

  20. Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, and Shane Legg.

    Reward learning from human preferences and demonstrations in Atari.

  21. Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, and Le Song.

    Learning temporal point processes via reinforcement learning.

  22. Yuan Li, Xiaodan Liang, Zhiting Hu, and Eric Xing.

    Hybrid retrieval-generation reinforced agent for medical image report generation.

  23. Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, and Ni Lao.

    Memory augmented policy optimization for program synthesis with generalization.

  24. Qiang Liu, Lihong Li, Ziyang Tang, and Denny Zhou.

    Breaking the curse of horizon: Infinite-horizon off-policy estimation.

  25. Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, and Emma Brunskill.

    Representation balancing MDPs for off-policy policy evaluation.

  26. Tyler Lu, Craig Boutilier, and Dale Schuurmans.

    Non-delusional Q-learning and value iteration.

  27. Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet.

    Are GANs created equal? A large-scale study.

  28. Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt.

    DeepProbLog: Neural probabilistic logic programming.

  29. Horia Mania, Aurelia Guy, and Benjamin Recht.

    Simple random search of static linear policies is competitive for reinforcement learning.

  30. David Alvarez Melis and Tommi Jaakkola.

    Towards robust interpretability with self-explaining neural networks.

  31. Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, and Daniel Yamins.

    A flexible neural representation for physics prediction.

  32. Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.

    Data-efficient hierarchical reinforcement learning.

  33. Ashvin Nair, Vitchyr Pong, Shikhar Bahl, Sergey Levine, Steven Lin, and Murtaza Dalal.

    Visual goal-conditioned reinforcement learning by representation learning.

  34. Matthew O’Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C Duchi.

    Scalable end-to-end autonomous vehicle testing via rare-event simulation.

  35. Ian Osband, John S Aslanides, and Albin Cassirer.

    Randomized prior functions for deep reinforcement learning.

  36. Matthew Riemer, Miao Liu, and Gerald Tesauro.

    Learning abstract options.

  37. Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Tim Lillicrap.

    Relational recurrent neural networks.

  38. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry.

    How does batch normalization help optimization? (no, it is not about internal covariate shift).

  39. Ozan Sener and Vladlen Koltun.

    Multi-task learning as multi-objective optimization.

  40. Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon.

    Multi-agent generative adversarial imitation learning.

  41. Wen Sun, Geoffrey Gordon, Byron Boots, and J. Bagnell.

    Dual policy iteration.

  42. Aviv Tamar, Pieter Abbeel, Ge Yang, Thanard Kurutach, and Stuart Russell.

    Learning plannable representations with causal InfoGAN.

  43. Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom.

    Neural arithmetic logic units.

  44. Tongzhou Wang, Yi Wu, David Moore, and Stuart Russell.

    Meta-learning MCMC proposals.

  45. Catherine Wong, Neil Houlsby, Yifeng Lu, and Andrea Gesmundo.

    Transfer learning with neural AutoML.

  46. Kelvin Xu, Chelsea Finn, and Sergey Levine.

    Uncertainty-aware few-shot learning with probabilistic model-agnostic meta-learning.

  47. Zhongwen Xu, Hado van Hasselt, and David Silver.

    Meta-gradient reinforcement learning.

  48. Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum.

    Neural-Symbolic VQA: Disentangling reasoning from vision and language understanding.

  49. Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, and Richard Zemel.

    Neural guided constraint logic programming for program synthesis.

  50. Yu Zhang, Ying Wei, and Qiang Yang.

    Learning to multitask.

  51. Zeyu Zheng, Junhyuk Oh, and Satinder Singh.

    On learning intrinsic rewards for policy gradient methods.

Source: https://medium.com/@yuxili/nips-2018-rl-papers-to-read-5bc1edb85a28

AMiner學術頭條

The AMiner platform was developed by the Department of Computer Science at Tsinghua University and is fully independently developed intellectual property. Since going online in 2006, it has attracted visits from more than 8 million unique IPs across 220 countries and regions, recorded 2.3 million data downloads and 10 million annual visits, and has become an important data and experimentation platform for research on academic search and social network mining.

https://www.aminer.cn/
