How Do You Start Your Deep Learning Journey? These 125 Papers in Three Categories Will Guide You (with Resource Downloads)

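If you are new to deep learning, your first question is probably "Which paper should I start with?" On GitHub, songrotek has put together a deep learning reading list, and the list is updated continually. As for the PDFs mentioned in this article, readers can click "Read the original" to download the papers bundled by 機器之心 (Jiqizhixin), or open the project address below and download whichever materials they prefer.

Project address: https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap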

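The list is built on the following four principles:

From the overall outline to the details
From older work to the present
From general to specific areas
A focus on the current state of the art

You will find many papers that are very new but well worth reading. I will keep updating this list.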

1 Deep Learning History and Basics

1.0 Books

[0] Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. Deep learning. An MIT Press book. (2015). (The bible of the deep learning field; you can read the papers below alongside the book.)

1.1 Survey

[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature 521.7553 (2015): 436-444. (a survey by three giants of deep learning)

1.2 Deep Belief Network (DBN) (a milestone on the eve of deep learning)

[2] Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. (the eve of deep learning)

[3] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. (a milestone paper that showed the promise of deep learning)

1.3 ImageNet Evolution (deep learning took off from here)

[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. (AlexNet, the deep learning breakthrough)

[5] Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. (VGGNet, neural networks become very deep!)

[6] Szegedy, Christian, et al. Going deeper with convolutions. (GoogLeNet)

[7] He, Kaiming, et al. Deep residual learning for image recognition. (ResNet, very, very deep networks! Best paper at CVPR, the IEEE Conference on Computer Vision and Pattern Recognition)

1.4 Speech Recognition Evolution

[8] Hinton, Geoffrey, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. (a breakthrough in speech recognition)

[9] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. (RNN)

[10] Graves, Alex, and Navdeep Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks.

[11] Sak, Haşim, et al. Fast and accurate recurrent neural network acoustic models for speech recognition. (Google's speech recognition system)

[12] Amodei, Dario, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin. (Baidu's speech recognition system)

[13] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig. Achieving Human Parity in Conversational Speech Recognition. (state-of-the-art speech recognition, Microsoft)

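Once you have read the papers above, you will have a basic understanding of the history of deep learning, of the basic architectures used in deep learning models (including CNN, RNN, and LSTM), and of how deep learning can be applied to image and speech recognition. The papers below will give you an in-depth view of deep learning methods, of deep learning techniques in different application areas, and of their limitations. I suggest choosing among them according to your own interests and research direction.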

2 Deep Learning Methods

2.1 Model

[14] Hinton, Geoffrey E., et al. Improving neural networks by preventing co-adaptation of feature detectors. (Dropout)

[15] Srivastava, Nitish, et al. Dropout: a simple way to prevent neural networks from overfitting.

[16] Ioffe, Sergey, and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. (an outstanding paper from 2015)

[17] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. (an update of batch normalization)

[18] Courbariaux, Matthieu, et al. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1. (a new model, fast)

[19] Jaderberg, Max, et al. Decoupled neural interfaces using synthetic gradients. (an innovation in training methods, an amazing paper)

[20] Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. Net2net: Accelerating learning via knowledge transfer. (modifies a previously trained network to reduce training epochs)

[21] Wei, Tao, et al. Network Morphism. (modifies a previously trained network to reduce training epochs)

2.2 Optimization

[22] Sutskever, Ilya, et al. On the importance of initialization and momentum in deep learning. (momentum optimizer)

[23] Kingma, Diederik, and Jimmy Ba. Adam: A method for stochastic optimization. (probably the most widely used method today)

[24] Andrychowicz, Marcin, et al. Learning to learn by gradient descent by gradient descent. (neural optimizer, amazing work)

[25] Han, Song, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. (ICLR best paper, a new direction for making NNs run fast, DeePhi Tech startup)

[26] Iandola, Forrest N., et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. (another new direction for optimizing NNs, DeePhi Tech startup)

2.3 Unsupervised Learning / Deep Generative Models

[27] Le, Quoc V. Building high-level features using large scale unsupervised learning. (milestone, Andrew Ng, Google Brain, cat)

[28] Kingma, Diederik P., and Max Welling. Auto-encoding variational bayes. (VAE)

[29] Goodfellow, Ian, et al. Generative adversarial nets. (GAN, a super cool idea)

[30] Radford, Alec, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. (DCGAN)

[31] Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. (VAE with attention, outstanding work)

[32] Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. (PixelRNN)

[33] Oord, Aaron van den, et al. Conditional image generation with PixelCNN decoders. (PixelCNN)

2.4 RNN / Sequence-to-Sequence Models

[34] Graves, Alex. Generating sequences with recurrent neural networks. (LSTM, very good generation results, shows the power of RNNs)

[35] Cho, Kyunghyun, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. (the first sequence-to-sequence paper)

[36] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. (outstanding work)

[37] Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate.

[38] Vinyals, Oriol, and Quoc Le. A neural conversational model. (sequence-to-sequence applied to chatbots)

2.5 Neural Turing Machine

[39] Graves, Alex, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401 (2014). (a basic prototype of future computers)

[40] Zaremba, Wojciech, and Ilya Sutskever. Reinforcement learning neural Turing machines.

[41] Weston, Jason, Sumit Chopra, and Antoine Bordes. Memory networks.

[42] Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. End-to-end memory networks.

[43] Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. Pointer networks.

[44] Graves, Alex, et al. Hybrid computing using a neural network with dynamic external memory. (milestone, combines the ideas of the papers above)

2.6 Deep Reinforcement Learning

[45] Mnih, Volodymyr, et al. Playing Atari with deep reinforcement learning. (the first paper to carry the name deep reinforcement learning)

[46] Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning. (milestone)

[47] Wang, Ziyu, Nando de Freitas, and Marc Lanctot. Dueling network architectures for deep reinforcement learning. (ICML best paper, a great idea)

[48] Mnih, Volodymyr, et al. Asynchronous methods for deep reinforcement learning. (the state-of-the-art method at the time of writing)

[49] Lillicrap, Timothy P., et al. Continuous control with deep reinforcement learning. (DDPG)

[50] Gu, Shixiang, et al. Continuous Deep Q-Learning with Model-based Acceleration.

[51] Schulman, John, et al. Trust region policy optimization. (TRPO)

[52] Silver, David, et al. Mastering the game of Go with deep neural networks and tree search. (AlphaGo)

2.7 Deep Transfer Learning / Lifelong Learning / Especially for RL

[53] Bengio, Yoshua. Deep Learning of Representations for Unsupervised and Transfer Learning. (a tutorial)

[54] Silver, Daniel L., Qiang Yang, and Lianghao Li. Lifelong Machine Learning Systems: Beyond Learning Algorithms. (a brief discussion of lifelong learning)

[55] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. (the Godfather's work)

[56] Rusu, Andrei A., et al. Policy distillation. (RL domain)

[57] Parisotto, Emilio, Jimmy Lei Ba, and Ruslan Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. (RL domain)

[58] Rusu, Andrei A., et al. Progressive neural networks. (outstanding work, a novel idea)

2.8 One-Shot Deep Learning

[59] Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. (not deep learning, but worth reading)

[60] Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. Siamese Neural Networks for One-shot Image Recognition.

[61] Santoro, Adam, et al. One-shot Learning with Memory-Augmented Neural Networks. (a basic step toward one-shot learning)

[62] Vinyals, Oriol, et al. Matching Networks for One Shot Learning.

[63] Hariharan, Bharath, and Ross Girshick. Low-shot visual object recognition. (a step toward large data)

3 Applications

3.1 NLP (Natural Language Processing)

[1] Antoine Bordes, et al. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing.

[2] Mikolov, et al. Distributed representations of words and phrases and their compositionality. (word2vec)

[3] Sutskever, et al. Sequence to sequence learning with neural networks.

[4] Ankit Kumar, et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing.

[5] Yoon Kim, et al. Character-Aware Neural Language Models.

[6] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. (bAbI tasks)

[7] Karl Moritz Hermann, et al. Teaching Machines to Read and Comprehend. (CNN/Daily Mail cloze-style questions)

[8] Alexis Conneau, et al. Very Deep Convolutional Networks for Natural Language Processing. (state of the art in text classification at the time of writing)

[9] Armand Joulin, et al. Bag of Tricks for Efficient Text Classification. (slightly worse than the state of the art, but much faster)

3.2 Object Detection

[1] Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection.

[2] Girshick, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. (RCNN)

[3] He, Kaiming, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. (SPPNet)

[4] Girshick, Ross. Fast R-CNN.

[5] Ren, Shaoqing, et al. Faster R-CNN: Towards real-time object detection with region proposal networks.

[6] Redmon, Joseph, et al. You only look once: Unified, real-time object detection. (YOLO, outstanding work, really practical)

[7] Liu, Wei, et al. SSD: Single Shot MultiBox Detector.

3.3 Visual Tracking

[1] Wang, Naiyan, and Dit-Yan Yeung. Learning a deep compact image representation for visual tracking. (the first paper to do visual tracking with deep learning, DLT tracker)

[2] Wang, Naiyan, et al. Transferring rich feature hierarchies for robust visual tracking. (SO-DLT)

[3] Wang, Lijun, et al. Visual tracking with fully convolutional networks. (FCNT)

[4] Held, David, Sebastian Thrun, and Silvio Savarese. Learning to Track at 100 FPS with Deep Regression Networks. (GOTURN, very fast for a deep neural network, but still much slower than non-deep-learning methods)

[5] Bertinetto, Luca, et al. Fully-Convolutional Siamese Networks for Object Tracking. (SiameseFC, state of the art in real-time object tracking)

[6] Martin Danelljan, Andreas Robinson, Fahad Khan, Michael Felsberg. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. (C-COT)

[7] Nam, Hyeonseob, Mooyeol Baek, and Bohyung Han. Modeling and Propagating CNNs in a Tree Structure for Visual Tracking. (VOT2016 winner, TCNN)

3.4 Image Captioning

[1] Farhadi, Ali, et al. Every picture tells a story: Generating sentences from images.

[2] Kulkarni, Girish, et al. Baby talk: Understanding and generating image descriptions.

[3] Vinyals, Oriol, et al. Show and tell: A neural image caption generator.

[4] Donahue, Jeff, et al. Long-term recurrent convolutional networks for visual recognition and description.

[5] Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions.

[6] Karpathy, Andrej, Armand Joulin, and Fei Fei F. Li. Deep fragment embeddings for bidirectional image sentence mapping.

[7] Fang, Hao, et al. From captions to visual concepts and back.

[8] Chen, Xinlei, and C. Lawrence Zitnick. Learning a recurrent visual representation for image caption generation.

[9] Mao, Junhua, et al. Deep captioning with multimodal recurrent neural networks (m-rnn).

[10] Xu, Kelvin, et al. Show, attend and tell: Neural image caption generation with visual attention.

3.5 Machine Translation

Some milestone papers are listed under the RNN / Sequence-to-Sequence Models topic.

[1] Luong, Minh-Thang, et al. Addressing the rare word problem in neural machine translation.

[2] Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units.

[3] Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation.

[4] Chung, et al. A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation.

[5] Lee, et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation.

[6] Wu, Schuster, Chen, Le, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.

3.6 Robotics

[1] Koutník, Jan, et al. Evolving large-scale neural networks for vision-based reinforcement learning.

[2] Levine, Sergey, et al. End-to-end training of deep visuomotor policies.

[3] Pinto, Lerrel, and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours.

[4] Levine, Sergey, et al. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection.

[5] Zhu, Yuke, et al. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning.

[6] Yahya, Ali, et al. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search.

[7] Gu, Shixiang, et al. Deep Reinforcement Learning for Robotic Manipulation.

[8] A Rusu, M Vecerik, Thomas Rothörl, N Heess, R Pascanu, R Hadsell. Sim-to-Real Robot Learning from Pixels with Progressive Nets.

[9] Mirowski, Piotr, et al. Learning to navigate in complex environments.

3.7 Art

[1] Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). Inceptionism: Going Deeper into Neural Networks. (Google Deep Dream)

[2] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. (outstanding work, the most successful method at the time of writing)

[3] Zhu, Jun-Yan, et al. Generative Visual Manipulation on the Natural Image Manifold.

[4] Champandard, Alex J. Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks. (Neural Doodle)

[5] Zhang, Richard, Phillip Isola, and Alexei A. Efros. Colorful Image Colorization.

[6] Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution.

[7] Vincent Dumoulin, Jonathon Shlens and Manjunath Kudlur. A learned representation for artistic style.

[8] Gatys, Leon and Ecker, et al. Controlling Perceptual Factors in Neural Style Transfer. (control of style transfer over spatial location, colour information and spatial scale)

[9] Ulyanov, Dmitry and Lebedev, Vadim, et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images. (texture generation and style transfer)

3.8 Object Segmentation

[1] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation.

[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs.

[3] Pinheiro, P.O., Collobert, R., Dollar, P. Learning to segment object candidates.

[4] Dai, J., He, K., Sun, J. Instance-aware semantic segmentation via multi-task network cascades.

[5] Dai, J., He, K., Sun, J. Instance-sensitive Fully Convolutional Networks.
