How to Start Your Deep Learning Journey? 125 Papers in Three Categories to Guide You (with Resource Downloads)

Posted by Huang Xiaotian (黃小天) on 2017-03-06

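If you are new to deep learning, the first question you may ask is "Which paper should I start with?" On GitHub, songrotek has put together a deep learning reading list that is updated continually. As for the PDFs mentioned in this article, readers can click "Read the original" to download the paper bundle packaged by 機器之心 (Jiqizhixin), or open the project link below and download the material they prefer.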


Project link: https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap


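The list is built on the following four principles:


  • From outline to detail

  • From older work to the current state of the art

  • From general to specific areas

  • Focus on state-of-the-art techniques


You will find many papers that are very new but well worth reading. I will keep updating this list.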


1 Deep Learning History and Basics

1.0 Books


[0] Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. Deep learning. An MIT Press book. (2015). (The bible of deep learning; you can read this book alongside the papers below)


1.1 Survey


[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature 521.7553 (2015): 436-444. (Three giants of deep learning assess a range of learning models)


1.2 Deep Belief Network (DBN) (a milestone on the eve of deep learning)


[2] Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. (The eve of deep learning)


[3] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. (A milestone paper that demonstrated the viability of deep learning)


1.3 ImageNet Evolution (deep learning starts here)


[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. (AlexNet, the deep learning breakthrough)


[5] Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. (VGGNet; neural networks become very deep!)


[6] Szegedy, Christian, et al. Going deeper with convolutions. (GoogLeNet)


[7] He, Kaiming, et al. Deep residual learning for image recognition. (ResNet, very very deep networks! Best paper at CVPR, the IEEE Conference on Computer Vision and Pattern Recognition)


1.4 Speech Recognition Evolution


[8] Hinton, Geoffrey, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. (Breakthrough in speech recognition)


[9] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. (RNN)


[10] Graves, Alex, and Navdeep Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks.


[11] Sak, Haşim, et al. Fast and accurate recurrent neural network acoustic models for speech recognition. (Google speech recognition system)


[12] Amodei, Dario, et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. (Baidu speech recognition system)


[13] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig. Achieving Human Parity in Conversational Speech Recognition. (State-of-the-art speech recognition, Microsoft)


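After reading the papers above, you will have a basic understanding of the history of deep learning, of the basic architectures of deep learning models (including CNN, RNN, and LSTM), and of how deep learning can be applied to image and speech recognition. The papers below will give you a deeper understanding of deep learning methods, of deep learning techniques in different application areas, and of their limitations. I suggest choosing among the following papers according to your own interests and research direction.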


2 Deep Learning Methods


2.1 Model


[14] Hinton, Geoffrey E., et al. Improving neural networks by preventing co-adaptation of feature detectors. (Dropout)


[15] Srivastava, Nitish, et al. Dropout: a simple way to prevent neural networks from overfitting.


[16] Ioffe, Sergey, and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. (An outstanding paper of 2015)


[17] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. (An update of batch normalization)


[18] Courbariaux, Matthieu, et al. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1. (New model, fast)


[19] Jaderberg, Max, et al. Decoupled neural interfaces using synthetic gradients. (An innovation in training methods; an amazing paper)


[20] Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. Net2net: Accelerating learning via knowledge transfer. (Modifies a previously trained network to reduce training epochs)


[21] Wei, Tao, et al. Network Morphism. (Modifies a previously trained network to reduce training epochs)


2.2 Optimization


[22] Sutskever, Ilya, et al. On the importance of initialization and momentum in deep learning. (Momentum optimizer)


[23] Kingma, Diederik, and Jimmy Ba. Adam: A method for stochastic optimization. (Probably the most widely used method today)


[24] Andrychowicz, Marcin, et al. Learning to learn by gradient descent by gradient descent. (Neural optimizer, amazing work)


[25] Han, Song, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. (ICLR best paper; from the startup DeePhi Tech; a new direction for making NNs run faster)


[26] Iandola, Forrest N., et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. (Another new direction for optimizing NNs; from the startup DeePhi Tech)


2.3 Unsupervised Learning / Deep Generative Model


[27] Le, Quoc V. Building high-level features using large scale unsupervised learning. (Milestone; Andrew Ng, Google Brain, cat)


[28] Kingma, Diederik P., and Max Welling. Auto-encoding variational bayes. (VAE)


[29] Goodfellow, Ian, et al. Generative adversarial nets. (GAN, a super cool idea)


[30] Radford, Alec, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. (DCGAN)


[31] Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. (VAE with attention, outstanding work)


[32] Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. (PixelRNN)


[33] Oord, Aaron van den, et al. Conditional image generation with PixelCNN decoders. (PixelCNN)


2.4 RNN / Sequence-to-Sequence Model


[34] Graves, Alex. Generating sequences with recurrent neural networks. (LSTM; very good generation results that show the power of RNNs)


[35] Cho, Kyunghyun, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. (The first sequence-to-sequence paper)


[36] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. (Outstanding work)


[37] Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate.


[38] Vinyals, Oriol, and Quoc Le. A neural conversational model. (Sequence-to-sequence applied to chatbots)


2.5 Neural Turing Machine


[39] Graves, Alex, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401 (2014). (Basic prototype of a future computer)


[40] Zaremba, Wojciech, and Ilya Sutskever. Reinforcement learning neural Turing machines.


[41] Weston, Jason, Sumit Chopra, and Antoine Bordes. Memory networks.


[42] Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. End-to-end memory networks.


[43] Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. Pointer networks.


[44] Graves, Alex, et al. Hybrid computing using a neural network with dynamic external memory. (Milestone; combines the ideas of the papers above)


2.6 Deep Reinforcement Learning


[45] Mnih, Volodymyr, et al. Playing atari with deep reinforcement learning. (The first paper to carry the name deep reinforcement learning)


[46] Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning. (Milestone)


[47] Wang, Ziyu, Nando de Freitas, and Marc Lanctot. Dueling network architectures for deep reinforcement learning. (ICLR best paper, great idea)


[48] Mnih, Volodymyr, et al. Asynchronous methods for deep reinforcement learning. (State-of-the-art method when this list was compiled)


[49] Lillicrap, Timothy P., et al. Continuous control with deep reinforcement learning. (DDPG)


[50] Gu, Shixiang, et al. Continuous Deep Q-Learning with Model-based Acceleration.


[51] Schulman, John, et al. Trust region policy optimization. (TRPO)


[52] Silver, David, et al. Mastering the game of Go with deep neural networks and tree search. (AlphaGo)


2.7 Deep Transfer Learning / Lifelong Learning / especially for RL


[53] Bengio, Yoshua. Deep Learning of Representations for Unsupervised and Transfer Learning. (A tutorial)


[54] Silver, Daniel L., Qiang Yang, and Lianghao Li. Lifelong Machine Learning Systems: Beyond Learning Algorithms. (A brief discussion of lifelong learning)


[55] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. (The Godfather's work)


[56] Rusu, Andrei A., et al. Policy distillation. (RL domain)


[57] Parisotto, Emilio, Jimmy Lei Ba, and Ruslan Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. (RL domain)


[58] Rusu, Andrei A., et al. Progressive neural networks. (Outstanding work, a novel idea)


2.8 One-Shot Deep Learning


[59] Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. (Not deep learning, but worth reading)


[60] Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. Siamese Neural Networks for One-shot Image Recognition.


[61] Santoro, Adam, et al. One-shot Learning with Memory-Augmented Neural Networks. (A basic step toward one-shot learning)


[62] Vinyals, Oriol, et al. Matching Networks for One Shot Learning.


[63] Hariharan, Bharath, and Ross Girshick. Low-shot visual object recognition. (A step toward large data)


3 Applications


3.1 NLP (Natural Language Processing)


[1] Antoine Bordes, et al. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing.


[2] Mikolov, et al. Distributed representations of words and phrases and their compositionality. (word2vec)


[3] Sutskever, et al. Sequence to sequence learning with neural networks.


[4] Ankit Kumar, et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing.


[5] Yoon Kim, et al. Character-Aware Neural Language Models.


[6] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. (bAbI tasks)


[7] Karl Moritz Hermann, et al. Teaching Machines to Read and Comprehend. (CNN/Daily Mail cloze-style questions)


[8] Alexis Conneau, et al. Very Deep Convolutional Networks for Natural Language Processing. (State of the art in text classification when this list was compiled)


[9] Armand Joulin, et al. Bag of Tricks for Efficient Text Classification. (Slightly worse than the state of the art, but much faster)


3.2 Object Detection


[1] Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection.


[2] Girshick, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. (RCNN)


[3] He, Kaiming, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. (SPPNet)


[4] Girshick, Ross. Fast R-CNN.


[5] Ren, Shaoqing, et al. Faster R-CNN: Towards real-time object detection with region proposal networks.


[6] Redmon, Joseph, et al. You only look once: Unified, real-time object detection. (YOLO; outstanding work, really practical)


[7] Liu, Wei, et al. SSD: Single Shot MultiBox Detector.


3.3 Visual Tracking


[1] Wang, Naiyan, and Dit-Yan Yeung. Learning a deep compact image representation for visual tracking. (The first paper to use deep learning for visual tracking; DLT tracker)


[2] Wang, Naiyan, et al. Transferring rich feature hierarchies for robust visual tracking. (SO-DLT)


[3] Wang, Lijun, et al. Visual tracking with fully convolutional networks. (FCNT)


[4] Held, David, Sebastian Thrun, and Silvio Savarese. Learning to Track at 100 FPS with Deep Regression Networks. (GOTURN; very fast for a deep neural network, but still much slower than non-deep-learning methods)


[5] Bertinetto, Luca, et al. Fully-Convolutional Siamese Networks for Object Tracking. (SiameseFC; state of the art in real-time object tracking when this list was compiled)


[6] Martin Danelljan, Andreas Robinson, Fahad Khan, Michael Felsberg. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. (C-COT)


[7] Nam, Hyeonseob, Mooyeol Baek, and Bohyung Han. Modeling and Propagating CNNs in a Tree Structure for Visual Tracking. (VOT2016 winner, TCNN)


3.4 Image Caption


[1] Farhadi, Ali, et al. Every picture tells a story: Generating sentences from images.


[2] Kulkarni, Girish, et al. Baby talk: Understanding and generating image descriptions.


[3] Vinyals, Oriol, et al. Show and tell: A neural image caption generator.


[4] Donahue, Jeff, et al. Long-term recurrent convolutional networks for visual recognition and description.


[5] Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions.


[6] Karpathy, Andrej, Armand Joulin, and Fei Fei F. Li. Deep fragment embeddings for bidirectional image sentence mapping.


[7] Fang, Hao, et al. From captions to visual concepts and back.


[8] Chen, Xinlei, and C. Lawrence Zitnick. Learning a recurrent visual representation for image caption generation.


[9] Mao, Junhua, et al. Deep captioning with multimodal recurrent neural networks (m-rnn).


[10] Xu, Kelvin, et al. Show, attend and tell: Neural image caption generation with visual attention.


3.5 Machine Translation


Some milestone papers are listed under the RNN / Sequence-to-Sequence Model topic.


[1] Luong, Minh-Thang, et al. Addressing the rare word problem in neural machine translation.


[2] Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units.


[3] Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation.


[4] Chung, et al. A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation.


[5] Lee, et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation.


[6] Wu, Schuster, Chen, Le, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.


3.6 Robotics


[1] Koutník, Jan, et al. Evolving large-scale neural networks for vision-based reinforcement learning.


[2] Levine, Sergey, et al. End-to-end training of deep visuomotor policies.


[3] Pinto, Lerrel, and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours.


[4] Levine, Sergey, et al. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection.


[5] Zhu, Yuke, et al. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning.


[6] Yahya, Ali, et al. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search.


[7] Gu, Shixiang, et al. Deep Reinforcement Learning for Robotic Manipulation.


[8] A Rusu, M Vecerik, Thomas Rothörl, N Heess, R Pascanu, R Hadsell. Sim-to-Real Robot Learning from Pixels with Progressive Nets.


[9] Mirowski, Piotr, et al. Learning to navigate in complex environments.


3.7 Art


[1] Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). Inceptionism: Going Deeper into Neural Networks. (Google Deep Dream)


[2] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. (Outstanding work; the most successful algorithm when this list was compiled)


[3] Zhu, Jun-Yan, et al. Generative Visual Manipulation on the Natural Image Manifold.


[4] Champandard, Alex J. Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks. (Neural Doodle)


[5] Zhang, Richard, Phillip Isola, and Alexei A. Efros. Colorful Image Colorization.


[6] Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution.


[7] Vincent Dumoulin, Jonathon Shlens and Manjunath Kudlur. A learned representation for artistic style.


[8] Gatys, Leon and Ecker, et al. Controlling Perceptual Factors in Neural Style Transfer. (Controls style transfer over spatial location, colour information, and spatial scale)


[9] Ulyanov, Dmitry and Lebedev, Vadim, et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images. (Texture generation and style transfer)


3.8 Object Segmentation


[1] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation.


[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs.


[3] Pinheiro, P.O., Collobert, R., Dollar, P. Learning to segment object candidates.


[4] Dai, J., He, K., Sun, J. Instance-aware semantic segmentation via multi-task network cascades.


[5] Dai, J., He, K., Sun, J. Instance-sensitive Fully Convolutional Networks.
