A Roundup of Deep Learning Papers from 2015 (CVPR 2015)

Posted by 煉丹術士 on 2016-01-20

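   Well-known researchers in computer vision and image processing, in China and abroad, take pride in publishing at the three top conferences (ICCV, CVPR and ECCV). Papers at these venues carry far more influence than typical SCI journal papers, and they set the direction of future research. CVPR is the main computer vision conference and can be thought of as the Olympics of computer vision research. Today I will start by collecting the highlights of CVPR 2015 (these alone should take a good while to digest).
   Paper listing for the top conference CVPR 2015: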
http://www.cv-foundation.org/openaccess/CVPR2015.py

   Let's go through them one by one; there is bound to be a paper here that you need!

CNN Architectures

Papers on CNN network structure:
1.Hypercolumns for Object Segmentation and Fine-Grained Localization
Authors: Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik

2.Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
Authors: George Papandreou, Iasonas Kokkinos, Pierre-André Savalle

3.Going Deeper With Convolutions
Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
I recommend this paper. It adopts the idea from "Network in Network" of replacing the fully-connected layer with a global average pooling layer. If you have read it, feel free to message me so we can compare notes.
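
To make the global average pooling idea concrete, here is a minimal NumPy sketch. It assumes the "Network in Network" setup where the last convolutional stage emits one feature map per class; the map sizes, the random inputs, and the function names are hypothetical and not taken from either paper's code.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over its spatial dimensions.

    feature_maps: array of shape (channels, height, width).
    Returns a vector of shape (channels,), one value per channel.
    """
    return feature_maps.mean(axis=(1, 2))

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical example: 4 class-specific feature maps of size 7x7.
rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((4, 7, 7))

# Pooling has no trainable weights, unlike a fully-connected layer
# that would need 4 * 7 * 7 weights per output unit here.
class_scores = global_average_pool(feature_maps)
probs = softmax(class_scores)
```

Each pooled value can be read directly as the score of its class, which is why no fully-connected layer is needed in this arrangement.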

4.Improving Object Detection With Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
Authors: Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee

5.Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images
Authors: Anh Nguyen, Jason Yosinski, Jeff Clune

Action and Event Recognition

1.Deeply Learned Attributes for Crowded Scene Understanding
Authors: Jing Shao, Kai Kang, Chen Change Loy, Xiaogang Wang

2.Modeling Video Evolution for Action Recognition
Authors: Basura Fernando, Efstratios Gavves, José Oramas M., Amir Ghodrati, Tinne Tuytelaars

3.Joint Inference of Groups, Events and Human Roles in Aerial Videos
Authors: Tianmin Shu, Dan Xie, Brandon Rothrock, Sinisa Todorovic, Song Chun Zhu

Segmentation in Images and Video

1.Causal Video Object Segmentation From Persistence of Occlusions
Authors: Brian Taylor, Vasiliy Karasev, Stefano Soatto

2.Fully Convolutional Networks for Semantic Segmentation
Authors: Jonathan Long, Evan Shelhamer, Trevor Darrell
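-- The paper treats the fully-connected layers as convolutional layers, so they too output feature maps. Compared with models such as Hypercolumns/HED, this means more layers of a pretrained model (VGG16, AlexNet, etc.) can be transferred. Judging from the paper, however, because the network is purely convolutional, the points of a feature map carry no information about their position. Compare the claim in Hypercolumns: a nose-like point in the upper part of the image can be labeled as a pedestrian pixel, but the same point in the lower part should be labeled as background. So position information looks important enough to take into account. Perhaps this is a trade-off between speed and performance?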
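
The "fully-connected layer as convolution" trick can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions (no bias, stride 1, no padding, tiny hypothetical shapes), not the authors' implementation: the weights of a layer trained on a flattened C x 4 x 4 input are reused as a 4 x 4 convolution kernel, so a larger input yields a spatial map of scores.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weights, kernel_hw):
    """Slide fully-connected weights over a feature map (valid, stride 1).

    feature_map: (C, H, W)
    fc_weights:  (num_out, C * kh * kw), trained on flattened (C, kh, kw) inputs
    Returns an array of shape (num_out, H - kh + 1, W - kw + 1).
    """
    kh, kw = kernel_hw
    _, h, w = feature_map.shape
    num_out = fc_weights.shape[0]
    out = np.empty((num_out, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            # Flatten the window in the same (C, kh, kw) order the FC layer expects.
            window = feature_map[:, i:i + kh, j:j + kw].reshape(-1)
            out[:, i, j] = fc_weights @ window
    return out

rng = np.random.default_rng(0)
# Hypothetical layer: 3 outputs, input of 2 channels x 4 x 4.
fc_weights = rng.standard_normal((3, 2 * 4 * 4))

# On an input of the original size, both views give the same numbers.
small = rng.standard_normal((2, 4, 4))
fc_out = fc_weights @ small.reshape(-1)
conv_out = fc_as_conv(small, fc_weights, (4, 4))

# On a larger input, the same weights produce a 3 x 4 grid of scores per output.
large = rng.standard_normal((2, 6, 7))
score_map = fc_as_conv(large, fc_weights, (4, 4))
```

Note that every output position applies exactly the same weights, which matches the observation above that the score at a point does not depend on where the point sits in the image.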

3.Is object localization for free? - Weakly-supervised learning with convolutional neural networks
Authors: Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic
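-- A paper on weakly supervised object detection. First, it treats the fc layers as conv layers, the same idea as the previous paper. It also regards the feature map just before the final max pooling as carrying class localization information. Judging from the results in Section 5, "Does adding object-level supervision help classification", the method works well, but this physical interpretation may be incomplete.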
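
As a rough illustration of how a score map before global max pooling can carry location information, the sketch below pools a per-class score map to one score per class and also records where each maximum came from. The score maps, their values, and the function name are made up for the example; this is not the paper's code.

```python
import numpy as np

def global_max_pool_with_location(score_maps):
    """Pool each class map to its maximum and remember where the maximum was.

    score_maps: (num_classes, H, W)
    Returns (scores, locations): scores has shape (num_classes,),
    locations has shape (num_classes, 2) holding (row, col) of each maximum.
    """
    num_classes, h, w = score_maps.shape
    flat = score_maps.reshape(num_classes, -1)
    flat_idx = flat.argmax(axis=1)
    scores = flat[np.arange(num_classes), flat_idx]
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return scores, np.stack([rows, cols], axis=1)

# Hypothetical score maps for 2 classes on a 5x5 grid.
score_maps = np.zeros((2, 5, 5))
score_maps[0, 1, 3] = 2.0   # class 0 responds most strongly near the top right
score_maps[1, 4, 0] = 1.5   # class 1 responds most strongly at the bottom left

scores, locations = global_max_pool_with_location(score_maps)
```

Only the image-level label is needed to train on the pooled scores, while the position of the maximum gives a coarse guess at where the object is.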

4.Shape-Tailored Local Descriptors and Their Application to Segmentation and Tracking
Authors: Naeemullah Khan, Marei Algarni, Anthony Yezzi, Ganesh Sundaramoorthi

5.Deep Filter Banks for Texture Recognition and Segmentation
Authors: Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi

6.Deeply learned face representations are sparse, selective, and robust
Authors: Yi Sun, Xiaogang Wang, Xiaoou Tang
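-- This is DeepID2+, part of the DeepID series. Its improvements over DeepID2 are a larger network (more feature maps) and a fully-connected layer with supervision attached to every layer. The best part is probably the later analysis of neuron behavior, which finds three properties: 1. moderate sparsity maximizes discriminative power and makes the features suitable for binarization; 2. selectivity to identity and attributes; 3. robustness to occlusion. None of the three was imposed as a constraint during training, explicitly or implicitly; the CNN learned them all by itself.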
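
To show what "moderately sparse and suitable for binarization" can mean in practice, here is a small sketch that measures the fraction of active neurons per image and turns activations into binary codes. The zero threshold, the toy activation values, and the use of Hamming distance to compare codes are assumptions chosen for illustration, not details confirmed by the post.

```python
import numpy as np

def activation_sparsity(features):
    """Fraction of neurons with a positive activation, per image.

    features: (num_images, num_neurons), assumed non-negative (e.g. after ReLU).
    """
    return (features > 0).mean(axis=1)

def binarize(features, threshold=0.0):
    """Turn activations into 0/1 codes: 1 where the neuron fires above threshold."""
    return (features > threshold).astype(np.uint8)

def hamming_distance(code_a, code_b):
    """Number of positions where two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))

# Toy activations for 2 images and 4 neurons (made-up values).
features = np.array([[0.0, 1.2, 0.0, 0.7],
                     [0.3, 0.9, 0.0, 0.0]])

sparsity = activation_sparsity(features)   # half of the neurons fire for each image
codes = binarize(features)
distance = hamming_distance(codes[0], codes[1])
```

When about half of the neurons fire, the on/off pattern alone still separates inputs, which is one intuition for why such features survive binarization.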

Image and Video Processing and Restoration

1.Fast and Flexible Convolutional Sparse Coding
Authors: Felix Heide, Wolfgang Heidrich, Gordon Wetzstein

2.What do 15,000 Object Categories Tell Us About Classifying and Localizing Actions?
Authors: Mihir Jain, Jan C. van Gemert, Cees G. M. Snoek
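-- Object classification helps action detection. This is the first paper to explore the topic, so it opens a deep and largely unexplored area; keep an eye on it and consider staking a claim early.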

3.Hypercolumns for Object Segmentation and Fine-Grained Localization
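Authors: Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik
-- A very nice idea! With earlier CNNs or R-CNN, we always used the last layer as the class label and the second-to-last layer as the feature. The authors of this paper thought of using the information in every layer. For each pixel, the units above it at every layer are either activated or not, and the authors stack these per-layer activations into a single feature vector to help with fine-grained object localization.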
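
The stacking step can be sketched as follows: feature maps from several layers are resized to a common resolution and concatenated along the channel axis, so each pixel gets one long descriptor. This sketch uses nearest-neighbour upsampling to stay short, which is my simplification, and the layer shapes and function names are hypothetical.

```python
import numpy as np

def upsample_nearest(feature_map, out_hw):
    """Resize a (C, H, W) feature map to (C, out_h, out_w) by nearest neighbour."""
    _, h, w = feature_map.shape
    out_h, out_w = out_hw
    rows = (np.arange(out_h) * h) // out_h   # source row for each output row
    cols = (np.arange(out_w) * w) // out_w   # source column for each output column
    return feature_map[:, rows[:, None], cols[None, :]]

def hypercolumns(layer_maps, out_hw):
    """Stack upsampled maps from all layers along the channel axis."""
    upsampled = [upsample_nearest(m, out_hw) for m in layer_maps]
    return np.concatenate(upsampled, axis=0)

# Hypothetical activations from three layers, coarser as the network gets deeper.
rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((3, 8, 8)),   # early layer: fine resolution, few channels
    rng.standard_normal((5, 4, 4)),   # middle layer
    rng.standard_normal((7, 2, 2)),   # late layer: coarse resolution, more channels
]

columns = hypercolumns(layers, (8, 8))   # shape (3 + 5 + 7, 8, 8)
pixel_descriptor = columns[:, 5, 6]      # descriptor for the pixel at row 5, column 6
```

The descriptor mixes precise spatial detail from early layers with the more semantic signal of late layers, and a per-pixel classifier can then be trained on top of it.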

3D Models and Images

1.The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose
Authors: Silvia Zuffi, Michael J. Black

2.3D Shape Estimation From 2D Landmarks: A Convex Relaxation Approach
Authors: Xiaowei Zhou, Spyridon Leonardos, Xiaoyan Hu, Kostas Daniilidis

Images and Language

The papers in this category deserve a careful read; they are a great way to broaden your thinking.

1.Show and Tell: A Neural Image Caption Generator
Authors: Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

2.Deep Visual-Semantic Alignments for Generating Image Descriptions
Authors: Andrej Karpathy, Li Fei-Fei

3.Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
Authors: Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell

4.Becoming the Expert - Interactive Multi-Class Machine Teaching
Authors: Edward Johns, Oisin Mac Aodha, Gabriel J. Brostow

Other

Reference 1: Improvements to CNNs (latest papers from 2015):
http://blog.csdn.net/u010402786/article/details/50499864
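The four papers covered in that post are also worth reading; one of them appears above. Be sure to download them and read them yourself.
Reference 2: another blogger's post, which also organizes the CVPR papers: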
http://blog.csdn.net/jwh_bupt/article/details/46916653

Most of the entries here do not yet have notes on their core ideas; I will fill them in gradually. 2016-01-20
