[Read Paper] Improving neural networks by preventing co-adaptation of feature detectors
Title: Improving neural networks by preventing co-adaptation of feature detectors
Authors: G. E. Hinton, N. Srivastava, A. Krizhevsky et al.
Abstract: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This “overfitting” is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random “dropout” gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
Full text: http://arxiv.org/pdf/1207.0580.pdf
Note:
Main Point: introduce dropout to reduce overfitting.
Why dropout reduces overfitting (a code sketch follows the two points below):
[1] On each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other hidden units being present.
[2] Another way to view the dropout procedure is as a very efficient way of performing model averaging with neural networks.
A good way to reduce the error on the test set is to average the predictions produced by a very large number of different networks. The standard way to do this is to train many separate networks and then to apply each of these networks to the test data, but this is computationally expensive during both training and testing. Random dropout makes it possible to train a huge number of different networks in a reasonable time. There is almost certainly a different network for each presentation of each training case but all of these networks share the same weights for the hidden units that are present.
In networks with a single hidden layer of N units and a “softmax” output layer for computing the probabilities of the class labels, using the mean network is exactly equivalent to taking the geometric mean of the probability distributions over labels predicted by all 2^N possible networks. (The “mean network” contains all of the hidden units, but with their outgoing weights halved to compensate for the fact that twice as many of them are active.) Assuming the dropout networks do not all make identical predictions, the prediction of the mean network is guaranteed to assign a higher log probability to the correct answer than the mean of the log probabilities assigned by the individual dropout networks. Similarly, for regression with linear output units, the squared error of the mean network is always better than the average of the squared errors of the dropout networks.
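A minimal NumPy sketch of both points above (not the paper's code; the network sizes and names W1, W2, b1, b2 are illustrative assumptions). It samples a dropout mask at training time, and then numerically checks that the test-time "mean network" with halved outgoing weights gives exactly the renormalized geometric mean of the softmax predictions of all 2^N thinned networks, for a single hidden layer:

```python
# Sketch of training-time dropout and the test-time "mean network" equivalence.
import itertools
import numpy as np

rng = np.random.default_rng(0)

N, D, C = 4, 3, 5          # hidden units, input dim, classes (tiny so 2^N is enumerable)
W1, b1 = rng.normal(size=(N, D)), rng.normal(size=N)
W2, b2 = rng.normal(size=(C, N)), rng.normal(size=C)
x = rng.normal(size=D)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

h = np.tanh(W1 @ x + b1)   # hidden activations (any fixed nonlinearity works here)

# [1] Training-time dropout: keep each hidden unit with probability 0.5,
#     so no unit can rely on specific other units being present.
mask = rng.random(N) < 0.5
p_thinned = softmax(W2 @ (h * mask) + b2)   # prediction of one randomly sampled thinned network

# [2] Geometric mean of the label distributions of all 2^N dropout networks.
log_probs = []
for m in itertools.product([0.0, 1.0], repeat=N):
    log_probs.append(np.log(softmax(W2 @ (h * np.array(m)) + b2)))
geo = np.exp(np.mean(log_probs, axis=0))
geo /= geo.sum()                            # renormalize the geometric mean

# Mean network: all hidden units present, outgoing weights halved.
p_mean_net = softmax(0.5 * (W2 @ h) + b2)

print(np.allclose(geo, p_mean_net))         # True: the two predictions coincide exactly
```

Running the script prints True, confirming the exact equivalence stated in the paper for the single-hidden-layer softmax case.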