deep learning basics

Published by C-B-LIU on 2019-03-21
  1. accuracy: fraction of the images that were correctly classified
  2. overfitting: shown, for example, by the gap between training accuracy and test accuracy
  3. tensors: multidimensional Numpy arrays
  4. tensors are a generalization of matrices to an arbitrary number of dimensions
  5. scalar: 0D tensor, 0 axes, ndim == 0, rank == 0, a single number
  6. vector: 1D tensor, 1 axis, ndim == 1; its single axis can have many dimensions (entries)
  7. matrix: 2D tensor, 2 axes, ndim == 2; the first axis is rows, the second is columns
  8. 3D tensor: ndim == 3
  9. key attributes
    1. number of axes: rank, ndim
    2. shape: how many dimensions along each axis
    3. data type: string tensors don't exist in Numpy
  10. tensor slicing: select along each axis independently; a bare : selects the entire axis (see the sketch below)
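A minimal NumPy sketch of these attributes and of slicing (the array below is just zeros, purely for illustration):

import numpy as np

x = np.zeros((60000, 28, 28), dtype='float32')   # e.g. a stack of 28x28 images
print(x.ndim)     # 3 -> number of axes (rank)
print(x.shape)    # (60000, 28, 28) -> dimensions along each axis
print(x.dtype)    # float32 -> data type of the elements

batch = x[10:20]                 # slice 10 samples along the first axis
patch = x[:, 7:21, 7:21]         # a bare : keeps the whole first axis
print(batch.shape, patch.shape)  # (10, 28, 28) (60000, 14, 14)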
  11. data tensors: (shape examples sketched below)
    1. feature data: 2D (samples, features)
    2. time series: 3D (samples, timesteps, features)
    3. image: 4D (samples, height, width, channels) (tensorflow backend)
    4. video: 5D (samples, frames, height, width, channels)
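Shape-only illustrations of these four cases (every size below is an arbitrary example):

import numpy as np

vector_data = np.zeros((10000, 20))         # (samples, features)
timeseries = np.zeros((250, 390, 3))        # (samples, timesteps, features)
images = np.zeros((128, 256, 256, 3))       # (samples, height, width, channels)
videos = np.zeros((4, 240, 144, 256, 3))    # (samples, frames, height, width, channels)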
  12. tensor operations:
    1. element-wise: applied independently to each entry; fast implementations rely on Basic Linear Algebra Subprograms (BLAS); see the NumPy sketch after this list
    2. broadcasting: broadcast small tensor to match the shape of the larger tensor
    3. tensor dot: dimension change: (a, b, c, d) . (d, e) -> (a, b, c, e)
    4. tensor reshape: the number of elements does not change
    5. all tensor operations are geometric transformations of the input data
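A small NumPy sketch of the operations above (shapes chosen arbitrarily):

import numpy as np

x = np.random.random((64, 32))
y = np.random.random((64, 32))
b = np.random.random((32,))
w = np.random.random((32, 10))

z = np.maximum(x + y, 0.)      # element-wise addition followed by relu
z = x + b                      # broadcasting: b is repeated along the first axis of x
out = np.dot(x, w)             # tensor dot: (64, 32) . (32, 10) -> (64, 10)
flat = x.reshape((64 * 32,))   # reshape: same 2048 elements, different shape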
  13. tensor operations are differentiable
  14. the gradient is the derivative of a tensor operation; it takes tensors as inputs
  15. gradient(f)(W0) is the gradient of f(W) = loss_value at the point W0
  16. optimization follows the opposite direction: W1 = W0 - step * gradient(f)(W0)
  17. gradient optimization: O(N)
  18. mini-batch stochastic gradient descent: optimize on a random small batch of samples at a time (see the NumPy sketch below)
  19. true SGD: randomly pick a single sample at a time and perform the update
  20. batch SGD: optimize on the entire sample set
  21. loss surfaces: the loss value viewed as a function over the space of model parameters
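A toy NumPy sketch of the update W1 = W0 - step * gradient(f)(W0) applied per mini-batch to a linear-regression loss (the data, batch size, and learning rate are all invented for illustration):

import numpy as np

np.random.seed(0)
X = np.random.random((1000, 3))                # 1000 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * np.random.randn(1000)  # noisy targets

w = np.zeros(3)                                # W0
step = 0.1                                     # learning rate
for epoch in range(20):                        # one epoch = one pass over the data
    order = np.random.permutation(len(X))
    for start in range(0, len(X), 32):         # mini-batches of 32 samples
        batch = order[start:start + 32]
        pred = X[batch] @ w
        grad = 2 * X[batch].T @ (pred - y[batch]) / len(batch)  # d(MSE)/dw on this batch
        w = w - step * grad                    # W1 = W0 - step * gradient
print(w)                                       # approaches [1.5, -2.0, 0.5]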
  22. momentum:
# pseudocode sketch of SGD with momentum; get_current_parameters and
# update_parameter stand for framework-specific calls
past_velocity = 0.
momentum = 0.1         # momentum factor
learning_rate = 0.01   # step size (illustrative value)
loss = 1.0             # placeholder so the loop condition is defined
while loss > 0.01:
    weight, loss, gradient = get_current_parameters()
    # the velocity accumulates past gradients, damping oscillations in the updates
    velocity = past_velocity * momentum + learning_rate * gradient
    weight = weight + momentum * velocity - learning_rate * gradient
    past_velocity = velocity
    update_parameter(weight)
  23. back-propagation: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
  24. symbolic differentiation: TensorFlow
  25. procedure: (an end-to-end sketch follows this list)
    1. input data
    2. reshape data
    3. construct network
    4. network-compilation
    5. train: each iteration over all the training data is called an epoch
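A hedged end-to-end sketch of this procedure with Keras on MNIST (the 512-unit layer, 5 epochs, and batch size of 128 are illustrative choices, not prescribed by these notes):

from keras import models
from keras import layers
from keras.datasets import mnist
from keras.utils import to_categorical

# 1. input data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# 2. reshape (and rescale) the data
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
# 3. construct the network
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
# 4. network compilation
network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# 5. train: each pass over all the training data is one epoch
network.fit(train_images, train_labels, epochs=5, batch_size=128)
test_loss, test_acc = network.evaluate(test_images, test_labels)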
  26. layers: (instantiation examples in the sketch below)
    1. dense: vector data (samples, features)
    2. recurrent: sequence data (samples, timesteps, features)
    3. image: 2D convolution
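For instance, a typical Keras layer for each data type (the layer sizes and input shapes are arbitrary):

from keras import layers

dense = layers.Dense(32, input_shape=(784,))                # vector data: (samples, features)
recurrent = layers.LSTM(32, input_shape=(100, 64))          # sequences: (samples, timesteps, features)
conv = layers.Conv2D(32, (3, 3), input_shape=(28, 28, 1))   # images: (samples, height, width, channels)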
  27. layer input and output: e.g. 784 inputs, 32 outputs
from keras import models
from keras import layers

# first layer: accepts 784-dimensional input vectors and outputs 32-dimensional vectors
model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
# second layer: its input size (32) is inferred from the previous layer's output
model.add(layers.Dense(32))
  28. the topology of a network defines a hypothesis space
  29. the network topology defines the series of tensor operations applied to the input
  30. for multi-loss networks, all losses are combined via a function (e.g. averaging) into a single scalar quantity that drives the gradient-descent process (see the sketch below)
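A hedged Keras sketch of a two-output network whose losses are merged into one scalar through loss weights (the layer sizes and the 0.5/0.5 weights are made up):

from keras import layers, models

inputs = layers.Input(shape=(128,))
x = layers.Dense(64, activation='relu')(inputs)
out_a = layers.Dense(1, activation='sigmoid', name='out_a')(x)    # binary head
out_b = layers.Dense(10, activation='softmax', name='out_b')(x)   # 10-class head
model = models.Model(inputs, [out_a, out_b])
# the two losses are combined into a single scalar via loss_weights before gradient descent
model.compile(optimizer='rmsprop',
              loss=['binary_crossentropy', 'categorical_crossentropy'],
              loss_weights=[0.5, 0.5])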
  31. loss function: (example compile calls after this list)
    1. binary crossentropy for two-class classification
    2. categorical crossentropy for many-class classification
    3. mean squared error for regression problems
    4. connectionist temporal classification for sequence-learning problems
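Matching compile calls, sketched on a throwaway one-layer model (CTC is usually implemented as a custom loss, so it is omitted here):

from keras import models, layers

model = models.Sequential([layers.Dense(1, activation='sigmoid', input_shape=(16,))])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])   # two-class
# a many-class classifier would end in softmax and use:
#   model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# a regression model would end in a linear output and use:
#   model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])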
