2.8 Adam Optimization Algorithm

Adam (Adaptive Moment Estimation): combines momentum and RMSprop into a single algorithm
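A minimal sketch of one Adam parameter update, combining the momentum term (first moment) with the RMSprop term (second moment); the defaults β₁ = 0.9, β₂ = 0.999, ε = 1e-8 are the commonly used values, assumed here rather than taken from these notes:

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum part: exponentially weighted average of gradients (first moment)
    m = beta1 * m + (1 - beta1) * grad
    # RMSprop part: exponentially weighted average of squared gradients (second moment)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, since m and v start at zero (t is the 1-based step count)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Combined update: momentum direction, scaled by RMSprop denominator
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

For example, iterating this update on the toy cost J(w) = w² (gradient 2w) drives w toward 0.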

2.9 Learning rate decay

  • With mini-batches, gradient descent steps are noisy
  • With a fixed learning rate, the iterates wander around the minimum instead of converging to it
  • The learning rate should become smaller as you get close to the minimum
How to decrease the learning rate alpha:

1 epoch = 1 pass through the data

  • manual decay: decrease the learning rate by hand, hour by hour or day by day, while watching training
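Beyond manual decay, a few standard schedules can be sketched as below; the formulas (1 / (1 + decay_rate · epoch) and exponential decay) are commonly used conventions, assumed here rather than stated in these notes:

```python
def inverse_decay(alpha0, epoch, decay_rate=1.0):
    # alpha = alpha0 / (1 + decay_rate * epoch_num)
    return alpha0 / (1 + decay_rate * epoch)

def exponential_decay(alpha0, epoch, k=0.95):
    # alpha = alpha0 * k^epoch_num, for some k < 1
    return alpha0 * k ** epoch
```

With alpha0 = 0.2 and decay_rate = 1, inverse decay gives 0.2, 0.1, 0.0667, ... over successive epochs, so early epochs take large steps and later epochs take small ones.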

2.10 The problem of local optima

In high-dimensional spaces, most points where the derivative is 0 are saddle points (shaped like a horse's saddle), not local optima

Problem of plateaus

  • Unlikely to get stuck in a bad local optimum as long as you are training a reasonably large network with many parameters, so the cost function J is defined over a relatively high-dimensional space
  • Plateaus (long regions where the gradient is near zero) can make learning slow; this is where optimization algorithms like momentum, RMSprop, and Adam help speed things up
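A quick numerical illustration of a saddle point, using the hypothetical function f(x, y) = x² − y²: at the origin the gradient is exactly zero, yet the point is a maximum along y and a minimum along x, so it is neither a local minimum nor a local maximum.

```python
import numpy as np

def f(x, y):
    # Saddle-shaped surface: curves up along x, down along y
    return x ** 2 - y ** 2

def grad_f(x, y):
    # Gradient [df/dx, df/dy]
    return np.array([2 * x, -2 * y])

# The gradient vanishes at the origin even though it is not an optimum
g = grad_f(0.0, 0.0)
```

Moving a little along y from the origin decreases f, which is why plain gradient descent can stall near such points while the loss is still far from its minimum.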
