2.8 Adam Optimization Algorithm
Adam (Adaptive Moment Estimation) combines RMSprop and momentum into a single update rule.
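A minimal sketch of one Adam update for a single parameter array, assuming the standard hyperparameter defaults (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8); the function name `adam_step` is just for illustration:

```python
import numpy as np

def adam_step(w, dw, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update.

    v: momentum term (EWMA of gradients, first moment)
    s: RMSprop term (EWMA of squared gradients, second moment)
    t: 1-based step counter, used for bias correction
    """
    v = beta1 * v + (1 - beta1) * dw          # momentum part
    s = beta2 * s + (1 - beta2) * dw ** 2     # RMSprop part
    v_hat = v / (1 - beta1 ** t)              # bias correction (matters early on)
    s_hat = s / (1 - beta2 ** t)
    w = w - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s
```

For example, iterating `adam_step` on the gradient of f(w) = w^2 drives w toward the minimum at 0.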

2.9 Learning rate decay
- Mini-batch gradient descent steps are noisy.
- With a fixed (large) learning rate, the iterates keep wandering in a region around the minimum instead of converging to it.
- The learning rate should therefore get smaller as training approaches the minimum.

How to decrease the learning rate alpha:
1 epoch = 1 pass through the data
One common schedule: alpha = alpha_0 / (1 + decay_rate * epoch_num)

- Manual decay: watch training over hours or days and decrease the learning rate by hand.
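The epoch-based schedule above can be sketched as a small helper; `decayed_lr` and its argument names are illustrative, not a library API:

```python
def decayed_lr(alpha0, decay_rate, epoch_num):
    """Learning rate after a given epoch: alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)
```

With alpha0 = 0.2 and decay_rate = 1, this gives 0.2 at epoch 0, 0.1 at epoch 1, 0.05 at epoch 3, and so on.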
2.10 The problem of local optima
In high-dimensional spaces, most points where the gradient is zero are saddle points (shaped like a horse's saddle), not local optima.

Problem of plateaus
- You are unlikely to get stuck in a bad local optimum as long as you are training a reasonably large network and the cost function J is defined over a relatively high-dimensional space.
- Plateaus (long stretches where the gradient is near zero) can make learning slow; this is where optimization algorithms such as momentum, RMSprop, and Adam can help speed things up.