2.8 Adam Optimization Algorithm
Adam (Adaptive Moment Estimation) combines RMSprop and momentum into a single update rule.
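A minimal sketch of one Adam update for a single parameter array, assuming the standard hyperparameter defaults (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8); the function name `adam_step` is just for illustration:

```python
import numpy as np

def adam_step(w, dw, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update.

    v: momentum term (EWMA of gradients, first moment)
    s: RMSprop term (EWMA of squared gradients, second moment)
    t: 1-based step counter, used for bias correction
    """
    v = beta1 * v + (1 - beta1) * dw          # momentum part
    s = beta2 * s + (1 - beta2) * dw ** 2     # RMSprop part
    v_hat = v / (1 - beta1 ** t)              # bias correction (matters early on)
    s_hat = s / (1 - beta2 ** t)
    w = w - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s
```

For example, iterating `adam_step` on the gradient of f(w) = w^2 drives w toward the minimum at 0.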

2.9 Learning rate decay
- Mini-batch gradient descent steps are noisy.
- With a fixed (large) learning rate, the iterates keep wandering in a region around the minimum instead of converging to it.
- The learning rate should therefore get smaller as training approaches the minimum.

How to decrease the learning rate alpha:
1 epoch = 1 pass through the data
One common schedule: alpha = alpha_0 / (1 + decay_rate * epoch_num)

- Manual decay: watch training over hours or days and decrease the learning rate by hand.
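The epoch-based schedule above can be sketched as a small helper; `decayed_lr` and its argument names are illustrative, not a library API:

```python
def decayed_lr(alpha0, decay_rate, epoch_num):
    """Learning rate after a given epoch: alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)
```

With alpha0 = 0.2 and decay_rate = 1, this gives 0.2 at epoch 0, 0.1 at epoch 1, 0.05 at epoch 3, and so on.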
2.10 The problem of local optima
In high-dimensional spaces, most points where the gradient is zero are saddle points (shaped like a horse's saddle), not local optima.

Problem of plateaus
- You are unlikely to get stuck in a bad local optimum as long as you are training a reasonably large network and the cost function J is defined over a relatively high-dimensional space.
- Plateaus (long stretches where the gradient is near zero) can make learning slow; this is where optimization algorithms such as momentum, RMSprop, and Adam can help speed things up.