3.1 Neural Network Overview

- Logistic regression as the building block: a neural network stacks many logistic-regression-like computations
- Forward and backward calculation
3.2 Neural Network Representation

2-layer NN (by convention, the input layer is not counted as a layer)
- Input layer: the feature vector x
- Hidden layer: parameters W[1] and b[1]
- Output layer: the prediction y-hat, with parameters W[2] and b[2]
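The layer sizes below are a minimal sketch (3 input features, 4 hidden units, 1 output unit are arbitrary choices, not from the notes) showing how the parameter shapes follow from the representation above:

```python
import numpy as np

# Hypothetical sizes: 3 input features, 4 hidden units, 1 output unit.
n_x, n_h, n_y = 3, 4, 1

rng = np.random.default_rng(0)
W1 = rng.standard_normal((n_h, n_x)) * 0.01  # hidden-layer weights W[1]
b1 = np.zeros((n_h, 1))                      # hidden-layer biases b[1]
W2 = rng.standard_normal((n_y, n_h)) * 0.01  # output-layer weights W[2]
b2 = np.zeros((n_y, 1))                      # output-layer bias b[2]

print(W1.shape, b1.shape, W2.shape, b2.shape)
```

Each weight matrix has one row per node in its layer and one column per input coming from the previous layer.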
3.3 Computing a neural network output
Repeat the logistic regression computation at every node


Each circle (node) represents two computation steps:
- z = wx + b
- a = sigmoid(z)
Subscripts denote the node index within a layer
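The per-node steps above can be sketched as a forward pass for a single example; the sizes (3 inputs, 4 hidden units) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-layer network: 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))

x = rng.standard_normal((3, 1))   # one input example as a column vector

# Each node does the two steps: z = wx + b, then a = sigmoid(z).
# Stacking the hidden nodes' weight rows into W1 does all 4 nodes at once.
z1 = W1 @ x + b1      # shape (4, 1): one z per hidden node
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2     # shape (1, 1)
y_hat = sigmoid(z2)   # predicted probability
```

Row i of W1 holds the weights of hidden node i, so the matrix product computes every node's z in one step.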
3.4 Vectorizing across multiple examples

Given m training examples, for each layer l after the input:
- input: the previous layer's output A[l-1] (with A[0] = X)
- apply the formulas:
  - Z[l] = W[l] A[l-1] + b[l]
  - A[l] = sigmoid(Z[l])
Notation:
- Square brackets [l]: layer l
- Round brackets (i): training example i
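A minimal vectorized sketch of the formulas above, stacking the m examples as columns of X (the sizes m = 5, 3 inputs, 4 hidden units are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
m = 5                                  # number of training examples
X = rng.standard_normal((3, m))        # column i is example x(i)

W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))

# Vectorized forward pass over all m examples at once:
Z1 = W1 @ X + b1       # (4, m); b1 broadcasts across the m columns
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2      # (1, m)
A2 = sigmoid(Z2)       # column i is the prediction for example i
```

No explicit loop over examples is needed; NumPy broadcasting adds b1 to every column.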
3.5 Explanation for vectorized implementation

- To simplify the justification, ignore b (set b = 0)
- W[1]x(i) gives a column vector
- Z[l]: column i holds example i's z values at layer l

For a multi-layer NN, just repeat the two steps layer by layer.
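The justification can be checked numerically: with b = 0, column i of the vectorized product W[1]X equals W[1]x(i) computed one example at a time (sizes here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
W1 = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 5))        # 5 examples as columns; b = 0

Z1 = W1 @ X                            # vectorized computation

# Column i of Z1 matches the per-example product W1 @ x(i).
for i in range(X.shape[1]):
    zi = W1 @ X[:, i:i+1]              # one example at a time
    assert np.allclose(Z1[:, i:i+1], zi)
```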
3.6 One hidden layer NN - Activation function

- g(z) denotes the activation function
- The sigmoid function is one choice of activation function
- tanh, the hyperbolic tangent function, ranges from -1 to 1 and is centered at 0; if z is very large or very small, the slope becomes very small
- tanh almost always works better than sigmoid, because its mean is close to 0, which has the effect of centering the data
- The activation function can be different in different layers
- Use square-bracket superscripts, e.g. g[1], g[2], to distinguish the layers' activation functions

Pros and cons:
- Sigmoid: rarely used, except in the output layer for binary classification
- tanh: almost always works better than sigmoid
- ReLU: a = max(0, z); the default choice
- Leaky ReLU: a = max(0.01z, z); keeps a small slope for z < 0
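The four activation functions above can be sketched directly in NumPy (the 0.01 leaky slope is the value given in the notes; the sample z values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

def tanh(z):
    return np.tanh(z)                 # range (-1, 1), centered at 0

def relu(z):
    return np.maximum(0.0, z)         # a = max(0, z); the default choice

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)   # a = max(0.01z, z)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))                        # negative inputs clipped to 0
print(leaky_relu(z))                  # negative inputs scaled by 0.01
```

Note that for large |z|, both sigmoid and tanh saturate (their slope approaches 0), which is the reason ReLU-style functions are preferred in hidden layers.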