2.1 Binary Classification

Example:
Given an image as input return label 1 if it is cat return label 0 otherwise
Goal: Train a classifier that the input is an image represented by a feature vector, x, and predicts whether the corresponding lable y is 1 or 0
Notations:

- x: input feature vector
- n: dimension of x, in this case, equals to 64 * 64 * 3 = 12288
- m: # of training example
- X: an nx * m demensional matrix
- Y: m dimension matrix
2.2 logistic Regression
Definition
An algorithm used in a supervised learning problem when the output y are all either 0 or 1, The goal of the logistic regression is to minimized the errors between its predictions and training data
- x: input, nx dimension vector
- w: parameters of logistic regression, nx dimension vector
- b: a real number, interceptor
- y: output
Linear regression
$$ y = w * x + b
$$
but y should be between 0 and 1, so this algorithm can't fit the model
Logistic regression

$$ y = sigmoid(w * x + b)
$$
sigmoid function:
- -infinity: close to 0
- 0: 0.5
- infinity: close to 1
2.3 Cost function

- m: # of training sets
- i: ith training example
- y-hat(i): The prediction on taining sample (i)
- Loss function: To measure how good the output y-hat is when true label is y. (For single training example)
- Cost Function: average of loss function sum. (Apply to all training samples)
2.4 Gradient descent algorithm

Cost function: convex function, looks like a bowl
Approach: Started from random initial point, then takes a step in the steepest downhill direction, this is one iteration of gradient descent, after n iterations, finally you converge to the global optimum.
2.5 Derivatives
2.6 More derivative example
Already learned, Skipped
2.7 Computation Graph
- Forward propagation step: compute the output of the neural network
- Backward propagation step: compute gradients or compute derivatives

- step1: u = bc
- step2: v = a + u
- step3: J = 3v
2.8 Computation Graph - Computing derivatives
- Derivative - backward the computation graph
- dJ / dv = 3
- dJ / da = 3 = (dJ / dv)*(dv / da)
In python: d finalOutputVar / d var -> dvar