Multi-class classification
3.8 Softmax regression
- Binary classification: label 1 or 0
- Multi-class classification: the label is one of C classes (C > 2)
Softmax regression: a generalization of logistic regression that lets you predict one of multiple classes
Recognize cats, dogs, and baby chicks
C = # of classes = 4 (classes 0, 1, 2, 3). In this case we build a neural network whose output layer has 4 units, so n[L], the number of units in the output layer L, equals 4, and we want each output node to tell us the probability of the corresponding class.
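As a small sketch of the shapes involved (the previous-layer size n_prev = 3 below is a made-up number just for illustration):

import numpy as np

n_prev = 3                           # hypothetical size of layer L-1
W_L = np.random.randn(4, n_prev)     # output-layer weights, shape (4, n[L-1])
b_L = np.zeros((4, 1))               # output-layer bias, shape (4, 1)
a_prev = np.random.randn(n_prev, 1)  # activations from layer L-1
z_L = W_L @ a_prev + b_L             # shape (4, 1): one raw score per class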
Softmax layer
Softmax activation function:
$$ t = e^{z^{[L]}} \quad \text{(element-wise; } t \text{ is a } (4, 1) \text{ vector)} $$
$$ a^{[L]}_i = \frac{t_i}{\sum_{j=1}^{4} t_j} = \frac{e^{z^{[L]}_i}}{\sum_{j=1}^{4} e^{z^{[L]}_j}} $$
Detailed example
- If you compute z[L] and it is a 4-dimensional vector, say [5, 2, -1, 3], use element-wise exponentiation to compute the vector t = [148.4, 7.4, 0.4, 20.1]. The entries of t sum to 176.3, so a[L] = t / 176.3 (see the numpy sketch after this list).
- This algorithm takes the vector z[L] and maps it to four probabilities that sum to 1.
- These steps can be summarized into a single activation function, a[L] = g(z[L]).
- The unusual thing about this activation function is that g takes as input a 4 by 1 vector and outputs a 4 by 1 vector, rather than mapping a single number to a single number.
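A minimal numpy sketch of this computation on the example above (the helper name softmax is just for illustration):

import numpy as np

def softmax(z):
    t = np.exp(z)           # element-wise exponentiation
    return t / np.sum(t)    # normalize so the entries sum to 1

z_L = np.array([5., 2., -1., 3.])
a_L = softmax(z_L)
print(a_L)         # approximately [0.842, 0.042, 0.002, 0.114]
print(a_L.sum())   # 1.0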
Softmax examples
NN without hidden layer
The input space is colored according to which of the three outputs has the highest probability. This is a generalization of logistic regression, with linear decision boundaries, but over more than 2 classes instead of binary classes; the decision boundary between any pair of classes is linear.
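A hedged sketch of why the boundaries are linear when there is no hidden layer: each class score is a linear function of the input, and since softmax preserves ordering, the predicted class is just the arg-max of those linear scores (the weights below are made up purely for illustration):

import numpy as np

# made-up parameters for a 3-class softmax layer applied directly to a 2-D input
W = np.array([[ 1.0, -0.5],
              [-1.0,  0.2],
              [ 0.0,  1.0]])
b = np.array([0.0, 0.5, -0.5])

def predicted_class(x):
    z = W @ x + b           # three linear scores, one per class
    return np.argmax(z)     # same arg-max as softmax(z), since exp() preserves ordering

print(predicted_class(np.array([2.0, 1.0])))   # class 0 for this point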
3.9 Softmax regression
Programming framework
3.10 Deep learning frameworks
Deep learning framework
Criteria for choosing a deep learning framework:
- Ease of programming (development and deployment for actual use)
- Running speed, especially for large datasets
- Truly open (open source with good governance)
3.11 TensorFlow
Motivating problem
Minimize some cost function
$$ J(w) = (w - 5)^2
$$
Forward propagation
import numpy as np
import tensorflow as tf

coefficients = np.array([[1.], [-10.], [25.]])

w = tf.Variable(0, dtype=tf.float32)
x = tf.placeholder(tf.float32, [3, 1])
# cost = w**2 - 10*w + 25                      # cost with the coefficients hard-coded
cost = x[0][0]*w**2 + x[1][0]*w + x[2][0]      # same cost, with the coefficients fed in as data
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

init = tf.global_variables_initializer()
session = tf.Session()
session.run(init)
print(session.run(w))                          # 0.0, the initial value of w

session.run(train, feed_dict={x: coefficients})
print(session.run(w))                          # 0.1 after one step of gradient descent

for i in range(1000):
    session.run(train, feed_dict={x: coefficients})
print(session.run(w))                          # ~4.99999, close to the minimum at w = 5
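The code above uses the TensorFlow 1.x session/placeholder API. Below is a minimal sketch of the same example written against TensorFlow 2.x (assuming TF 2 is available; eager execution plus tf.GradientTape replaces sessions and placeholders):

import tensorflow as tf

coefficients = tf.constant([1., -10., 25.])   # data for the cost J(w) = w**2 - 10*w + 25
w = tf.Variable(0.0, dtype=tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for i in range(1000):
    with tf.GradientTape() as tape:
        cost = coefficients[0]*w**2 + coefficients[1]*w + coefficients[2]
    grads = tape.gradient(cost, [w])              # backward prop computed automatically
    optimizer.apply_gradients(zip(grads, [w]))    # one step of gradient descent

print(w.numpy())   # ~5.0, the minimum of the cost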
Advantages:
TensorFlow has built-in automatic differentiation: you only implement forward propagation (the computation of the cost), and you do not need to explicitly implement backward prop
Can easily switch between different optimization algorithms by changing one line of code (see the example below)
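For example, in the 1.x code above, swapping plain gradient descent for the Adam optimizer is a one-line change:

# replace
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
# with
train = tf.train.AdamOptimizer(0.01).minimize(cost)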
Conclusion
- Systematically organize the hyperparameter search
- Batch normalization to speed up the learning process
- Programming frameworks for deep learning