End to end deep leaning
2.9 What is end-to-end deep learning
Speech recognition example
x: audio y: transcript
x ----MFCC-----> low level features ----ML----> phonemes(basic unit of sound) -------> words -------> y (transcript)
E2E audio ----------> transcript
You need large data set to make end-to-end approach work better than normal procedure
Face recognition: use face recognition to replace RFID card
- Use a software to detect human face
- Zoom in that part, crop that part
- Then feed the image in NN
In this example, why 2-steps approach works better, 2 reasons:
- Each of the two problems you're solving is much simpler
- You have a lot of data for each of the 2 subtasks
But if you want e2e approach, you don't have much data for x(camera taken photo) to y(identity).
More examples
Machine translation
English -> text -> ........-> French
e2e x(english) -> y(french) In reality, we do have a lot data for english -> french translations, e2e deep learning works quite well for machine translation.
Estimating child's age: 2 steps: Image ---separate bones----> bone ---compare avg bone length for each age with the captured bone---> age e2e: image ----> age
In this example, 2 steps approach works better cause there isn't enough data to train this task in an e2e fashion.
2.10 Whether to use end-to-end learning
Pros and cons of E2E deep learning:
Pros:
- Let data speak
- x -> y
- Sppech recognition, phoneme is created by human, human force NN to use this to recognize audio via phoneme, but if you let NN learn what it wants, it may perform better without phoneme,
- Less hand-designing of components needed
Cons:
- May need large amount of data
- Excludes potentially useful hand-designed components
- If you don't much data, hand-designed component allows you to inject manual knowledge into the algorithm
Apply e2e deep learning Key question: Do you have sufficient data to learn a function of the complexity needed to may x to y?
Drive.ai --- Self driving car
Images -> cars/pedestrians --> route --> steering
E2E: Image -> Steering
In this example, E2E is less promising than multiple steps approach.