2.3 Build your first system quickly, then iterate

Speech recognition example

Noisy background
- cafe noise
- car noise
Accented speech
Far from microphone
Young children's speech
Stuttering (ah, uh, um)
...

What you should do to quickly build a ML system:

Set up dev/test set and metric
Build initial system quickly (maybe a quick & dirty implementation)
Use bias/variance analysis to prioritize next step

2.4 Mismatched training and dev/test data

Training and testing on different distributions

2.4.1 Cat app example

datasource	number	comment
Data from webpage	200,000	high resolution, large amount
Data from mobile app	10,000	blurry, low amount

Solutions:

Dev set is aimed at target. the target of this system is to recognize cat from mobile taken photos

Option	Advantage	Disadvantage	Comment
Put them together, randomly shuffle them into a train, dev, and test set	dev/test/train sets from same distribution	you may spend much time on optimize system to recognize cat from webpage image	not recommended
training set: 20,000 from web, dev/test: 10,000 all images from mobile	dev set is aiming target	Training distribution is different from dev/test	recommended

2.4.2 Speech recognition example

Build a speech activated rearview mirror for a car

Training	Dev/test
purchased data, Smart speaker control, Voice keyboard	Speech activated rearview mirror

Conclusion: Always use the aimed target data as dev/test set

2.5 Mismatched training and dev/test data

Bias and Variance with mismatched data distributions

2.5.1 Cat classifier example

Assume human get almost 0% error (Bayers error)

Error analysis: Training error 1% Dev error 10%

It only reflects that dev set contains image that are much more different to classify accurately.

Two things changed when change from train set to dev set

The trained system never saw dev set data before
Train and dev/test sets from different distributions

Training-dev set: Same distribution as training set, but not used for training

Training error	Training-dev set	Dev error	problem
1%	9%	10%	variance problem
1%	1.5%	10%	data mismatch
10%	11%	12%	bias, cause human error is 0%
10%	11%	20%	2 issues: bias, data mismatch

General principles: Bias/variance on mismatched training and dev/test sets

Human/bayers error < -- avoidable bias -- > Train error Train error < -- variance --- > Train-dev error
Train-dev error < -- data mismatch - > Dev/test error
Test error < -- degree of overfitting - > Dev/test error

Dev/test set should from same distribution

2.5.2 More general formulation

Speech rearview mirror

	General speech recognition	Rearview mirror specific speech data
Human Level	"Human level" 4%	6% ask human to label
Error on examples trained on	"Training error" 7%	6% train specific data
Error on examples not trained on	"Training-dev set error" 10%	Dev/Test error 6%

Conclusion: Use different data distribution for train/test/dev sets, this gives you more data to improve the performance of the algorithm, but also brings a new issue, the data mismatch.(difference between training-dev set error and dev/test set error)

day7