1.8 Comparing to human-level performance
Two reasons to compare an ML system with human-level performance:
1. ML algorithms now work much better and have become feasible in many application areas.
2. The workflow of designing and building a machine learning system is much more efficient when the task is something humans can also do.
Bayes optimal error: the error of the best possible theoretical function mapping from x to y; it can never be surpassed.

Why compare to human-level performance: humans are quite good at many tasks, so as long as ML is worse than humans, you can:
1. Get labeled data from humans
2. Gain insights from manual error analysis: why did a person get this right?
3. Do better analysis of bias/variance
1.9 Avoidable bias
| Class | Scenario 1 | Scenario 2 |
|---|---|---|
| Human error | 1% | 7.5% |
| Training error | 8% | 8% |
| Dev error | 10% | 10% |
| Avoidable bias (training − human) | 7% | 0.5% |
| Variance (dev − training) | 2% | 2% |

Think of human-level error as an estimate of Bayes optimal error.
Solution for scenario 1: avoidable bias (7%) dominates, so focus on reducing bias, bringing training error closer to human-level error.
Solution for scenario 2: variance (2%) dominates, so focus on reducing variance, bringing dev error closer to training error.
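The decision rule above is just a comparison of two gaps. A minimal sketch (the helper name and structure are my own, not from the notes):

```python
# Minimal sketch: decide whether to focus on bias or variance
# from the three error rates (human error as a proxy for Bayes error).
def diagnose(human_err, train_err, dev_err):
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus

# Scenario 1: 7% avoidable bias vs 2% variance
bias1, var1, focus1 = diagnose(0.01, 0.08, 0.10)   # focus1 == "bias"
# Scenario 2: 0.5% avoidable bias vs 2% variance
bias2, var2, focus2 = diagnose(0.075, 0.08, 0.10)  # focus2 == "variance"
```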
1.10 Understanding human-level performance
Human-level error as a proxy for Bayes error
Medical images classification example:
Suppose:
| Class | Error rate |
|---|---|
| Typical human | 3% |
| Typical doctor | 1% |
| Experienced doctor | 0.7% |
| Team of experienced doctors | 0.5% |

Then human-level error should be <= 0.5%, since Bayes error must be at or below the best human performance.
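Choosing the proxy amounts to taking the best available human performance. A sketch encoding the table above (the dictionary keys are illustrative):

```python
# Sketch: use the best available human performance as the proxy for
# Bayes error (values taken from the table above).
human_errors = {
    "typical human": 0.03,
    "typical doctor": 0.01,
    "experienced doctor": 0.007,
    "team of experienced doctors": 0.005,
}
bayes_proxy = min(human_errors.values())  # 0.005, i.e. 0.5%
```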
Error analysis example: human proxy for Bayes error = 0.5%, training error = 5%, dev error = 6%.
Avoidable bias: 4.5%; variance: 1%. So focus on reducing bias.
Summary of bias/variance with human-level performance:
- Human-level error (proxy for Bayes error)
- Training error
- Dev error
The gap between human-level error and training error is avoidable bias; the gap between training error and dev error is variance.
1.11 Surpassing human-level performance
Problems where ML significantly surpasses human-level performance:
- Online advertising
- Product recommendation
- Logistics (predicting transit time)
- Loan approvals
What these have in common: learning from structured data, not the natural perception problems humans are good at, and lots of data available.
Other areas where ML surpasses humans:
- Speech recognition systems
- Some computer vision / image processing
- Some medical tasks: reading ECGs, diagnosing skin cancer, certain narrow radiology tasks
1.12 Comparing to human-level performance
Getting a supervised learning algorithm to work well fundamentally assumes you can do two things:
1. Fit the training set pretty well, i.e. achieve low avoidable bias
2. Have training set performance generalize pretty well to the dev/test set, i.e. achieve low variance
Reducing (avoidable) bias and variance:
Avoidable bias: human level <--> training error. Variance: training error <--> dev error.

| Gap | Methods to reduce it |
|---|---|
| Avoidable bias (human level <--> training error) | 1. Train a bigger model 2. Train longer / use better optimization algorithms 3. NN architecture / hyperparameter search |
| Variance (training error <--> dev error) | 1. Get more data 2. Regularization (L2, dropout, data augmentation) 3. NN architecture / hyperparameter search |
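The table reads naturally as a lookup: measure both gaps, then apply the levers for whichever dominates. A sketch (function and dictionary names are mine; the methods are the ones listed above):

```python
# Sketch: map the dominant gap to the corresponding methods from the table.
METHODS = {
    "avoidable_bias": [
        "train a bigger model",
        "train longer / better optimization algorithms",
        "NN architecture / hyperparameter search",
    ],
    "variance": [
        "get more data",
        "regularization (L2, dropout, data augmentation)",
        "NN architecture / hyperparameter search",
    ],
}

def suggest(human_err, train_err, dev_err):
    """Return the dominant gap and the levers that address it."""
    bias_gap = train_err - human_err
    var_gap = dev_err - train_err
    gap = "avoidable_bias" if bias_gap > var_gap else "variance"
    return gap, METHODS[gap]

# Example from section 1.10: human 0.5%, training 5%, dev 6%
gap, levers = suggest(0.005, 0.05, 0.06)  # gap == "avoidable_bias"
```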