1.7 When to change dev/test sets and metrics

Example: Cat dataset

Metric: classification error
Algorithm A: 3% error, but it lets pornographic images through, categorized as cats
Algorithm B: 5% error, but it lets no pornographic images through

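Algorithm A is better than B according to the metric, but users prefer B because it does not let pornographic images through. The metric is therefore ranking the algorithms in the wrong order for what users care about.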

Orthogonalization for cat pictures: anti-porn

We should take a machine learning problem and break it into distinct steps.

Step 1: Decide how to define a metric to evaluate classifiers.
Step 2: Worry separately about how to do well on this metric.
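
For the cat example, one possible way to redefine the metric is to make a mistake on a pornographic image cost much more than an ordinary mistake. The sketch below is only an illustration and is not taken from these notes: the weighted-error formula, the penalty of 10, the function names, and the toy labels are all assumptions.

```python
def make_weights(is_porn, penalty=10.0):
    """Give each example weight `penalty` if it is pornographic, else 1.

    The penalty value is an illustrative assumption, not a recommended setting.
    """
    return [penalty if flag else 1.0 for flag in is_porn]


def weighted_error(y_true, y_pred, weights):
    """Sum of weights on misclassified examples divided by the sum of all weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    wrong = sum(w for yt, yp, w in zip(y_true, y_pred, weights) if yt != yp)
    return wrong / total


# Hypothetical toy dev set: 1 = cat, 0 = not cat.
y_true = [1, 1, 0, 0, 0]
is_porn = [False, False, False, True, False]

pred_a = [1, 1, 0, 1, 0]  # A's single mistake is a porn image labeled as cat
pred_b = [1, 0, 0, 0, 0]  # B's single mistake is an ordinary cat image it missed

uniform = [1.0] * len(y_true)
weights = make_weights(is_porn)

# With uniform weights the two toy classifiers tie; with the penalty, A looks far worse.
plain_a = weighted_error(y_true, pred_a, uniform)
plain_b = weighted_error(y_true, pred_b, uniform)
err_a = weighted_error(y_true, pred_a, weights)
err_b = weighted_error(y_true, pred_b, weights)
```

With a metric like this, the ranking of the two algorithms matches what users prefer, and improving the score becomes a separate problem from defining it. Using it on real data requires labeling which dev/test images are pornographic.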

Another example:

Algorithm A: 3% error
Algorithm B: 5% error

Dev/test: the algorithms are trained and evaluated on high-resolution images, but user images are lower resolution and less well framed. On user images, B performs better than A.

Overall guideline: if your current metric and the data you are evaluating on do not correspond to what you actually care about, change your metric and/or your dev/test sets to better capture what you need your algorithm to actually do well on.
