If you work with statistics or quantitative models, you might have run across a variety of terms: sensitivity, recall, power, precision, type I error, type II error, accuracy, specificity, and fall-out.
If you want to speak precisely (not vaguely) about the performance of a binary classifier, you’ll want to become fluent in these terms. Or, at the least, you’ll want to have a reference at the ready.
Speaking of terminology, we’re just getting warmed up! There are more terms and symbols: α, β, true positive rate, false positive rate, false negative rate, true negative rate, positive predictive value, negative predictive value, true discovery rate, and false discovery rate.
Many of these are synonyms, but different communities have different preferences: statisticians, machine learning practitioners, and academic researchers each tend to favor their own terms.
Here, I offer a visual summary of how these metrics relate and how you calculate them, all without a single equation.
It shows how various metrics tell different slices of the story. I hope you find it useful.
To make the most of it, you’ll need to understand the confusion matrix. With that knowledge, you can then see that each metric is a ratio of two numbers, each drawn from some part of a confusion matrix.
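To make the idea concrete, here is a minimal sketch in Python. The counts (tp, fp, fn, tn) are made-up example values, not from any real classifier; the point is that every metric below is just one cell count (or sum of cells) divided by another.

```python
# Hypothetical counts for a binary classifier's confusion matrix
# (example values only, chosen for illustration).
tp, fp, fn, tn = 40, 10, 5, 45

# Each metric is a ratio of two numbers drawn from the matrix.
sensitivity = tp / (tp + fn)   # a.k.a. recall, true positive rate, power
specificity = tn / (tn + fp)   # a.k.a. true negative rate
precision   = tp / (tp + fp)   # a.k.a. positive predictive value
fall_out    = fp / (fp + tn)   # a.k.a. false positive rate
miss_rate   = fn / (fn + tp)   # a.k.a. false negative rate
accuracy    = (tp + tn) / (tp + fp + fn + tn)

print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")
print(f"precision={precision:.3f}  accuracy={accuracy:.3f}")
```

Note that some pairs are complements: fall-out is 1 − specificity, and the miss rate is 1 − sensitivity, which is why the same slice of the matrix can go by several names.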
The OmniGraffle version is available on GitHub along with a PDF version. All versions are licensed under the GPL v3, meaning that you can use them commercially with attribution. If you modify them, please share your changes back. I welcome your suggestions via GitHub Issues or a pull request.