Bluemont Labs

data-driven decisions using models, simulation, and analysis

Statistical Classification Metrics

If you work with statistics or quantitative models, you have probably run across a variety of terms: sensitivity, recall, power, precision, type I error, type II error, accuracy, specificity, and fall-out.

If you want to speak precisely about the performance of a binary classifier, you’ll want to become fluent in them. Or, at the least, you’ll want to have a reference at the ready.

Speaking of terminology, we’re just getting warmed up! There are more terms and symbols: α, β, true positive rate, false positive rate, false negative rate, true negative rate, positive predictive value, negative predictive value, true discovery rate, and false discovery rate.

Many of these are synonyms, but different communities (statisticians, machine learning practitioners, and academic researchers) each prefer their own terms.

Here, I offer a visual summary of how these metrics relate and how you calculate them, all without a single equation.

It shows how the various metrics each tell a different part of the story. I hope you find it useful.

To make the most of it, you’ll need to understand the confusion matrix. With that knowledge, you can then see that each metric is a ratio of two numbers, each drawn from some part of a confusion matrix.
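For readers who prefer code to pictures, here is a minimal Python sketch of the same idea: every metric is one count (or sum of counts) from the confusion matrix divided by another. The `ConfusionMatrix` class, the `ratio` and `metrics` helpers, and the example counts are all made up for illustration; they are not part of the diagram or its repository. The synonyms in the comments follow common usage, which can vary by field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four cells of a binary confusion matrix."""
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive (a type I error)
    fn: int  # actual positive, predicted negative (a type II error)
    tn: int  # actual negative, predicted negative


def ratio(numerator, denominator):
    """Divide two counts, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(cm):
    """Compute common metrics, each as a ratio of confusion matrix counts."""
    actual_pos = cm.tp + cm.fn
    actual_neg = cm.fp + cm.tn
    predicted_pos = cm.tp + cm.fp
    predicted_neg = cm.fn + cm.tn
    total = actual_pos + actual_neg
    return {
        # Denominator: everything that is actually positive.
        "sensitivity": ratio(cm.tp, actual_pos),          # recall, true positive rate, power
        "false_negative_rate": ratio(cm.fn, actual_pos),  # type II error rate (beta)
        # Denominator: everything that is actually negative.
        "specificity": ratio(cm.tn, actual_neg),          # true negative rate
        "fall_out": ratio(cm.fp, actual_neg),             # false positive rate, type I error rate (alpha)
        # Denominator: everything predicted positive.
        "precision": ratio(cm.tp, predicted_pos),         # positive predictive value
        "false_discovery_rate": ratio(cm.fp, predicted_pos),
        # Denominator: everything predicted negative.
        "negative_predictive_value": ratio(cm.tn, predicted_neg),
        # Denominator: all observations.
        "accuracy": ratio(cm.tp + cm.tn, total),
    }


# Made-up counts, purely for illustration.
example = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)
result = metrics(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Grouping the metrics by denominator is a deliberate choice: metrics that share a denominator describe the same slice of the matrix, which is why, for example, sensitivity and the false negative rate always sum to one.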

The OmniGraffle version is available on GitHub along with a PDF version. All versions are licensed under the GPL v3, meaning that you can use it commercially with attribution. If you modify it, please share your changes back. I welcome your suggestions via GitHub Issues or a pull request.