Bluemont Labs

data-driven decisions using models, simulation, and analysis

Statistical Classification Metrics

If you work with statistics or quantitative models, you might have run across a variety of terms: sensitivity, recall, power, precision, type I error, type II error, accuracy, specificity, and fall-out.

If you want to speak precisely about the performance of a binary classifier, you’ll want to become fluent in these terms. At the least, you’ll want a reference at the ready.

Speaking of terminology, we’re just getting warmed up! There are more terms and symbols: α, β, true positive rate, false positive rate, false negative rate, true negative rate, positive predictive value, negative predictive value, true discovery rate, and false discovery rate.

Many of these are synonyms, but communities differ in which ones they use: statisticians, machine learning practitioners, and academic researchers each favor their own terms.

Here, I offer a visual summary of how these metrics relate and how you calculate them, all without a single equation.

It shows how various metrics tell different slices of the story. I hope you find it useful.

To make the most of it, you’ll need to understand the confusion matrix. With that knowledge, you can then see that each metric is a ratio of two numbers, each drawn from some part of a confusion matrix.
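To make the "ratio of two numbers" idea concrete, here is a minimal Python sketch. The `ConfusionMatrix` class, its method names, and the example counts are all hypothetical, chosen only for illustration; they are not taken from the visual summary. The synonyms in the comments follow common usage, and the sketch assumes no denominator is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from a binary classifier (hypothetical helper for illustration)."""
    tp: int  # true positives
    fp: int  # false positives (type I errors)
    fn: int  # false negatives (type II errors)
    tn: int  # true negatives

    def sensitivity(self) -> float:
        # Also called recall, true positive rate, or power.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Also called true negative rate.
        return self.tn / (self.tn + self.fp)

    def fall_out(self) -> float:
        # Also called false positive rate; corresponds to the type I error rate.
        return self.fp / (self.fp + self.tn)

    def miss_rate(self) -> float:
        # Also called false negative rate; corresponds to the type II error rate.
        return self.fn / (self.fn + self.tp)

    def precision(self) -> float:
        # Also called positive predictive value.
        return self.tp / (self.tp + self.fp)

    def negative_predictive_value(self) -> float:
        return self.tn / (self.tn + self.fn)

    def false_discovery_rate(self) -> float:
        # The complement of precision.
        return self.fp / (self.fp + self.tp)

    def accuracy(self) -> float:
        # Correct predictions over all predictions.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Made-up counts, for illustration only.
cm = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)
print(f"precision={cm.precision():.2f} recall={cm.sensitivity():.2f} "
      f"accuracy={cm.accuracy():.2f}")
```

Each method divides one cell (or a sum of cells) by the total of a row, a column, or the whole matrix, which is why two metrics can tell quite different stories about the same classifier.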

The OmniGraffle version is available on GitHub along with a PDF version. All versions are licensed under the GPL v3, meaning that you can use it commercially with attribution. If you modify it, please share your changes back. I welcome your suggestions via GitHub Issues or a pull request.

Seeking Clojure Developers

I am seeking a few Clojure developers to build tools for Bluemont Labs. We’ll be using Clojure, Storm, Riak, Kafka, AWS and lots more. Projects will involve distributed data processing, machine learning, natural language processing, search, web services, and web applications.

If you are excited about Clojure, here is a great chance to dive in, get paid, and work on innovative projects! Are you interested in big data, small data, statistics, quantitative modeling, econometrics, analytics, forecasting, or simulation? Then apply. Together, we’ll build tools to help people make data-driven decisions. I have built a prototype (a niche search engine) and engaged likely target customers. Help us get to market faster.

Here’s what I’m looking for:

  • You are an enthusiastic beginner at Clojure or better.
  • Your software skills are intermediate or better.
  • You are available at least 10 hours a week. The ideal would be 20 to 30 hours per week.
  • Local or remote workers are welcome.
  • This is a contract position.

In case it helps, here is the process:

  • I will ask you to do a ~1 hour problem set.
  • If that goes well, then we’ll do a video interview.
  • If that works, we’ll do a short, ~1 week trial project.
  • If the project goes well, we’ll line up a bigger project.
  • Be ready to pair as needed, particularly at a project kick-off.
  • I value results, dependability, clear thinking, and tested code.
  • If you are in the DC area, we’ll cowork as needed near Dupont Circle.

To apply, please email david+clj1bml at this domain. Please point me to some sample code on GitHub (or elsewhere) and your resume on LinkedIn (or similar).

If you have a rate range for your consulting or freelancing services, please share it. Since I am a lean startup, I may not be able to match the dollar figure of a big consulting shop. But I’ll offer a competitive rate, interesting technology, and a lot of excitement for data and analysis.

Have you worked for a great startup? In my experience, you’ll know it when you see it. Maybe we’ll make a great team? If so, we’ll make a big dent in how people use data!

Notes

What do I mean by an enthusiastic beginner with Clojure?

  • You have completed two or more Clojure projects. (These do not need to be ‘work’ projects.)
  • You have read and understand most of the Clojure documentation.
  • You understand the rationale for Clojure.

What do I mean by an intermediate developer?

  • strong in two or more languages
  • writes readable, clean code
  • writes automated tests early and often
  • thinks logically; mindful of tradeoffs
  • uses source control effectively

When I say “us”, what do I mean?

The Bluemont Labs team currently includes me, a Clojure developer from Europe (on contract), DevOps support, and advisors.

Updates since the original posting

  1. I originally wrote: “Any level of Clojure is ok, as long as you have gotten started and written some code.” I have learned that I need to set the requirement a bit higher.

  2. I originally wrote: “We’ll start with an in-person or Skype interview.” I have learned that an initial screening makes the process more manageable. So, I have started using a problem set to help in my evaluation.

Hello, Quantitative Modelers

Hello! I’m David, founder of Bluemont Labs. I’d like to connect with people interested in quantitative modeling.

By quantitative modeling, I mean: statistical, economic, financial, computational, and mathematical modeling.

Models don’t have to be fancy or academic; they serve a wide range of purposes. I’m most interested in models that are useful for specific applications, offer insight, or move a discussion forward.

One of my main goals is to promote the modeling and simulation ecosystem. So, if you build or use models, please reach out. Let’s talk.