measuring test coverage

Automated tests are great, from small unit tests up to bigger ones like integration tests.

Measuring coverage will help you understand how good those tests are, and how good the tested target code is.

A coverage report will sometimes point out that we should write a currently “missing” test to exercise a recently added function; that’s the simple case. But looking at uncovered target lines, e.g. in an exception handler, can also help with critiquing the Public API, and can suggest Extract Helper on deeply buried code so that it becomes exposed to unit tests. Often there will be an if condition that is “hard” or nearly impossible to make true, so tests never run the if clause until we break out a helper.
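As a rough illustration of Extract Helper, here is a minimal sketch. The names fetch, is_retryable, and send are hypothetical, invented only for this example. The retry condition used to sit deep inside the loop, where a test could reach it only by staging a sequence of failing calls; as a helper it can be exercised directly.

```python
def is_retryable(status_code, attempts, max_attempts=3):
    """Extracted helper (hypothetical): the condition that used to be buried in fetch()."""
    return status_code >= 500 and attempts < max_attempts


def fetch(send, max_attempts=3):
    """Call send() until it succeeds or the retry budget runs out."""
    attempts = 0
    while True:
        attempts += 1
        status = send()
        if status < 400:
            return status
        if not is_retryable(status, attempts, max_attempts):
            raise RuntimeError(f"giving up after {attempts} attempts, status {status}")


# The once hard-to-reach branches are now one-line unit tests.
print(is_retryable(503, attempts=1))  # server error, budget left
print(is_retryable(503, attempts=3))  # server error, budget exhausted
print(is_retryable(404, attempts=1))  # client error, never retried
```

A coverage report on the original version would likely have shown the "giving up" line in red; after the extraction, each outcome of the condition is easy to cover.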

concept

The approach to measurement is pretty simple, and is available in many languages. In C / C++ land, GNU gcov has been around for decades, recently prettified by gcovr, and clang offers nice measurement support as well. Rust and most other languages offer similar support. Python programmers will want to rely on coverage.

How does it work? The coverage tool will expand your code by mixing in a bunch of counter increments. There’s more than one measurement granularity. Let’s start with the coarsest level.

granularity

To measure at “function” granularity, imagine that an update of a global was added just after each def foo(): in your Python source:

def foo():
    # get_line_num() stands in for a lookup of the current line number
    counter[(__file__, get_line_num())] += 1
    ...

The inspect module offers convenient access to the current source code line number.
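A minimal runnable sketch of that idea follows. The helper record_call and the key layout are assumptions made for this example, not how any particular tool does it. It uses inspect.currentframe() to look one frame up and find out which function is being entered.

```python
import inspect
from collections import defaultdict

# (filename, first line of function, function name) -> number of entries
counter = defaultdict(int)


def record_call():
    """Hypothetical helper: bump the counter for whichever function called us."""
    caller = inspect.currentframe().f_back  # CPython exposes frames this way
    code = caller.f_code
    counter[(code.co_filename, code.co_firstlineno, code.co_name)] += 1


def foo():
    record_call()
    return "foo"


def bar():
    record_call()
    return "bar"


# Pretend this is the test suite: it exercises foo() but forgets bar().
foo()
foo()

called = {name for (_, _, name) in counter}
for name in ("foo", "bar"):
    print(name, "covered" if name in called else "NOT covered")
```

Keying on file and line rather than on the name alone keeps two functions with the same name in different modules apart.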

Now during any run, such as unit tests, we can easily tell whether foo() executed or not. The excellent coverage module records the same kind of information (in practice it hooks the interpreter’s tracing machinery rather than rewriting your source), and then produces a pretty report showing covered functions in green while uncovered ones are red.

We can readily move to “line” granularity by just adding a counter increment before every line of source code. This is enough to support fairly nice source reports painted red and green.
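Rather than literally rewriting the source, one way to get the same per-line counts in Python is to let the interpreter report each line through sys.settrace. The sketch below is a toy under that assumption; the function classify and the dictionary names are made up for the example.

```python
import sys
from collections import defaultdict

line_hits = defaultdict(int)  # (filename, line number) -> execution count


def tracer(frame, event, arg):
    # The interpreter sends a "line" event just before each new source line runs.
    if event == "line":
        line_hits[(frame.f_code.co_filename, frame.f_lineno)] += 1
    return tracer  # keep tracing inside this frame


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


previous = sys.gettrace()
sys.settrace(tracer)
try:
    classify(5)  # the "test suite": only the non-negative path is exercised
finally:
    sys.settrace(previous)

# Express hits relative to the def line so the report is position independent.
code = classify.__code__
hits_by_offset = {
    line - code.co_firstlineno: count
    for (filename, line), count in line_hits.items()
    if filename == code.co_filename
}
print(hits_by_offset)
```

Offset 2, the return "negative" line, never shows up in the result, and that is the line a report would paint red.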

Moving to “expression” granularity is a bit more challenging. Consider this example:

if a() and b() and c():
    print(d(e(f())))

The and operator is short-circuiting, so if e.g. b() is False we never call c(), and so we never cover it. One way out is to think of it in these terms:

if (a()
    and b()
    and c()):
    print(d(e(f())))
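To see why each operand deserves its own counter, here is a small sketch in which a, b, and c are stand-ins built by a hypothetical operand factory that logs every invocation.

```python
calls = []  # names of operands that actually ran, in order


def operand(name, result):
    """Build a stand-in for a(), b() or c() that records its own invocation."""
    def run():
        calls.append(name)
        return result
    return run


a = operand("a", True)
b = operand("b", False)  # this one stops the chain
c = operand("c", True)

if (a()
        and b()
        and c()):
    calls.append("body")

print(calls)
```

The line holding the if statement counts as executed, yet c() and the body never ran, which is exactly what line granularity alone would hide.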

Another way would be to use from dis import dis and then work with the generated bytecode.
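A quick sketch of what the bytecode view offers: each short-circuit point shows up as a conditional jump, and those jumps are the places where a finer-grained tool could hang its counters. The function check is hypothetical, and exact opcode names differ between CPython versions, so the filter below matches loosely on purpose.

```python
import dis


def check(a, b, c):
    if a() and b() and c():
        return "all true"
    return "short-circuited"


instructions = list(dis.get_instructions(check))

# Conditional jumps mark the spots where evaluation may bail out early.
conditional_jumps = [
    ins for ins in instructions
    if "JUMP" in ins.opname and "IF" in ins.opname
]

for ins in conditional_jumps:
    print(ins.offset, ins.opname, ins.argrepr)
```

Calling dis.dis(check) instead prints the full listing, which is handy for exploring by hand.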

An additional gotcha is that the call to f might raise an error, so we never make the other calls to e, d, and print, which might leave d uncovered by our test suite.
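The same logging trick makes this gotcha visible. In this sketch d, e, and f are toy functions, with f rigged to fail.

```python
calls = []  # names of functions that actually ran, in order


def f():
    calls.append("f")
    raise ValueError("f failed")


def e(x):
    calls.append("e")
    return x


def d(x):
    calls.append("d")
    return x


try:
    print(d(e(f())))  # one source line, four calls, only the first one happens
except ValueError:
    pass

print(calls)
```

A line-granularity report would mark the print line as executed even though e, d, and print itself were never reached.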
