Automated tests are great, from small unit tests up to bigger ones like integration tests.
Measuring coverage will help you understand
A coverage report will sometimes point out that we should write a currently “missing” test to exercise a recently added function; that’s the simple case. But looking at uncovered lines, e.g. in an exception handler, can also help with critiquing the Public API, and can suggest Extract Helper on deeply buried code so that it becomes exposed to unit tests. Often there will be an if condition that is “hard” or nearly impossible to make true, so tests never run the if clause until we break out a helper.
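To make the Extract Helper idea concrete, here is a minimal sketch. Every name in it (is_expired, handle_request, max_age_s) is hypothetical and invented for illustration. It assumes the hard-to-reach condition depends on the wall clock, which is one common reason an if clause never runs under test.

```python
import time


def is_expired(issued_at, now, max_age_s=3600):
    """Extracted helper: a pure function, so tests can pass any timestamps."""
    return now - issued_at > max_age_s


def handle_request(token_issued_at):
    # Before extraction the comparison read time.time() inline, so a test
    # could only reach the "rejected" branch by waiting or patching the clock.
    if is_expired(token_issued_at, time.time()):
        return "rejected"
    return "accepted"


# Unit tests can now drive both outcomes of the condition directly.
print(is_expired(issued_at=0, now=3601))
print(is_expired(issued_at=0, now=3600))
```

Once the helper exists, both outcomes of the condition show up as covered without any waiting or clock patching.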
The approach to measurement is pretty simple, and tooling is available for many languages. In C / C++ land, GNU gcov has been around for decades, more recently prettified by gcovr, and clang offers nice measurement support as well. Rust and most other languages offer similar support. Python programmers will want to rely on coverage.
How does it work? The coverage tool will expand your code by mixing in a bunch of counter increments. There’s more than one measurement granularity. Let’s start with the coarsest level.
To measure at “function” granularity, imagine that an update of a global was added just after each def foo(): in your Python source:
def foo():
    counter[__file__ + get_line_num()] += 1
    ...
The inspect module offers convenient access to the current source code line number.
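A runnable sketch of that idea might look like the following. The helper names count_call and key_of are hypothetical, and the counter is keyed on (filename, first line of the function) as an assumed stand-in for the file-plus-line-number key above. It relies on inspect.currentframe(), which is available in CPython.

```python
import inspect
from collections import defaultdict

counter = defaultdict(int)  # (filename, first line of function) -> call count


def key_of(code):
    """Identify a function by its file and the line where its def starts."""
    return (code.co_filename, code.co_firstlineno)


def count_call():
    """Hypothetical helper: tally one call for whichever function invoked us."""
    caller = inspect.currentframe().f_back
    counter[key_of(caller.f_code)] += 1


def foo():
    count_call()
    return "foo"


def bar():
    count_call()
    return "bar"


foo()
foo()  # bar() is deliberately never called

for func in (foo, bar):
    status = "covered" if key_of(func.__code__) in counter else "NOT covered"
    print(f"{func.__name__}: {status}")
```

The membership test uses `in` rather than indexing, so checking an uncalled function does not accidentally create a zero entry in the defaultdict.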
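Now during any run, such as unit tests, we can easily tell whether foo() executed or not. Conceptually, the excellent coverage module keeps a tally much like this defaultdict(int) global (in practice it gathers the data through the interpreter’s tracing hooks rather than by rewriting your source), and then produces a pretty report showing covered code in green and uncovered code in red.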
We can readily move to “line” granularity by just adding a counter increment before every line of source code. This is enough to support fairly nice source reports painted red and green.
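As a rough sketch of line granularity, the snippet below uses sys.settrace to bump a counter on each “line” event instead of literally inserting increments into the source. The names record_lines and classify are hypothetical, and lines are reported as offsets from the def line so the result does not depend on where the snippet sits in a file.

```python
import sys
from collections import defaultdict


def record_lines(func, *args):
    """Run func(*args) and count how often each of its own lines executes."""
    hits = defaultdict(int)
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace lines inside other functions
        if event == "line":
            hits[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return dict(hits)


def classify(n):      # offset 0
    if n < 0:         # offset 1
        return "neg"  # offset 2
    return "non-neg"  # offset 3


hits = record_lines(classify, 5)
for offset in (1, 2, 3):
    print(offset, "covered" if offset in hits else "NOT covered")
```

Offsets that never appear in hits are the lines a report would paint red; here that is the early return, because the only input was non-negative.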
Moving to “expression” granularity is a bit more challenging. Consider this example:
if a() and b() and c():
    print(d(e(f())))
The and operator is short-circuiting, so if e.g. b() is False we never call c(), and we never cover it. One way out is to think of it in these terms, with each operand on its own line:
if (a()
        and b()
        and c()):
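The short-circuit behaviour is easy to observe directly. In this small sketch the probe helper and the calls log are hypothetical additions, and a, b and c are stand-ins that simply record that they ran.

```python
calls = []  # names of the operand functions that actually executed


def probe(name, result):
    """Hypothetical helper: log the call, then return the canned result."""
    calls.append(name)
    return result


def a():
    return probe("a", True)


def b():
    return probe("b", False)


def c():
    return probe("c", True)


if (a()
        and b()
        and c()):
    calls.append("body")

print(calls)
```

Because b() returns False, neither c() nor the body runs, even though a line-level report for the one-line form of the condition would show that line as fully covered.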
Another way would use from dis import dis and then work with the generated bytecode.
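For a feel of what the bytecode offers, this sketch compiles the example without running it and lists the conditional jumps. It uses dis.get_instructions rather than the dis function itself, purely so the result can be filtered. Opcode names differ between CPython versions, so the filter on "JUMP" and "IF" is an assumption that matches the names I am aware of, not a guarantee.

```python
import dis

source = "if a() and b() and c():\n    print(d(e(f())))\n"
code = compile(source, "<demo>", "exec")  # compiled only, never executed

# Each operand of `and` is typically followed by its own conditional jump,
# which is what an expression-level coverage tool could instrument.
cond_jumps = [
    ins for ins in dis.get_instructions(code)
    if "JUMP" in ins.opname and "IF" in ins.opname
]

for ins in cond_jumps:
    print(ins.offset, ins.opname, "-> target", ins.argval)
```

Each listed jump marks a point where evaluation can bail out of the condition early, so recording which jumps were taken tells us which operands were actually evaluated.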
An additional gotcha is that the call to f might raise an error, so we never make the other calls to e, d, and print, which might leave d uncovered by our test suite.
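A small sketch shows the effect. The bodies of d, e and f here are hypothetical stand-ins that only log their own name, and f is assumed to raise ValueError.

```python
calls = []  # names of the functions that actually ran


def f():
    calls.append("f")
    raise ValueError("boom")


def e(x):
    calls.append("e")
    return x


def d(x):
    calls.append("d")
    return x


try:
    print(d(e(f())))
except ValueError:
    pass  # the whole line "ran", yet e, d and print were never called

print(calls)
```

A line-granularity report would still mark the print line as executed, which is why e and d can look covered when they are not.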
Copyright 2022 John Hanley. MIT licensed.