Better location for unit tests in Rust
Posted on Fri 06 January 2017 in Code • Tagged with Rust, unit tests, testing, modules
For a unit test to be comprehensive, it must often access some private symbols from the module it checks.
In Rust, this is permitted for submodules:
they can freely refer to anything defined “upwards” in the module hierarchy.
The only requirement is that they import it explicitly by name,
using statements such as use super::foo.
To illustrate this, here’s an example of a ridiculously well-factored FizzBuzz along with its accompanying unit test:
use std::borrow::Cow;

pub fn fizzbuzz(n: u32) {
    for i in 1..=n {
        println!("{}", fizzbuzz_string(i));
    }
}

fn fizzbuzz_string(i: u32) -> Cow<'static, str> {
    let by3 = i % 3 == 0;
    let by5 = i % 5 == 0;
    if by3 && by5 { "FizzBuzz".into() }
    else if by3 { "Fizz".into() }
    else if by5 { "Buzz".into() }
    else { format!("{}", i).into() }
}

#[cfg(test)]
mod tests {
    use super::fizzbuzz_string;

    #[test]
    fn single_numbers() {
        assert_eq!("1", fizzbuzz_string(1));
        assert_eq!("2", fizzbuzz_string(2));
        assert_eq!("Fizz", fizzbuzz_string(3));
        assert_eq!("Buzz", fizzbuzz_string(5));
        assert_eq!("7", fizzbuzz_string(7));
        assert_eq!("Fizz", fizzbuzz_string(9));
        assert_eq!("Buzz", fizzbuzz_string(10));
        assert_eq!("FizzBuzz", fizzbuzz_string(15));
        // etc.
    }
}
The internal function, as shown above, can be imported and verified independently of the public one.
This is done through a #[test] procedure in an inline submodule.
Such factorization and granular testing are commonplace, especially when the public API may cause unwanted side effects, such as printing to stdout here.
The issue of length
But if you are like me and prefer your modules to be short and sweet, you may feel justifiably concerned about this inline submodule business.
In the toy example above, tests have already taken at least as many lines as the actual code. Real-world code usually matches this ratio. A module with a couple hundred lines of regular code starts to be measured in KLOCs if we also include its tests.
While this could be taken as a strong hint to split things up, it can just as easily disincentivize testing instead.
The obvious solution is to move those tests somewhere else. What is not so evident is how to preserve this crucial module-submodule relation, enabling us to write comprehensive tests in the first place.
Looking for inspiration
I must quickly disappoint anyone who would like to round up all their unit tests and sequester them in some distant tests/ directory. Such a layout is reserved for crate-level (“integration”) tests. Unit tests, on the other hand, are predestined to live among production code [1].
So let’s at least relocate them to separate files.
To make this goal more concrete, we will try to emulate the project layout described in Google’s C++ style guide. By this convention, a conceptual “module” or “unit” consists of the following files:
- foo.h
- foo.cc
- foo_test.cc
Translating this to Rust, we get:
- foo.rs
- foo_test.rs
The first one is obviously our production code.
The second file, foo_test.rs, contains all the tests we would previously put in the mod tests { } construct.
Seems pretty clean and straightforward, right? Unfortunately, Rust will not accept this setup without some convincing.
Family problems
To understand why, recall that the mere presence of some .rs files is not enough for the Rust compiler to care. If we want them picked up and included in the project, we also need to add some module declarations first.
In other words, there must also be a mod.rs file in this directory, containing at the very least the following content:
// (mod.rs)
mod foo;
#[cfg(test)]
mod foo_test;
Now it should be clearer that something is wrong.
We have two modules here, but they are siblings.
Both foo and foo_test are on the same level,
children of whatever parent module contains them both.
More to the point, it’s foo_test that’s not a child module of foo,
meaning it can only see the public symbols of the latter.
This is not quite enough to write a proper unit test.
It definitely isn’t for our initial FizzBuzz example,
because the fizzbuzz_string function cannot even be imported!
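The difference between a sibling and a child is easy to reproduce in a single file with inline modules. The sketch below is only an illustration: the names secret_helper, public_api, child, and sibling are made up for this example and are not part of the FizzBuzz code.

```rust
mod foo {
    // Private: visible inside `foo` and its descendants only.
    fn secret_helper() -> u32 {
        42
    }

    // Public: visible to anyone who can see `foo`.
    pub fn public_api() -> u32 {
        secret_helper() + 1
    }

    // A child module may import private items from its parent.
    pub mod child {
        use super::secret_helper;

        pub fn peek() -> u32 {
            secret_helper()
        }
    }
}

// A sibling of `foo`: it only gets the public surface.
mod sibling {
    use super::foo::public_api;
    // use super::foo::secret_helper; // would not compile: the function is private

    pub fn peek() -> u32 {
        public_api()
    }
}
```

Uncommenting the second import in sibling makes the compiler reject the file, which is the same wall that foo_test runs into when it is declared next to foo rather than inside it.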
Existential crises
Okay, so how about we move the mod foo_test; declaration to foo.rs?
This should be enough to establish the proper hierarchy.
After all, this is how the module tree is normally reconstructed:
from the appropriate placement of the mod statements.
So, here we go:
// (foo.rs)
#[cfg(test)]
mod foo_test;
error: cannot declare a new module at this location
 --> src/parent/foo.rs:4:5
  |
4 | mod foo_test;
…Really?
Well, yes. At the time of writing, a declaration like this simply isn’t allowed. The reason for this is actually much less arbitrary than the error message would indicate.
To put it bluntly, foo_test simply cannot exist if it’s introduced there.
To deliver on its declaration promise,
the submodule would have to reside within foo itself.
But of course, foo.rs is just a file, so this setup is evidently impossible.
All in all, Rust seems to be looking for our module in all the wrong places.
Perhaps we can just tell it where it should be going instead?…
The right path
Enter the #[path] attribute, which fulfills this exact purpose:
// (foo.rs)
#[cfg(test)]
#[path = "./foo_test.rs"]
mod foo_test;
#[path] tells the Rust compiler where to look for the module it is attached to.
Its argument is relative to the location of the outer module (like foo here),
and can be either a single file, or a directory with mod.rs.
Conceptually, this is similar to a custom ClassLoader in Java,
or the common sys.path hacks in Python.
Unlike those two languages, however,
the #[path] attribute is only relevant at compile time.
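For completeness, here is a rough sketch of what the two files could contain once the declaration above is in place. It assumes that foo.rs holds the FizzBuzz code from the top of this post, and the test name is invented for the example. To keep the sketch compilable as one file, the test module is written inline and the #[path] declaration appears only in a comment; in the real layout, everything between the braces of foo_test would be the content of foo_test.rs.

```rust
// (foo.rs)
use std::borrow::Cow;

pub fn fizzbuzz(n: u32) {
    for i in 1..=n {
        println!("{}", fizzbuzz_string(i));
    }
}

// Private helper: only this module and its children can see it.
fn fizzbuzz_string(i: u32) -> Cow<'static, str> {
    let by3 = i % 3 == 0;
    let by5 = i % 5 == 0;
    if by3 && by5 { "FizzBuzz".into() }
    else if by3 { "Fizz".into() }
    else if by5 { "Buzz".into() }
    else { format!("{}", i).into() }
}

// In the real foo.rs, the module is declared without a body:
//
//     #[cfg(test)]
//     #[path = "./foo_test.rs"]
//     mod foo_test;
//
// It is inlined here only so that the sketch fits in a single file.
#[cfg(test)]
mod foo_test {
    // (foo_test.rs) As a child of foo, this module can import the private helper.
    use super::fizzbuzz_string;

    #[test]
    fn multiples_of_three_and_five() {
        assert_eq!("Fizz", fizzbuzz_string(3));
        assert_eq!("Buzz", fizzbuzz_string(5));
        assert_eq!("FizzBuzz", fizzbuzz_string(15));
    }
}
```

The import in the test file is the same use super::fizzbuzz_string line as in the original inline tests, so moving existing tests out should mostly be a matter of cutting and pasting the module body.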
Additionally, and somewhat confusingly,
#[path] can also be applied retroactively
to a module that the compiler has already located.
In such a case, it will affect the lookup of any child modules
by making rustc search for them in the new location.
With #[path] handy,
it is therefore possible to implement custom layouts
of regular source modules and test files.
But as with every tool that can be used to defy conventions, it should be used with appropriate care. While the straightforward and self-documenting approach described here is unlikely to raise any eyebrows, rewriting module paths willy-nilly is most certainly a bad idea.
[1] Okay, technically it is possible to completely isolate them, essentially by abusing the approach I describe later in this post.