internal fragmentation

a personal journal of hacking, science, and technology

Mercurial and Unit Testing

Thu, 12 Feb 2009 22:35

Mercurial doesn't do unit testing.

This is something of a shocker to some people, who think that unit testing is the one true way to test code. But in fact, it's a bad fit for Mercurial.

First off, unit testing implies stable APIs. Unit tests test the internal APIs and make sure they perform as expected. If you want to evolve your APIs fluidly over time (and we do!), the tests have to be evolved along with them. Not only does this mean a lot of work, it can also mean accidentally breaking or losing test cases and having previously fixed bugs reappear.

Second, the data structures we work with most are entire project histories with backing files and directory trees of working files. All but the simplest APIs take an entire repository as an input. Many produce a changed repository or other complex object (diff, bundle, log, changed working directory). Generating these objects programmatically is complex, tedious, and subject to change.

Third, we're very serious about backwards compatibility of our user-visible interface (aka the command line). People get upset if their tools stop working the way they used to and break their build process. So it's actually more important that we get the same correct end result than that any particular module work in a particular way.

So instead we take a more holistic approach to testing. Our tests consist of simple shell scripts exercising all the visible (and some invisible) behavior of Mercurial by building repositories and running commands against them just as a user would. To test, we compare the output of each of these scripts against known good output. This lets us freely refactor our internal systems without having to revisit old tests while maintaining confidence that we haven't broken anything.
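To make the idea concrete, here is a minimal sketch of an output-comparison test in Python. It is not Mercurial's actual test runner: the helper names `run_script` and `check`, and the layout of one script plus one known-good output file, are assumptions made purely for illustration. It runs a shell script in a scratch directory, captures everything it prints, and diffs that against the recorded output.

```python
import difflib
import subprocess
import tempfile
from pathlib import Path


def run_script(script: Path) -> str:
    """Run a shell script in a fresh scratch directory and return its output.

    stdout and stderr are merged so error messages are part of what we compare.
    (Hypothetical helper for illustration, not part of Mercurial.)
    """
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(
            ["sh", str(script.resolve())],
            cwd=scratch,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
    return result.stdout


def check(script: Path, expected: Path) -> list:
    """Return a unified diff of expected vs. actual output; empty means pass."""
    actual = run_script(script).splitlines(keepends=True)
    wanted = expected.read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(wanted, actual, "expected", "actual"))
```

In this sketch a test passes when `check` returns an empty list. The script itself would hold ordinary commands such as `hg init` followed by `hg status`, so nothing in the test depends on how the internals are organized.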

Another advantage of this approach is that it has great synergy with bug reports. If a user sends us a list of commands they ran to produce a bug, we have a ready-made test case. A good percentage of our test suite was developed this way: directly by users.

At this point, our test suite is fairly comprehensive. We've got over 300 test scripts, some of them quite extensive, weighing in at about 20k lines (not counting results). That's more than half as big as the Mercurial code itself (a mere 37k lines). And many of these tests have been unchanged for years even though much of the underlying code has changed completely.

(All that said, we do actually do a small amount of unit testing: about 3% of our test scripts are written in Python to exercise some of our simplest isolated interfaces. Unit testing is sometimes the right answer.)

Not sure I follow your logic. Nothing in what you say is a bar to using unit tests. Sure, fluid APIs mean you will occasionally change unit tests. But there are surely many functions that are quite stable and could be tested for defects that might not appear in your scripts.

"We take a more holistic approach." Since unit tests by definition test the lowest unit level, any other approach is going to be more holistic. Nobody would run unit tests only, if they know what they're doing. So, unit tests should always be combined with more "holistic" tests.

So, back to my original point. I don't understand the logic of your decision. It sounds like you're not oriented towards unit tests, which is fine; but the reasons you give really are not reasons for not using them.
Andrew Binstock @ Tue Apr 21 00:09:24 2009
How long does it take to run your full test suite? One important aspect of unit tests is fast execution to ensure quick feedback for the developer. So how fast does one of your developers get feedback from one of his changes?
David Dossot @ Tue Apr 21 00:24:48 2009
Andrew writes: "Nothing in what you said is a bar to using unit tests."

Except of course, the bits that are. Bits like:

"...the data structures we work with most are entire project histories with backing files and directory trees of working files. All but the simplest APIs take an entire repository as an input. Many produce a changed repository or other complex object (diff, bundle, log, changed working directory). Generating these objects programmatically is complex, tedious, and subject to change..."

This is a big deal. Anything that's even vaguely interesting in Mercurial needs an ENTIRE ON-DISK REPOSITORY thrown at it for the most basic of tests. That's hard to gloss over. When you need to pull in the bulk of the program just to build a test case, you're not unit-testing any more.

Points one and three are worth revisiting too. There are many points where we've done substantial refactorings which would have made us rewrite the majority of the test suite had it been implemented as unit tests.

You also glossed over my last point which is that we actually DO use unit tests for some things and our framework supports them. But we've found very little practical use for them.
Matt Mackall @ Tue Apr 21 00:40:42 2009
David: About 10 minutes for the full test suite.
Matt Mackall @ Tue Apr 21 00:48:14 2009
Having functional tests instead of unit tests is totally fine, as long as there are (automated) tests that actually cover the code. Do you have any numbers on test coverage, e.g. what % of SLOCs is covered through the functional tests that you run?
ak @ Tue Apr 21 01:17:13 2009
Why do you not call them unit tests?

You've defined your "unit" to be tested as the command line interface of 'hg', based on the logical and sensible notion of having an automated test system that yields maximum ROI.

That's what teams should do -- and what good teams do -- rather than sticking to some dogma about The Right Way To Test.
MD @ Tue Apr 21 01:29:23 2009
MD: why do you insist we call them unit tests?
Dirkjan Ochtman @ Tue Apr 21 03:50:48 2009
ak: We have a built-in way of getting coverage numbers, though it falls down in some cases where we use subprocesses, I think.
Dirkjan Ochtman @ Tue Apr 21 03:51:57 2009
See Where, Oh Where to Test? by Kent Beck. http://www.threeriversinstitute.org/WhereToTest.html
EP @ Tue Apr 21 04:05:18 2009
Dirkjan Ochtman: Questioning whether tests are unit tests is a natural result of an entire blog post dedicated to not calling your tests unit tests. It has nothing to do with whether they actually are unit tests.
MC @ Tue Apr 21 07:26:57 2009
Any task can be broken down in such a way that it can be unit tested, but it is also true that this is not necessarily the best way to do it.

What were the reasons you decided upon this architecture rather than one which encouraged unit testing? What were the benefits that made up for their loss?
John Eikenberry @ Tue Apr 21 10:44:40 2009
This is similar to the testing approach we use for the Web page handling code in Mozilla, and I think it's a good approach.  We test correctness of that code primarily using two types of tests (though our testing documentation also describes others used mainly in other areas of code):  reftests, which test pixel-for-pixel equivalence of the display of Web pages, and mochitests, which use JavaScript to test the behavior of the APIs that Web authors can access from JavaScript.  Some earlier testing approaches in the project tested internal APIs, but they proved unmaintainable.
David Baron @ Tue Apr 21 10:49:30 2009
