internal fragmentation

a personal journal of hacking, science, and technology

Mercurial: tracking down a performance regression

Fri, 28 May 2010 16:49 by mpm in mercurial (tagged , , , ) (link)

Today on the Mercurial-devel list, Jason Harris of MacHg noted:

In reference to matching the current version of Mercurial I noticed in my quick
testing that for quick status commands 1.4.2 was some 20% faster than 1.5.3.

That definitely shouldn’t happen. So I hopped over to a handy large repository and tried a quick test. I keep around builds of each Mercurial release for quick testing:

$ for i in `seq 5` ; do time hg153 st --all > /dev/null; done

real	0m1.754s
user	0m1.403s
sys	0m0.333s

real	0m1.749s
user	0m1.467s
sys	0m0.273s

real	0m1.741s
user	0m1.413s
sys	0m0.320s

real	0m1.761s
user	0m1.373s
sys	0m0.360s

real	0m1.772s
user	0m1.437s
sys	0m0.303s

$ for i in `seq 5` ; do time hg143 st --all > /dev/null; done

real	0m1.598s
user	0m1.283s
sys	0m0.297s

real	0m1.427s
user	0m1.070s
sys	0m0.343s

real	0m1.434s
user	0m1.127s
sys	0m0.293s

real	0m1.468s
user	0m1.127s
sys	0m0.307s

real	0m1.436s
user	0m1.100s
sys	0m0.313s

And indeed there’s a performance regression here. But as it happens, our normal performance-measuring tool doesn’t see any significant difference:

$ hg14 perfstatus
! result: 4156
! wall 0.664630 comb 0.660000 user 0.420000 sys 0.240000 (best of 15)
$ hg153 perfstatus
! result: 4156
! wall 0.674376 comb 0.670000 user 0.440000 sys 0.230000 (best of 14)

‘Perfstatus’ is part of contrib/perf.py, which an extension we use to benchmark some of most performance-critical code. It automates the process of choosing a reasonable sample size, much like Python’s timeit module. Note that it reports only the fastest time: practically all sources of noise in performance measurement (scheduling, power-saving, interrupts, cache effects) only make things slower, not faster. Thus picking the fastest result lets you focus on the least noisy measurement.

So something interesting is happening outside the status fast path. And we also get a null result measuring normal startup time via ‘perfstartup’. This calls for running bisect on the Mercurial repo itself:

$ cd ~/hg
$ hg bisect -r  # reset the bisection state
$ hg bisect -b tip  # latest version is bad
$ hg bisect -g 1.4
Testing changeset 10549:b97736016273 (1359 changesets remaining, ~10 tests)
332 files updated, 0 files merged, 27 files removed, 0 files unresolved

After marking a bad and a good revision, Mercurial jumps me to a revision in the middle for me to test. To run a test, I simply hop over to another terminal and run my for loop from earlier, then hop back and report ‘hg bisect -g’ or ‘hg bisect -b’. After about 10 iterations, Mercurial reports:

The first bad revision is:
changeset:   10176:24ce8f0c0a39
user:        Augie Fackler <durin42@gmail.com>
date:        Thu Dec 31 17:19:30 2009 -0600
summary:     dirstate: don't check state of subrepo directories

Hmm, this changeset seems innocent enough. And the repo we’re testing against doesn’t use subrepositories anyway. But closer inspection finds:

+            subrepos = ctx1.substate.keys()

And looking at how substate works, we see that it’s doing:

if '.hgsub' in ctx:
     read('.hgsub')

Hidden in that ‘in’ operation is a call into the context class that demand-loads the manifest for the parent changeset to check if a file is present. This isn’t showing up in the perfstatus benchmark because the first call is priming our internal manifest cache. Reading a manifest is one of the more expensive operations in Mercurial’s core and on large repositories can take a substantial fraction of a second, so we generally try to avoid. Can we avoid it here?

The answer is: yes, sometimes. When subrepos are in use, we do need to consult the committed .hgsub file. But in most cases, we can do a quick check in the dirstate (which we already have to demand-load for status checks) before going deeper:

 if working: # we need to scan the working dir
-            subrepos = ctx1.substate.keys()
+            subrepos = []
+            if '.hgsub' in self.dirstate:
+                subrepos = ctx1.substate.keys()
 s = self.dirstate.status(match, subrepos, listignored,
 listclean, listunknown)

(This patch is now in stable and will be part of the scheduled monthly minor release of 1.5.4 on June 1st.)