internal fragmentation

a personal journal of hacking, science, and technology

Mercurial: tracking down a performance regression

Fri, 28 May 2010 16:49 by mpm in mercurial (tagged , , , ) (link)

Today on the Mercurial-devel list, Jason Harris of MacHg noted:

In reference to matching the current version of Mercurial I noticed in my quick
testing that for quick status commands 1.4.2 was some 20% faster than 1.5.3.

That definitely shouldn’t happen. So I hopped over to a handy large repository and tried a quick test. I keep around builds of each Mercurial release for quick testing:

$ for i in `seq 5` ; do time hg153 st --all > /dev/null; done

real	0m1.754s
user	0m1.403s
sys	0m0.333s

real	0m1.749s
user	0m1.467s
sys	0m0.273s

real	0m1.741s
user	0m1.413s
sys	0m0.320s

real	0m1.761s
user	0m1.373s
sys	0m0.360s

real	0m1.772s
user	0m1.437s
sys	0m0.303s

$ for i in `seq 5` ; do time hg143 st --all > /dev/null; done

real	0m1.598s
user	0m1.283s
sys	0m0.297s

real	0m1.427s
user	0m1.070s
sys	0m0.343s

real	0m1.434s
user	0m1.127s
sys	0m0.293s

real	0m1.468s
user	0m1.127s
sys	0m0.307s

real	0m1.436s
user	0m1.100s
sys	0m0.313s

And indeed there’s a performance regression here. But as it happens, our normal performance-measuring tool doesn’t see any significant difference:

$ hg14 perfstatus
! result: 4156
! wall 0.664630 comb 0.660000 user 0.420000 sys 0.240000 (best of 15)
$ hg153 perfstatus
! result: 4156
! wall 0.674376 comb 0.670000 user 0.440000 sys 0.230000 (best of 14)

‘Perfstatus’ is part of contrib/perf.py, which an extension we use to benchmark some of most performance-critical code. It automates the process of choosing a reasonable sample size, much like Python’s timeit module. Note that it reports only the fastest time: practically all sources of noise in performance measurement (scheduling, power-saving, interrupts, cache effects) only make things slower, not faster. Thus picking the fastest result lets you focus on the least noisy measurement.

So something interesting is happening outside the status fast path. And we also get a null result measuring normal startup time via ‘perfstartup’. This calls for running bisect on the Mercurial repo itself:

$ cd ~/hg
$ hg bisect -r  # reset the bisection state
$ hg bisect -b tip  # latest version is bad
$ hg bisect -g 1.4
Testing changeset 10549:b97736016273 (1359 changesets remaining, ~10 tests)
332 files updated, 0 files merged, 27 files removed, 0 files unresolved

After marking a bad and a good revision, Mercurial jumps me to a revision in the middle for me to test. To run a test, I simply hop over to another terminal and run my for loop from earlier, then hop back and report ‘hg bisect -g’ or ‘hg bisect -b’. After about 10 iterations, Mercurial reports:

The first bad revision is:
changeset:   10176:24ce8f0c0a39
user:        Augie Fackler <durin42@gmail.com>
date:        Thu Dec 31 17:19:30 2009 -0600
summary:     dirstate: don't check state of subrepo directories

Hmm, this changeset seems innocent enough. And the repo we’re testing against doesn’t use subrepositories anyway. But closer inspection finds:

+            subrepos = ctx1.substate.keys()

And looking at how substate works, we see that it’s doing:

if '.hgsub' in ctx:
     read('.hgsub')

Hidden in that ‘in’ operation is a call into the context class that demand-loads the manifest for the parent changeset to check if a file is present. This isn’t showing up in the perfstatus benchmark because the first call is priming our internal manifest cache. Reading a manifest is one of the more expensive operations in Mercurial’s core and on large repositories can take a substantial fraction of a second, so we generally try to avoid. Can we avoid it here?

The answer is: yes, sometimes. When subrepos are in use, we do need to consult the committed .hgsub file. But in most cases, we can do a quick check in the dirstate (which we already have to demand-load for status checks) before going deeper:

 if working: # we need to scan the working dir
-            subrepos = ctx1.substate.keys()
+            subrepos = []
+            if '.hgsub' in self.dirstate:
+                subrepos = ctx1.substate.keys()
 s = self.dirstate.status(match, subrepos, listignored,
 listclean, listunknown)

(This patch is now in stable and will be part of the scheduled monthly minor release of 1.5.4 on June 1st.)

Mercurial: extending revision specifiers and a dilemma

Wed, 26 May 2010 15:56 by mpm in mercurial (tagged ) (link)

Right now we’re working on a way to greatly extend the way sets of revisions are specified in Mercurial. Currently most commands can take one or more ‘-r’ arguments that take single revisions and ranges of the form ‘start:end’, and revisions can be specified as a revision number, a relative revision number (eg -2), a changeset hash, a tag, a bookmark, a branch name, or built-ins like ‘.’, ‘null’, and ‘tip’.

This is good enough for many purposes, but we often get requests for more flexibility. These requests usually show up as new options for the log command, which has now grown a small army of them. The more popular of these will also trickle over into other commands. This is less than ideal because it means we have lots of options and potentially lots of duplicated and hairy code.

The new plan is to have a simple query language that lets you easily specify more complex queries anywhere a revision option is taken. The query language looks like this:

  • all changes on a branch:  branch(foo)
  • all descendants of 1.0: descendants(1.0)
  • all ancestors of 1.1 and descendants of 1.0:
    ancestors(1.1 ) and descendants(1.0) or 1.0..1.1
  • all ancestors of the working directory: ancestors(.) or follow()
  • all changesets from May containing the keyword ‘bug’: date(“may”) and keyword(“bug”)
  • all changeset from Bob or Alice: user(“bob”) or user(“alice”)
  • a complicated query:
    (user(“bob”) or user(“alice”)) and not merge() and 1.0..stable and keyword(“foo.py”) and not date(“may 10-may 20”)

This just begins to scratch the surface of what we can do with this scheme. We’ve actually made quite a lot of progress on this in the past week. In fact we now have two nearly complete rival implementations, each with their own tokenizer, parser, evaluator, predicates, and query optimization strategy. The most interesting piece here is the approach to parsing.

My version uses a simple ‘Pratt Parser‘ that I wrote for this purpose. It makes parsing algebraic and query-like expressions very simple. On the upside, this parser is very small: the parser module itself is only 75 lines and can be driven entirely by data rather than by verbose code. On the downside, it’s rather mysterious to the uninitiated. The ‘grammar’ that I pass to the parsing engine looks like this in its entirety:

elements = {
    "(": (20, ("group", 1, ")"), ("func", 1, ")")),
    "..": (17, None, ("dagrange", 17)),
    "not": (15, ("not", 15)),
    ":": (10, None, ("range", 10)),
    "and": (5, None, ("and", 5)),
    "or": (4, None, ("or", 4)),
    ",": (2, None, ("list", 2)),
    ")": (0, None, None),
    "end": (0, None, None),
    "symbol": (0, ("symbol",), None),
    "string": (0, ("string",), None),
}

(The magic numbers here are operator precedence and associativity, with clauses specifying how the symbol works in infix or prefix situations.)

Henrik Stuart has taken a more conventional approach. He’s taken the full-featured Parsing.py LR(1) parser and built a BNF-style grammar based on it. On the upside, everyone who knows anything about parsing knows how BNF works. On the downside, the parser itself weighs in at ~2000 lines of third-party not-recently-updated code, with the Mercurial-specific query grammar adding a few hundred more.

It’s clear that both versions are well up to the task at hand (and the other parsing-related task I’m looking at, hgweb’s templating engine), which means we have to weigh the strangeness and simplicity of the Pratt Parser against the size and conventionality of the Parsing.py approach.

Blog revived

Sat, 22 May 2010 10:39 by mpm in Uncategorized (link)

I stopped writing here regularly due to several annoyances with PyBloxsom, chiefly its poor paging model (non-existent in the default install) and comment spam protection.  I’ve just migrated my blog from PyBloxsom to WordPress, which should improve things on both fronts. Sorry if you’ve just gotten some spew on your RSS feed.