performance issue with large repository
james woodyatt
jhw at conjury.org
Wed May 6 20:40:54 CDT 2009
everyone--
I have a large repository, i.e. about 650 MB with around 150k files.
It's a monstrous pile of monolithic source code integration. Huge.
And Mercurial can be a bit slow on Mac OS X 10.5.6 to perform "hg
update -C" when the working set is already pre-loaded with source
files checked out from CVS. It's not CPU bound. It doesn't appear I/O
bound. It's just slow, and I think I know why.
Taking samples of it with Activity Monitor while it grinds away on my
filesystem shows that it seems to be traversing the directory
structure in a peculiar sequence, which to my untrained eye looks
rather like it's got a Python dictionary keyed by absolute pathname,
and it's iterating the values in hash-key order.
So, the sample shows that it's spending a lot of time in open, close,
read and lstat, and the lsof output shows it constantly opening,
reading and closing files in what looks like a random order, jumping
to an unrelated path every time. I'm not surprised this is slow. It would go a lot faster if
the files were visited in something like lexicographical order. That
way the directory cache wouldn't be getting thrashed to bits.
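To make that hypothesis concrete, here is a minimal Python sketch. It does not claim to show how Mercurial actually walks the working directory; the manifest dictionary, the helpers `directory_switches` and `visit_in_lexicographic_order`, and the sample paths are all hypothetical. It counts how often consecutive file visits land in a different directory, as a rough proxy for directory-cache churn, and compares arbitrary dict order with sorted path order. On Python 3.7+ a dict iterates in insertion order, so the scrambled insertion order below stands in for the hash order of older Pythons.

```python
import os


def directory_switches(paths):
    """Count how often consecutive paths live in different directories.

    Hypothetical helper: a rough proxy for directory-cache churn, since
    each switch may force the filesystem to look up another directory.
    """
    switches = 0
    previous = None
    for path in paths:
        parent = os.path.dirname(path)
        if previous is not None and parent != previous:
            switches += 1
        previous = parent
    return switches


def visit_in_lexicographic_order(manifest):
    """Yield (path, entry) pairs sorted by path (hypothetical helper).

    Paths sharing a directory prefix are contiguous in sorted order,
    so each subtree is visited in one pass instead of in dict order.
    """
    for path in sorted(manifest):
        yield path, manifest[path]


if __name__ == "__main__":
    # Hypothetical manifest: path -> per-file metadata (just a flag here).
    manifest = {
        "src/b/two.c": "x",
        "lib/a.c": "x",
        "src/a/one.c": "x",
        "lib/b.c": "x",
        "src/b/one.c": "x",
        "src/a/two.c": "x",
    }
    scattered = list(manifest)  # insertion order, standing in for hash order
    ordered = [path for path, _ in visit_in_lexicographic_order(manifest)]
    print(directory_switches(scattered), directory_switches(ordered))
```

Run as a script, it prints `5 2`: five directory switches in the scrambled order versus two once the paths are sorted. Sorting is enough here because every path that starts with the same prefix, such as `src/a/`, falls in one contiguous run in lexicographic order.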
The workaround for this problem is to blow away the source files
checked out from CVS first, like so:
$ find [A-Za-z]* -type d -name CVS -prune -o -type f -exec rm {} \;
$ hg update -C
The difference in execution time is stark. It takes about five
minutes on my MacBook Pro to do the clean update from Mercurial if the
CVS sources are blown away first. Without removing those files, it
took about three hours before I finally got tired of waiting and
SIGINT'd the thing.
Is this a problem the developers have seen? Should I file a problem
report? Should I try to fix it? Would anyone else care if I did?
Thanks.
—
j h woodyatt <jhw at conjury.org>
http://jhw.vox.com/