performance issue with large repository

james woodyatt jhw at conjury.org
Wed May 6 20:40:54 CDT 2009


everyone--

I have a large repository, i.e. about 650 MB with around 150k files.   
It's a monstrous pile of monolithic source code integration.  Huge.

And Mercurial can be a bit slow on Mac OS X 10.5.6 to perform "hg
update -C" when the working set is already pre-loaded with source
files checked out from CVS.  It's not CPU bound.  It doesn't appear to
be I/O bound.  It's just slow, and I think I know why.

Taking samples of it with Activity Monitor while it grinds away on my  
filesystem shows that it seems to be traversing the directory  
structure in a peculiar sequence, which to my untrained eye looks  
rather like it's got a Python dictionary keyed by absolute pathname,  
and it's iterating the values in hash-key order.

So, the sample shows that it's spending a lot of time in open, close,  
read and lstat, and the lsof output shows it constantly opening,  
reading and closing files in what seems like a random path every  
time.  I'm not surprised this is slow.  It would go a lot faster if  
the files were visited in something like lexicographical order.  That  
way the directory cache wouldn't be getting thrashed to bits.
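
To illustrate the ordering idea only, here is a minimal Python sketch.
It is not Mercurial's code; the function name stat_in_path_order and
the "tracked" mapping are made up for the example, and the claim that
the real slowdown comes from hash-order iteration is still just my
guess.  (Newer Python versions iterate dictionaries in insertion order
rather than hash order, but the point about sorting stands.)

```python
import os


def stat_in_path_order(paths):
    """Visit paths in sorted (lexicographical) order and lstat each one.

    `paths` is any iterable of pathnames, e.g. the keys of a dictionary.
    Sorting keeps every entry under the same directory prefix adjacent,
    so each directory is touched in one run instead of being revisited
    at random.  Returns a list of (path, stat result or None).
    """
    results = []
    for path in sorted(paths):
        try:
            results.append((path, os.lstat(path)))
        except OSError:
            # Missing or unreadable file: record it and keep going.
            results.append((path, None))
    return results
```

Something like stat_in_path_order(tracked) would then walk the keys of
a hypothetical "tracked" dictionary directory by directory, whatever
order the dictionary itself hands them out in.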

The workaround for this problem is to blow away the source files  
checked out from CVS first, like so:

	$ find [A-Za-z]* -type d -name CVS -prune -o ! -type d -exec rm {} \;
	$ hg update -C

The difference in execution time is stark.  It takes about five  
minutes on my MacBook Pro to do the clean update from Mercurial if the  
CVS sources are blown away first.  Without removing those files, it  
took about three hours before I finally got tired of waiting and  
SIGINT'd the thing.

Is this a problem the developers have seen?  Should I file a problem  
report?  Should I try to fix it?  Would anyone else care if I did?   
Thanks.


—
j h woodyatt <jhw at conjury.org>
http://jhw.vox.com/



