Converting from CVSNT

Greg Ward greg-hg at gerg.ca
Tue Jun 30 15:47:33 CDT 2009


On Sun, Jun 28, 2009 at 3:56 PM, Michael Haggerty<mhagger at alum.mit.edu> wrote:
> It is of course unnecessary to toposort the changesets read in
> fastimport format, since they are necessarily already toposorted

Depends on your definition of "unnecessary".  My experience is mostly
with our fair-sized CVS repository at work: ~26,000 files, ~100,000
commits.  (That becomes ~130,000 commits when I run it through
cvs2git, presumably because of tag and branch fixups.)

If I convert to Mercurial following chronological order (as
cvs2{svn,git} generate), then I get a ~5 GB manifest file.  If I let
Mercurial toposort the way it wants, I get a ~150 MB manifest file (if
memory serves).  (And when I tweaked the toposort algorithm to generate
a more sensible but not quite space-optimal sort, I think I got a ~180
MB manifest.)  That makes the difference between "darn, Mercurial
looks neat but is unsuitable for us" and "Mercurial wins".

The annoying thing is that this is all just an implementation
detail of Mercurial, but it's a *killer* implementation detail.
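
To make the ordering effect concrete, here is a toy model in Python.
It rests on an assumption not spelled out in this thread: that each
stored manifest revision is delta-compressed against whichever
revision was stored immediately before it, so a revision costs as much
as its difference from its predecessor in storage order.  The history,
the file names, and the helpers build_history, storage_cost and
grouped_toposort are all made up for illustration.  This is not the
algorithm Mercurial or convert actually uses, and it only handles
merge-free histories.

```python
def build_history(n=50):
    """Two long-lived branches off one root, committed alternately in time."""
    root = frozenset(f"f{i}" for i in range(1000))
    manifests, parents, dates = {"root": root}, {"root": None}, {"root": 0}
    a = root
    # Branch B starts by moving half the tree, so it differs a lot from A.
    b = frozenset(f"g{i}" if i < 500 else f"f{i}" for i in range(1000))
    for i in range(n):
        a = a | {f"a{i}"}          # each commit adds one file on its branch
        b = b | {f"b{i}"}
        ra, rb = f"A{i}", f"B{i}"
        manifests[ra], manifests[rb] = a, b
        parents[ra] = f"A{i - 1}" if i else "root"
        parents[rb] = f"B{i - 1}" if i else "root"
        dates[ra], dates[rb] = 2 * i + 1, 2 * i + 2
    return manifests, parents, dates


def storage_cost(order, manifests):
    """Sum of delta sizes, each delta taken against the previously stored rev.

    ASSUMPTION: delta size is modelled as the number of entries that differ.
    """
    cost, prev = 0, frozenset()
    for rev in order:
        cost += len(manifests[rev] ^ prev)
        prev = manifests[rev]
    return cost


def grouped_toposort(parents, dates):
    """Topological order that follows one line of development as far as it can.

    Depth-first walk from the root; only valid for single-parent histories.
    """
    children = {}
    for rev, parent in parents.items():
        children.setdefault(parent, []).append(rev)
    order = []
    stack = sorted(children.get(None, []), key=dates.get, reverse=True)
    while stack:
        rev = stack.pop()
        order.append(rev)
        # Earliest child ends up on top of the stack and is visited next.
        stack.extend(sorted(children.get(rev, []), key=dates.get, reverse=True))
    return order


if __name__ == "__main__":
    manifests, parents, dates = build_history()
    chronological = sorted(manifests, key=dates.get)
    grouped = grouped_toposort(parents, dates)
    print("chronological:", storage_cost(chronological, manifests))
    print("grouped:      ", storage_cost(grouped, manifests))
```

In the chronological order every commit lands right after a commit
from the other branch, so nearly every delta is as large as the
difference between the branches.  In the grouped order that price is
paid once.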

> But if the convert extension is also doing a toposort itself when
> converting from CVS

It is, and for the same reason.  The cvsps knockoff algorithm gives a
date-ordered record of commits, but if you let convert's toposort have
its way, you get a far more space-efficient conversion.  Same issue as
above.

> I don't quite understand why cvs2git+fastimport
> should be much slower than hg convert.  Is the cvs2git part just
> pathetically slow?

cvs2{svn,git} is definitely slow, but I would hesitate to say
"pathetically slow".  I prefer to be generous and assume that getting
anything useful out of CVS is such a mind-bogglingly difficult task
that the poor thing has to take its time.  ;-)  Right now,
hg-fastimport is very slow.  I mean, I used to think bzr-fastimport
was slow, and then I started "improving" hg-fastimport.  Sigh.  I have
not analyzed it to determine where the bottleneck is.

> By the way, I've done some work (not yet published) on changing cvs2git
> to generate the revision contents much more efficiently (by using the
> internal checkout code instead of calling "cvs co" each time).  This
> would not save so much time for cvs2hg because hg-fastimport requires
> inline blobs

That last bit is no longer true.  I fixed hg-fastimport to accommodate
blob refs months ago:
http://vc.gerg.ca/hg/hg-fastimport/rev/9e9c215fcbd8 .  (That's why I
mostly use cvs2git for testing hg-fastimport; I pretty much ignore
cvs2hg.  I think the only difference between cvs2hg and cvs2git needs
to be max number of merge parents, since hg-fastimport does not handle
octopus merges yet.)
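
For anyone unfamiliar with the inline vs. blob ref distinction, here
is a small Python sketch that emits the same one-file commit both ways
in fast-import stream syntax.  The committer, timestamp, path and
helper names are placeholders invented for the example, and it shows
only the stream format, not how hg-fastimport processes it.

```python
# Placeholder identity and timestamp, not taken from any real repository.
COMMITTER = b"committer Example User <user@example.invalid> 1246394853 -0500\n"


def data_cmd(payload: bytes) -> bytes:
    """Format a fast-import 'data' command with an exact byte count."""
    return b"data %d\n%s\n" % (len(payload), payload)


def with_blob_ref(path: bytes, content: bytes, message: bytes) -> bytes:
    """Emit the blob first under mark :1, then refer to it from the commit."""
    return (
        b"blob\n"
        b"mark :1\n"
        + data_cmd(content)
        + b"commit refs/heads/master\n"
        + COMMITTER
        + data_cmd(message)
        + b"M 100644 :1 " + path + b"\n"
    )


def with_inline_blob(path: bytes, content: bytes, message: bytes) -> bytes:
    """Embed the file content directly inside the commit."""
    return (
        b"commit refs/heads/master\n"
        + COMMITTER
        + data_cmd(message)
        + b"M 100644 inline " + path + b"\n"
        + data_cmd(content)
    )


if __name__ == "__main__":
    import sys
    sys.stdout.buffer.write(with_blob_ref(b"hello.txt", b"hello\n", b"add hello"))
    sys.stdout.buffer.write(b"\n")
    sys.stdout.buffer.write(with_inline_blob(b"hello.txt", b"hello\n", b"add hello"))
```

With blob refs, the exporter can write each file revision once and
point at it from any commit that needs it.  With inline blobs, the
content has to be repeated in every commit that touches it.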

Greg


