AW: bdiff/bsdiff: comprehensive performance measurements for large files [long, but nice graphics attached]
Christoph.Spiel at partner.bmw.de
Christoph.Spiel at partner.bmw.de
Mon Sep 10 00:25:00 CDT 2007
Ralf,
>These are interesting results. As stated in my patch-mail I initially
>wrote this patch as I couldn't commit at all due to internal stack
>overflow in mercurial.
The "recurse" function in the "bdiff" module could be optimized
for a smaller stack footprint. It would make the code a bit
uglier, though.
>So another interesting experiment would be to
>have large files of same size but with increasing number of changed
>lines. I would assume that the internal bdiff algo gets to its
>limits so that at some time the bsdiff algo is faster.
My tests show that bsdiff _could_ have an advantage for files
larger than 32MB. However, most of the time bdiff wins even
for this size.
>Did you also measure the resulting repository size? If so: Did you get
>smaller or bigger repositories with bsdiff?
Yes, I did over the weekend. bdiff and bsdiff produce repositories
of nearly the same sizes. Neither is better. This holds for small
files as well as for large ones (say 64MB).
>Did the patch proved to be stable for you?
Yes, it looks absolutely rock solid.
The big problem of bsdiff is its bad performance. It is an order
of magnitude slower than bdiff. This means that it takes ten times
as many operations to arrive at the same difference set. If a
commit takes an hour instead of just 6 minutes, well your
productivity is heading for the South Pole.
/cls
--
Dr. Christoph L. Spiel
BMW Forschungs- und Innovationszentrum, EA-410
Lauchstaedterstrasse 5, 80995 Muenchen
More information about the Mercurial
mailing list