Performance with binary-heavy repositories

Matt Mackall mpm at selenic.com
Fri Aug 3 10:27:59 CDT 2007


On Fri, Aug 03, 2007 at 12:07:21PM +0200, Christoph.Spiel at partner.bmw.de wrote:
> Bryan -
> 
> > Could you give me an idea of the
> > sizes of your files, please?
> 
> I'll give you even more. ;)  Here is a histogram of sizes.

And I'll add a column here of cumulative size:
 
>    Size/Bytes   Occurrences        Total
>    ==========   ============       =====
>         5977        1355           8098835
>        46882         108          13162091
>        84918          42          16728647
>       107234          18          18658859
>       144558          13          20682671
>       196372           6          21860903
>       245490           2          22351883
>       256062           3          23120069
>       320656           3          24082037
>       450022           1          24532059
>       737280           2          26006619
>       975360           1
>      1167872           1
>      1624576           1
>      2211809           1
>      2694375           1
>      5317610           1
>      5505148           1
>     12460544           1
>     14458072           1
>     24047618           1
>     27227648           1

What this tells us is that the 11 files in the tail of your
distribution (roughly 98MB between them) completely swamp the 1553
files at the head (roughly 26MB) in terms of total bytes. Even with a
linear algorithm, you'd spend more time compressing that last file
than the first 1553 combined.
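
For anyone who wants to redo the head-versus-tail arithmetic, here is
a minimal sketch. The (size, count) pairs are copied from the
histogram above; the names and the cutoff (the point where the
cumulative column stops) are my own assumptions for illustration.

```python
# (size in bytes, number of files) pairs copied from the histogram above.
HISTOGRAM = [
    (5977, 1355), (46882, 108), (84918, 42), (107234, 18),
    (144558, 13), (196372, 6), (245490, 2), (256062, 3),
    (320656, 3), (450022, 1), (737280, 2),
    (975360, 1), (1167872, 1), (1624576, 1), (2211809, 1),
    (2694375, 1), (5317610, 1), (5505148, 1), (12460544, 1),
    (14458072, 1), (24047618, 1), (27227648, 1),
]

# Assumption: the "tail" starts at the first size with no cumulative total.
TAIL_CUTOFF = 975360


def summarize(histogram, cutoff):
    """Return (files, bytes) for the head and for the tail of the histogram."""
    head_files = head_bytes = tail_files = tail_bytes = 0
    for size, count in histogram:
        if size < cutoff:
            head_files += count
            head_bytes += size * count
        else:
            tail_files += count
            tail_bytes += size * count
    return (head_files, head_bytes), (tail_files, tail_bytes)


(head_files, head_bytes), (tail_files, tail_bytes) = summarize(HISTOGRAM, TAIL_CUTOFF)
largest = max(size for size, _ in HISTOGRAM)

if __name__ == "__main__":
    print("head: %d files, %d bytes" % (head_files, head_bytes))
    print("tail: %d files, %d bytes" % (tail_files, tail_bytes))
    print("largest file alone exceeds head total:", largest > head_bytes)
```

With a cost that is linear in bytes, the time ratio between tail and
head is just the ratio of the two byte totals; anything worse than
linear makes the tail look worse still.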

It'd be interesting to get a graph of bdiff performance against file
size.
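
A rough sketch of how such a measurement could be collected, assuming
random input with a single changed byte. The harness and the
stand_in_delta function are hypothetical; stand_in_delta just
zlib-compresses the new text and is NOT bdiff. The real diff routine
would be passed in as fn, and how to import it depends on the
Mercurial version, so that part is left out.

```python
import os
import time
import zlib


def _timed(fn, old, new):
    """Time a single call of fn(old, new) in seconds."""
    start = time.perf_counter()
    fn(old, new)
    return time.perf_counter() - start


def time_against_size(fn, sizes, repeats=3):
    """Return a list of (size, best_seconds) for fn(old, new) at each size."""
    results = []
    for size in sizes:
        old = os.urandom(size)
        changed = bytearray(old)
        if size:
            # Flip one byte in the middle so old and new differ slightly.
            changed[size // 2] ^= 0xFF
        new = bytes(changed)
        best = min(_timed(fn, old, new) for _ in range(repeats))
        results.append((size, best))
    return results


def stand_in_delta(old, new):
    """Placeholder workload (assumption): compress the new text with zlib."""
    return zlib.compress(new)


if __name__ == "__main__":
    # Sizes loosely spanning the histogram, from a few KB up to a few MB.
    for size, seconds in time_against_size(stand_in_delta, [6000, 100000, 1000000, 5000000]):
        print("%10d bytes  %.4f s" % (size, seconds))
```

Random data is close to a worst case for compression and says little
about real binaries, so feeding it actual revisions of the large files
would give a more honest curve.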

--
Mathematics is the supreme nostalgia of our time.