Performance with binary-heavy repositories
Jens Alfke
jens at mooseyard.com
Thu Aug 2 12:07:54 CDT 2007
On Aug 2, 2007, at 9:39 AM, Matt Mackall wrote:
> Mercurial's bdiff algorithm treats all files as strings of bytes and
> breaks them on newline characters. For high-entropy "pure binary" files
> like JPEGs, those should occur roughly every 256 characters so the
> average "line length" for a binary file is a bit longer than for text,
> but not outrageously so.
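The quoted estimate is easy to check. In uniformly random bytes, the newline byte 0x0A turns up with probability 1/256, so the average gap between newlines is about 256 bytes. The sketch below assumes that random bytes are a fair stand-in for compressed data such as a JPEG payload, which is an approximation. The helper name `average_line_length` is hypothetical. The code only counts newlines; it is not bdiff.

```python
import os


def average_line_length(data: bytes) -> float:
    """Average number of bytes per newline in `data`.

    Hypothetical helper: a line-oriented differ that splits on b"\n"
    sees "lines" of roughly this length, whether or not the file is text.
    """
    return len(data) / max(1, data.count(b"\n"))


# Assumption: uniformly random bytes approximate a high-entropy file.
random_blob = os.urandom(1_000_000)
print(f"random bytes: ~{average_line_length(random_blob):.0f} bytes per line")

# For comparison, a short-lined text file.
text = b"hello world\n" * 1000
print(f"text: {average_line_length(text):.1f} bytes per line")
```

On random input the first figure should come out close to 256. The text sample gives exactly 12.0.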
Really? I thought, from reading the [excellent] paper on the innards
of Mercurial, that it used a binary-delta algorithm (the old first
version of xdelta, IIRC) for binary files.
I would imagine that a line-oriented text diff algorithm would
achieve pretty poor compression on a binary file, much worse than an
algorithm designed for binary data.
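For contrast, here is a minimal sketch of a byte-level delta. It encodes the new version as COPY instructions that reference ranges of the old version, plus ADD instructions that carry literal bytes. This copy/add idea is the general approach behind formats like VCDIFF (RFC 3284). However, the block-matching scheme, the tuple encoding, and the names `make_delta`, `apply_delta`, and `BLOCK` are all hypothetical simplifications. This is not xdelta's algorithm, zdelta's algorithm, or Mercurial's.

```python
import os
from typing import List, Tuple, Union

# Hypothetical instruction encoding, loosely modeled on copy/add deltas.
Copy = Tuple[str, int, int]   # ("copy", source_offset, length)
Add = Tuple[str, bytes]       # ("add", literal_bytes)
Op = Union[Copy, Add]

BLOCK = 16  # hypothetical match granularity in bytes


def make_delta(source: bytes, target: bytes) -> List[Op]:
    """Encode `target` as COPYs from `source` plus literal ADDs."""
    # Index each block-aligned chunk of the source by its contents.
    index = {}
    for off in range(0, len(source) - BLOCK + 1, BLOCK):
        index.setdefault(source[off:off + BLOCK], off)

    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(target):
        off = index.get(target[i:i + BLOCK])
        if off is None:
            # No source block matches here; keep this byte as a literal.
            literal.append(target[i])
            i += 1
            continue
        # Extend the match forward while the bytes keep agreeing.
        length = BLOCK
        while (i + length < len(target) and off + length < len(source)
               and target[i + length] == source[off + length]):
            length += 1
        if literal:
            ops.append(("add", bytes(literal)))
            literal.clear()
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("add", bytes(literal)))
    return ops


def apply_delta(source: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target from `source` and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    old = os.urandom(4096)                         # stand-in binary file
    new = old[:1000] + b"EDIT" + old[1000:]        # 4-byte insertion
    delta = make_delta(old, new)
    assert apply_delta(old, delta) == new
    literal_bytes = sum(len(op[1]) for op in delta if op[0] == "add")
    print(len(delta), "ops,", literal_bytes, "literal bytes")
```

For a small insertion, the byte-level delta stores only a few literal bytes: at most the inserted bytes plus the run up to the next block-aligned match. A line-oriented differ, by contrast, would have to store the whole changed "line", which is roughly 256 bytes long in high-entropy data.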
The current xdelta, version 3 <http://xdelta.org/>, appears to be the
state of the art in delta compression, and emits standard VCDIFF [RFC
3284] format. (Although for my own work I've been using zdelta,
largely because the license is more flexible.)
--Jens