revlog data redundancy

Benoit Boissinot benoit.boissinot at ens-lyon.org
Wed Oct 7 14:48:19 CDT 2009


On Wed, Oct 07, 2009 at 11:52:07AM -0700, chadrik wrote:
> >> in your estimation, how difficult would it be to change this behavior
> >> to make mercurial more efficient?  i'm looking into the bfiles
> >> extension, but what i'd really like to do is just teach mercurial how
> >> to be more space efficient.
> >
> > Quite difficult from a backwards-compatibility point of view.
> 
> my hope was to create a third type of store structure in addition to  
> index (*.i) and data (*.d).  this store would be a local cache for  
> putting large data files (compressed but not delta'd), with hashed  
> names based on file content only (so that two files have the same hash  
> iff they are the same file, regardless of parents).   for any revision  
> involving these large files, the index file would contain the hash to  
> find the file, instead of an offset and length.  so, in a way, it's an  
> optional git-like file store with a mercurial index.   any number of  
> mechanisms could be used to specify a "big file" -- a binary test, a  
> regex, a size threshold, or just manual specification -- but unless
> it's enabled, the current default behavior remains intact.  i think
> this is the logical extension of the split between inlining and
> non-inlining that is already in place.
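
For reference, the content-hashed store described in the quoted text
could look roughly like the sketch below. This is only an illustration
under assumptions: BlobStore, put and get are made-up names, not
Mercurial APIs, and SHA-1 plus zlib are choices made here for the
example. The file name depends on the content alone, so storing the
same bytes twice yields one file on disk.

```python
import hashlib
import os
import tempfile
import zlib


class BlobStore:
    """Hypothetical content-addressed store; not part of Mercurial."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data):
        # The name is derived from the content only, never from parents.
        key = hashlib.sha1(data).hexdigest()
        path = os.path.join(self.root, key)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(zlib.compress(data))  # compressed, not delta'd
        return key

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return zlib.decompress(f.read())


store = BlobStore(tempfile.mkdtemp())
k1 = store.put(b"large binary payload")
k2 = store.put(b"large binary payload")  # same content, same key
```

The index would then only need to record the returned key for such a
revision, in place of an offset and length.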

Actually if the files are identical, it should be possible for them to
share the same entry in the revlog.

It's probably simpler to extend the index and add a flag that says "this
rev is the same as that one, go there instead".
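
A toy sketch of that idea, to make it concrete. Everything here is an
assumption for illustration: ToyRevlog, Entry and the same_as field are
invented names, the real revlog index format is different, and no
deltas or compression are modelled. An entry either points at its own
data or carries a "same as rev N" marker that the reader follows.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    offset: int
    length: int
    same_as: Optional[int] = None  # hypothetical "go there instead" flag


class ToyRevlog:
    """Illustrative only; not Mercurial's revlog."""

    def __init__(self):
        self.index = []
        self.data = bytearray()
        self._by_content = {}  # content digest -> first rev storing it

    def add(self, text):
        digest = hashlib.sha1(text).digest()
        target = self._by_content.get(digest)
        if target is not None:
            # Identical content already stored: record a redirect only.
            self.index.append(Entry(0, 0, same_as=target))
        else:
            self._by_content[digest] = len(self.index)
            self.index.append(Entry(len(self.data), len(text)))
            self.data += text
        return len(self.index) - 1

    def read(self, rev):
        entry = self.index[rev]
        if entry.same_as is not None:
            # Redirects always point at an entry that holds real data.
            entry = self.index[entry.same_as]
        return bytes(self.data[entry.offset:entry.offset + entry.length])


log = ToyRevlog()
r0 = log.add(b"big file contents")
r1 = log.add(b"other")
r2 = log.add(b"big file contents")  # stored as a redirect to r0
```

Here r2 costs one index entry and no extra bytes in the data, and
reading it returns the same text as r0.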

-- 
:wq

