Dealing with binary files (was Re: [PATCH]Make hg diff go nice on binary files)

Thomas Arendsen Hein thomas at intevation.de
Thu Jul 28 00:17:04 CDT 2005


* Matt Mackall <mpm at selenic.com> [20050727 22:27]:
> On Wed, Jul 27, 2005 at 11:20:21AM -0700, Bryan O'Sullivan wrote:
> > On Wed, 2005-07-27 at 10:53 -0700, Matt Mackall wrote:
> > > There are three ways to do it:
> > > 
> > > a) by file contents
> > 
> > > b) by file extension
> > 
> > > c) by per-file flag
> > 
> > I'd strongly, strongly, strongly prefer A, backed up by C at add time.
>
> Let me play devil's advocate a bit..

Please have a look at <20050703095327.GB1674 at intevation.de>, too.
The charset question is related to the binary question. Since a file is
either binary or text with a charset, we may have the following 'charsets':
- text with unknown charset
- text with charset X with X being UTF-8 or ASCII or whatever
- binary
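
To make the three cases concrete, here is a rough sketch of a check by
file contents. The function name classify(), the NUL-byte heuristic and
the list of candidate charsets are assumptions for illustration only,
not something hg does today:

```python
def classify(data, candidates=("ascii", "utf-8")):
    """Return 'binary', 'text/<charset>' or 'text/unknown' for a byte string."""
    # Assumed heuristic: any NUL byte marks the content as binary.
    if b"\0" in data:
        return "binary"
    # Try each candidate charset in order; the first clean decode wins.
    for charset in candidates:
        try:
            data.decode(charset)
        except UnicodeDecodeError:
            continue
        return "text/" + charset
    # Text-like content, but none of the candidate charsets fits.
    return "text/unknown"
```

Content that decodes as neither ASCII nor UTF-8 (Latin-1 umlauts, for
example) ends up as "text with unknown charset", which is exactly the
case where a per-file setting would have to take over.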

> We'll still need to allow overriding at times other than commit for
> the cases where the user got it wrong at commit time. Bear in mind
> that such a flag will be per file revision so you won't be able to go
> back and correct it.

Of course. The same kind of override is needed to change the charset.

> So by doing c), we've made binary handling much more complicated and
> fixed less than 50% of a problem that was very small to start with.

The most important thing really is "text or binary", but as you
said, most files are text, so the charset question might be very
important, too.

Of course hg itself should ignore the charset; it is only intended for
the already mentioned ci/co filters, or maybe for displaying raw
files in hgweb.

> And the hgweb case is probably a separate problem too. Arguably we
> should be doing some MIME magic but we might use is_binary as a hint
> that we need to do that.

MIME is another question, and an even more complicated one. Maybe
this is a job for a .hgmime file? ;-)
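
For the hgweb case, using a binary check as a hint for MIME lookup could
be sketched roughly like this. guess_mime() and the NUL-byte test are
hypothetical names and heuristics for illustration; mimetypes is the
standard Python module and only looks at the file name:

```python
import mimetypes

def guess_mime(filename, data):
    """Pick a Content-Type for serving raw file data."""
    # Assumed heuristic: no NUL byte means the content is text.
    if b"\0" not in data:
        return "text/plain"
    # Binary content: fall back to a lookup by file extension.
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"
```

This only covers guessing by extension; real content-based MIME magic
would need more than that.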

Thomas

-- 
Email: thomas at intevation.de
http://intevation.de/~thomas/

