Line ending translation extension

Mads Kiilerich mads at kiilerich.com
Sun Sep 6 18:44:16 CDT 2009


Martin Geisler wrote, On 09/07/2009 12:48 AM:
> Dj Gilcrease<digitalxero at gmail.com>  writes:
>
>    
>> On Sat, Sep 5, 2009 at 5:46 PM, Martin Geisler<mg at lazybytes.net>  wrote:
>>      
>>> Anyway, for improving win32text or for a new extension -- I'm
>>> attaching a rough beginning of an extension which will parse a .hgeol
>>> file in the tip changeset and adds encode/decode filters based on
>>> that. The idea is that everybody can enable the extension globally
>>> since it will only add safe filters. One can add a section like this
>>> to Mercurial.ini to override the native line-ending:
>>>        

I suggest that you put it on Bitbucket so we know where to find the
latest version to hack on.

>> I decided to play with this a bit and changed tocrlf to:
>>
>> def tocrlf(s, *args):
>>      s = s.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\r\n')
>>      return s
>>
>> which was an order of magnitude faster (on a 5 MB file) than the regex
>> version that's in win32text
>>      
> Very nice! It also does what I would expect it to do: when I say that I
> want a file to be in CRLF format, I expect the extension to normalize
> all line-endings to '\r\n', regardless of what they were before. This
> makes mixed line-endings impossible in the repository.
>    
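
To make the normalization described above concrete, here is a minimal
sketch of the same chained-replace idea. It assumes Python 3 bytes,
whereas the quoted function takes extra filter arguments, and the sample
input is made up for illustration.

```python
def tocrlf(data: bytes) -> bytes:
    """Normalize every line ending in data to CRLF."""
    # Collapse CRLF and lone CR to LF first, then expand every LF to CRLF.
    # Doing it in this order prevents an existing CRLF from becoming CRCRLF.
    return (data.replace(b'\r\n', b'\n')
                .replace(b'\r', b'\n')
                .replace(b'\n', b'\r\n'))


# Hypothetical input mixing CRLF, LF and CR endings.
mixed = b'one\r\ntwo\nthree\rfour'
normalized = tocrlf(mixed)
print(normalized)                        # b'one\r\ntwo\r\nthree\r\nfour'
print(tocrlf(normalized) == normalized)  # True: a second pass changes nothing
```

Because the result is the same whatever the input endings were, running
the filter twice is harmless, which is what rules out mixed endings.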

AFAICS the discussion so far hasn't clearly distinguished between the
repo format and the local format.

For CR and CRLF files, an obvious solution would be to do no translation
and store the file as-is in the repo. Some people might, however, want to
store all repo files in the other format and convert on checkout.

For "native", the word obviously refers to the local format. The repo
format is less obvious, so there seems to be a need to be able to
specify how the files should be stored in the repo.
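
As a rough illustration of keeping the two formats apart, the sketch
below treats the repo format and the local format as separate
parameters. The names EOL, convert, encode and decode are hypothetical
and not part of Mercurial or of the proposed extension, and only LF,
CRLF and CR are considered.

```python
# Hypothetical mapping from a format name to its line-ending bytes.
EOL = {'LF': b'\n', 'CRLF': b'\r\n', 'CR': b'\r'}


def convert(data: bytes, target: str) -> bytes:
    """Rewrite every line ending in data to the ending named by target."""
    # Collapse everything to LF first so mixed input is handled uniformly.
    lf = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    return lf.replace(b'\n', EOL[target])


def encode(data: bytes, repo_format: str) -> bytes:
    """Working copy -> repo: store the file in the repo's declared format."""
    return convert(data, repo_format)


def decode(data: bytes, local_format: str) -> bytes:
    """Repo -> working copy: check the file out in the local format."""
    return convert(data, local_format)


# A repo declared to store CRLF, checked out on a machine that wants LF.
stored = encode(b'a\nb\r\n', 'CRLF')
checked_out = decode(stored, 'LF')
print(stored)       # b'a\r\nb\r\n'
print(checked_out)  # b'a\nb\n'
```

With both formats explicit, neither platform is the special case: the
same pair of functions covers an LF repo checked out on Windows and a
CRLF repo checked out on Unix.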

And related: why do people say that this is a problem related to Windows?
It only becomes a Windows problem if it has been defined that the repo
is in Unix format. If someone decides that the repo is in Windows
format, then we suddenly have a Unix problem and not a Windows problem ...

And less related: the current Mercurial filter functionality seems to be
over-generalized and too powerful, and thus too dangerous to be allowed
to follow the repos, and (thus) seems not to be used very much. Perhaps a
"better" filtering functionality could be used both for line endings and
for other use cases (such as keyword expansion, character set conversion
and other ugly but real problems).

/Mads

