Line ending translation extension
Mads Kiilerich
mads at kiilerich.com
Sun Sep 6 18:44:16 CDT 2009
Martin Geisler wrote, On 09/07/2009 12:48 AM:
> Dj Gilcrease <digitalxero at gmail.com> writes:
>
>
>> On Sat, Sep 5, 2009 at 5:46 PM, Martin Geisler <mg at lazybytes.net> wrote:
>>
>>> Anyway, for improving win32text or for a new extension -- I'm
>>> attaching a rough beginning of an extension which will parse a .hgeol
>>> file in the tip changeset and add encode/decode filters based on
>>> that. The idea is that everybody can enable the extension globally
>>> since it will only add safe filters. One can add a section like this
>>> to Mercurial.ini to override the native line-ending:
>>>
I suggest that you put it on bitbucket so we know where to find the
latest version to hack on.
>> I decided to play with this a bit and changed the tocrlf to;
>>
>> def tocrlf(s, *args):
>>     s = s.replace('\r\n', '\n').replace('\r', '\n').replace('\n', '\r\n')
>>     return s
>>
>> which was an order of magnitude faster (on a 5 MB file) than the regex
>> version that's in win32text
>>
> Very nice! It also does what I would expect it to do: when I say that I
> want a file to be in CRLF format, I expect the extension to normalize
> all line-endings to '\r\n', regardless of what they were before. This
> makes mixed line-endings impossible in the repository.
>
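As a quick check of the normalization behaviour described above, here is a minimal sketch in Python. It reuses the quoted tocrlf, written for byte strings (that is my assumption about how a filter would receive the data), and feeds it a made-up input with mixed line endings:

```python
def tocrlf(s, *args):
    # Collapse CRLF and lone CR to LF first, then expand every LF to CRLF,
    # so the result can never contain mixed line endings.
    s = s.replace(b'\r\n', b'\n').replace(b'\r', b'\n').replace(b'\n', b'\r\n')
    return s

# Hypothetical input mixing LF, CRLF and lone CR line endings.
mixed = b'unix\nwindows\r\nold mac\rlast'
normalized = tocrlf(mixed)
```

Running the function a second time on its own output should change nothing, which is what makes it safe to apply regardless of what the file looked like before.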
AFAICS the discussion so far hasn't distinguished clearly between repo
format and local format.
For CR and CRLF files an obvious solution would be to do no translation
and store the file as-is in the repo. In some cases, however, people might
want to store all repo files in the other format and convert on checkout.
For "native", the name obviously refers to the local format. The repo
format is less obvious, and there seems to be a need to be able to
specify how the files should be stored in the repo.
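To make the repo format / local format distinction concrete, here is a rough sketch in Python. Everything in it is hypothetical: the names REPO_EOL, LOCAL_EOL, to_repo and to_local are made up for illustration and are not part of Mercurial or win32text. It assumes the repo format is chosen per repository, while the local format comes from the platform:

```python
import os

# Hypothetical per-repository setting: how "native" files are stored in the repo.
REPO_EOL = b'\n'
# Local format of the current platform (b'\r\n' on Windows, b'\n' on Unix).
LOCAL_EOL = os.linesep.encode('ascii')

def _convert(data, eol):
    # Normalize every line ending to LF, then rewrite it as the requested one.
    data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    return data.replace(b'\n', eol)

def to_repo(data, repo_eol=REPO_EOL):
    # "encode" direction: working directory -> repo format.
    return _convert(data, repo_eol)

def to_local(data, local_eol=LOCAL_EOL):
    # "decode" direction: repo format -> local format on checkout.
    return _convert(data, local_eol)
```

With the repo format set to b'\r\n' instead, the same two functions leave Windows checkouts untouched and make Unix the platform that needs translation.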
And related: Why do people say that it is a problem related to Windows?
It only becomes a Windows problem if it has been defined that the repo
is in Unix format. If someone decides that the repo is in Windows
format, then we suddenly have a Unix problem and not a Windows problem ...
And less related: The current Mercurial filter functionality seems to be
over-generalized and too powerful, and thus too dangerous to travel with
the repos, and (thus) seems not to be used very much. Perhaps a "better"
filtering functionality could be used for both line endings and other use
cases (such as keyword expansion, character set conversion, and other
ugly but real problems).
/Mads