Consequences for use of hg for other applications than SCM was Re: German umlauts in file names

Shun-ichi GOTO shunichi.goto at gmail.com
Mon Jun 23 19:17:52 CDT 2008


2008/6/24 Mads Kiilerich <mads at kiilerich.com>:
> Hans Meine wrote, On 06/23/2008 03:15 PM:
>>
>> As Matt wrote, hg does *not* use the unicode API (which is also available
>> in Python, see the link I posted above), but uses only 8-bit functions.
>>  This way, unicode filenames cannot be preserved.  IMO this qualifies as a
>> bug - OK, call it a documented, clean, but for certain users unexpected (and
>> undesired) behavior which cannot be changed.
>>
>> However, I think this should not be hard to fix for people like Marko.
>
> Isn't that almost and not entirely unlike what the win32mbcs extension does?

No. Win32mbcs only hooks and replaces some path-manipulation functions,
so that they operate on decoded unicode strings and then re-encode the result.
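
To illustrate the decode / operate / re-encode idea, here is a minimal
sketch. It is not the actual win32mbcs code: the names decode_call_encode,
naive_split and safe_split are made up for this example, and cp932 is only
an assumed filesystem encoding. It shows why byte-level path handling can go
wrong with a multi-byte encoding whose trail byte may equal the backslash.

```python
import functools


def _encode(value, encoding):
    """Re-encode str results (also inside lists/tuples) back to bytes."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, (list, tuple)):
        return type(value)(_encode(v, encoding) for v in value)
    return value


def decode_call_encode(func, encoding="cp932"):
    """Wrap func: decode bytes arguments, call func, re-encode the result.

    Hypothetical helper; the default encoding is an assumption.
    """
    @functools.wraps(func)
    def wrapper(*args):
        decoded = [a.decode(encoding) if isinstance(a, bytes) else a
                   for a in args]
        return _encode(func(*decoded), encoding)
    return wrapper


def naive_split(path):
    """Stand-in for a path function that splits on the backslash."""
    sep = b"\\" if isinstance(path, bytes) else "\\"
    return path.split(sep)


safe_split = decode_call_encode(naive_split)

# U+8868 encodes in cp932 as the two bytes 0x95 0x5c; the second byte
# is the same as an ASCII backslash.
raw = "dir\\\u8868.txt".encode("cp932")

print(naive_split(raw))  # byte-level split cuts the character in half
print(safe_split(raw))   # split on decoded text keeps the name whole
```

The wrapper only helps for the functions that are actually hooked, which is
why this approach covers path manipulation but not every file access.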

> It seems to be a few-liner to change it to a "win32utf8repo" extension which
> assumes that the repo uses utf-8 encoding and the filesystem uses raw
> unicode. win32mbcs seems to be a special case of that. I think that except
> for existing windows-only repos then utf-8 as repo encoding is a fair
> assumption.

I've tried to implement such a conversion before, but unfortunately it
was not so easy, because there are many file-system access functions
(e.g. file()) and they are hard-coded in many places.
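
As a rough sketch of what such a conversion has to do for a single
function, assuming the repository stores names as UTF-8 bytes and the
filesystem API accepts unicode strings (utf8_name_to_unicode and
unicode_open are hypothetical names, not part of Mercurial):

```python
import functools


def utf8_name_to_unicode(opener):
    """Wrap a file-access function so a UTF-8 byte name becomes unicode.

    Hypothetical helper: only the first positional argument is treated
    as a file name; everything else is passed through unchanged.
    """
    @functools.wraps(opener)
    def wrapper(name, *args, **kwargs):
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        return opener(name, *args, **kwargs)
    return wrapper


# One wrapped entry point; every other function that touches the
# filesystem would need the same treatment.
unicode_open = utf8_name_to_unicode(open)
```

The difficulty described above is that this wrapping must be repeated for
each access function and each place that calls one directly.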

Of course, it may be possible with someone's clever idea. I am one of the
people who want filename conversion like svn's.

-- 
Shun-ichi GOTO

