Consequences for use of hg for other applications than SCM was Re: German umlauts in file names

Adrian Buehlmann adrian at cadifra.com
Thu Jun 26 13:20:29 CDT 2008


On 26.06.2008 11:08, Adrian Buehlmann wrote:
> On 21.06.2008 02:22, Matt Mackall wrote:
> A somewhat related question:
>
> I have a problem with encoding of filenames in my experimental
> long path patch [1].
>
> What is the correct way to convert from / to unicode filename strings
> on Windows, if I want to mimic the current behavior of Mercurial on
> Windows (which is surely needed for compatibility with current repos)?

OK. I came up with the modification pasted at the end [3]. It is most likely very
inefficient, but as far as I can tell it does the same encoding/decoding as current
Mercurial:

> dir
 Volume in drive W is Sys
 Volume Serial Number is 8017-C29E

 Directory of W:\tmp\aa

26.06.2008  20:01    <DIR>          .
26.06.2008  20:01    <DIR>          ..
26.06.2008  19:44    <DIR>          .hg
26.06.2008  20:01                12 äöü.txt
               1 File(s)             12 bytes
               3 Dir(s)  30'946'508'800 bytes free

> hg sta
? Σ÷ⁿ.txt

> hgt sta
--- running hg from W:\hg-longpath
? Σ÷ⁿ.txt
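Why `dir` shows äöü.txt while both hg builds print Σ÷ⁿ.txt: presumably hg writes the raw file-name bytes from the ANSI code page (cp1252) to a console that draws bytes using the OEM code page. The glyphs seen here match cp437. The sketch below assumes exactly those two code pages. `console_rendering` is a hypothetical helper, not part of Mercurial.

```python
def console_rendering(name: str, ansi: str = "cp1252", oem: str = "cp437") -> str:
    """Return how `name` looks when its ANSI-code-page bytes are shown
    by a console that interprets them in the OEM code page.
    Assumption: ANSI = cp1252 and OEM = cp437, as the output above suggests."""
    raw = name.encode(ansi)   # bytes handed to the console, e.g. b'\xe4\xf6\xfc.txt'
    return raw.decode(oem)    # glyphs a cp437 console draws for those bytes


# ascii() keeps the printout safe on consoles that cannot show these characters.
print(ascii(console_rendering("\u00e4\u00f6\u00fc.txt")))
```

Both builds print the same bytes, so the long-path version does not change what ends up on the console.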


[3]:

diff --git a/mercurial/osutil.py b/mercurial/osutil.py
--- a/mercurial/osutil.py
+++ b/mercurial/osutil.py
@@ -12,27 +12,29 @@

 def listdir(path, stat=False):
     '''listdir(path, stat=False) -> list_of_tuples

     Return a sorted list containing information about the entries
     in the directory.

     If stat is True, each element is a 3-tuple:

       (name, type, stat object)

     Otherwise, each element is a 2-tuple:

       (name, type)
     '''
     result = []
     prefix = path + os.sep
     names = os.listdir(util.longpath(path)) # returns unicode strings on Windows
     names.sort()
     for fn in names:
-        fn = fn.encode()
+        def shrink(unicodestring):
+            return ''.join([chr(ord(c)) for c in unicodestring])
+        fn = shrink(fn)
         st = os.lstat(util.longpath(prefix + fn))
         if stat:
             result.append((fn, _mode_to_kind(st.st_mode), st))
         else:
             result.append((fn, _mode_to_kind(st.st_mode)))
     return result
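In the hunk above, `shrink()` maps every code point to the byte with the same numeric value, which is the same as latin-1 encoding. For names like äöü.txt this yields the same bytes as the cp1252 ANSI code page that a byte-string `os.listdir` would return. The two code pages differ in the range 0x80 to 0x9F, though. The Python 3 sketch below demonstrates this, and `matches_cp1252` is a hypothetical helper added only to illustrate the point.

```python
def shrink(unicodestring: str) -> bytes:
    """Python 3 analogue of the patch's shrink(): one byte per code point,
    i.e. latin-1 encoding."""
    # bytes() raises ValueError for any code point above 255
    # (the Python 2 chr() in the patch raises ValueError too).
    return bytes(ord(c) for c in unicodestring)


def matches_cp1252(name: str) -> bool:
    """True if shrink() produces the same bytes as encoding with cp1252."""
    try:
        return shrink(name) == name.encode("cp1252")
    except (ValueError, UnicodeEncodeError):
        return False


print(shrink("\u00e4\u00f6\u00fc.txt"))          # b'\xe4\xf6\xfc.txt'
print(matches_cp1252("\u00e4\u00f6\u00fc.txt"))  # True
print(matches_cp1252("\u20ac.txt"))              # False: the euro sign is 0x80 in cp1252
```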
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -1109,41 +1109,43 @@
             msvcrt.setmode(fd.fileno(), os.O_BINARY)

     def pconvert(path):
         return '/'.join(splitpath(path))

     def localpath(path):
         return path.replace('/', '\\')

     _longpathprefix = "\\\\?\\"
     def longpath(path):
         '''convert path to a Windows long path
         needed to call Windows api with paths longer than 260'''
         if path.startswith(_longpathprefix):
             res = path
         else:
             path = path.replace('/', '\\').replace('\\.\\', '\\')
             if path[-1] == '.':
                 path = path[:-1]
             if not os.path.isabs(path):
                 path = os.path.abspath(path)
-            res = unicode(_longpathprefix + path)
+            def expand(s):
+                return u''.join([unichr(ord(c)) for c in s])
+            res = expand(_longpathprefix + path)
         return res
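`expand()` is the inverse of `shrink()`: each byte becomes the code point with the same value, which is latin-1 decoding. The sketch below is a hypothetical, platform-neutral Python 3 rendering of the patched `longpath()`. It uses `ntpath` so that it runs on any OS. The patch calls `os.path.abspath()` on relative paths, but this sketch rejects them instead so that it does not depend on the current directory.

```python
import ntpath

# The Win32 long-path prefix \\?\ written as a Python string literal.
LONGPATH_PREFIX = "\\\\?\\"


def expand(s: bytes) -> str:
    """Python 3 analogue of the patch's expand(): byte n -> code point n."""
    return "".join(chr(b) for b in s)


def longpath_unicode(path: bytes) -> str:
    """Hypothetical sketch of the patched longpath() for byte-string paths."""
    text = expand(path)
    if text.startswith(LONGPATH_PREFIX):
        return text
    # Same normalisation as the patch: forward slashes to backslashes,
    # drop "\.\" components and a trailing dot.
    text = text.replace("/", "\\").replace("\\.\\", "\\")
    if text.endswith("."):
        text = text[:-1]
    if not ntpath.isabs(text):
        raise ValueError("expected an absolute path, got %r" % text)
    return LONGPATH_PREFIX + text


assert expand(bytes(range(256))) == bytes(range(256)).decode("latin-1")
print(longpath_unicode(b"W:/tmp/aa/x.txt"))  # prints \\?\W:\tmp\aa\x.txt
```

Because `expand()` and `shrink()` are exact inverses for code points 0 to 255, names survive the round trip through the unicode Windows API unchanged. This only holds as long as every character stays in that range.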

     def normpath(path):
         return pconvert(os.path.normpath(path))

     makelock = _makelock_file
     readlock = _readlock_file

     def samestat(s1, s2):
         return False

     # A sequence of backslashes is special iff it precedes a double quote:
     # - if there's an even number of backslashes, the double quote is not
     #   quoted (i.e. it ends the quoted region)
     # - if there's an odd number of backslashes, the double quote is quoted
     # - in both cases, every pair of backslashes is unquoted into a single
     #   backslash
     # (See http://msdn2.microsoft.com/en-us/library/a1y7w461.aspx )
     # So, to quote a string, we must surround it in double quotes, double
     # the number of backslashes that preceed double quotes and add another
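The backslash and double-quote rule described in the comment above (and on the linked MSDN page) can be sketched as follows. `quote_msvcrt` is a hypothetical name, and this is not the Mercurial function the diff context belongs to. That function's text is cut off here.

```python
import re


def quote_msvcrt(arg: str) -> str:
    """Quote one argument so the MSVC runtime parses it back unchanged
    (hypothetical helper following the documented parsing rules)."""
    # A run of n backslashes directly before a double quote becomes
    # 2n backslashes, plus one more backslash that escapes the quote itself.
    escaped = re.sub(r'(\\*)"', r'\1\1\\"', arg)
    # Trailing backslashes end up right before the closing quote we add,
    # so they have to be doubled as well.
    body = escaped.rstrip("\\")
    trailing = len(escaped) - len(body)
    return '"' + body + "\\" * (2 * trailing) + '"'


print(quote_msvcrt('say "hi"'))   # "say \"hi\""
print(quote_msvcrt("C:\\dir\\"))  # "C:\dir\\"
```

Backslashes that are not followed by a quote are left alone, matching the rule that only backslash runs preceding a double quote are special.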
