feature or bug?

Jens Alfke jens at mooseyard.com
Sat Jan 5 00:19:34 CST 2008


On 3 Jan '08, at 8:56 PM, Matt Mackall wrote:

> Meh. I hate 'em too. But they exist. The new "store" format largely
> exists to deal with that, but then people go and make unreasonably long
> filenames.


Not as "unreasonable" as you may think. Characters in most Asian
scripts encode to three bytes each in UTF-8. If the filesystem's limit
on a filename is 255 *bytes* (as it is in ext2/3 and xfs), then that's
only 85 *characters* in such a script, so the limit gets hit a lot
earlier in those languages. (I've run into this issue myself in my job,
initially reported by Korean QA engineers.)
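
To make the arithmetic concrete, here is a small Python sketch. The
helper name utf8_len and the sample Hangul syllable are just
illustrative choices, and NAME_MAX is simply the 255-byte figure
mentioned above, not something queried from a real filesystem.

```python
# Compare character count with UTF-8 byte count for a filename.
NAME_MAX = 255  # per-filename byte limit cited above for ext2/3 and xfs


def utf8_len(name: str) -> int:
    """Return the number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


hangul_name = "한" * 85   # 85 Hangul syllables, three bytes each in UTF-8
ascii_name = "a" * 85     # 85 ASCII letters, one byte each

print(len(hangul_name), utf8_len(hangul_name))
print(len(ascii_name), utf8_len(ascii_name))

# One more syllable pushes the Hangul name past the byte limit.
print(utf8_len(hangul_name + "한") > NAME_MAX)
```

Both names are 85 characters long, but only the ASCII one leaves any
headroom under the byte limit.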

I'm disappointed to see people jumping to value-judgments about things  
like filename length or filename equivalence rules. The implication is  
that behaviors you personally dislike should be ignored or treated as  
user errors (e.g. "use shorter filenames!") instead of being handled  
correctly. Please remember that people are using Mercurial and other  
VCSs for storing things other than program source code. Designers are  
storing client projects, and there are programs that use a VCS under  
the hood for wikis or for synchronizing folders of user documents. 85  
characters may be unreasonable for a C or Python filename, but it's  
not at all for general-purpose documents.

The fact is that if this can corrupt a Mercurial repository [as  
reported in the email that started this thread], it's a problem  
Mercurial needs to deal with. In the innards of a repository,  
Mercurial creates files whose names are based on the filenames under  
version control. This means that the data structure of the repository  
itself now depends on the details of how the local filesystem handles  
filenames, which I think is where the corruption comes from.

So while I agree with Mark's point that, short of using a virtual  
filesystem, you can't abstract away all the issues that might arise in  
a *working directory*, I don't think it follows that it's impossible  
to make the *repository* itself robust. The previously-suggested  
approach of using names based on a digest of the filename seems like a  
good one; I don't know why Mark blew it off with "go knock yourself  
out".
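
For what it's worth, the digest idea can be sketched in a few lines of
Python. This is only an illustration under my own assumptions, not how
Mercurial actually names its store files: the function name
digest_store_name, the choice of SHA-1, and the ".i" suffix are all
hypothetical.

```python
import hashlib


def digest_store_name(tracked_path: str, suffix: str = ".i") -> str:
    """Map a tracked path to a fixed-length, ASCII-only store filename.

    Hypothetical scheme: the name is the hex SHA-1 digest of the UTF-8
    encoded path, so its length no longer depends on the original name.
    """
    digest = hashlib.sha1(tracked_path.encode("utf-8")).hexdigest()
    return digest + suffix


short_name = digest_store_name("README")
long_name = digest_store_name("한" * 85 + ".txt")  # over 255 bytes in UTF-8

# Both store names have the same length and use only lowercase hex digits.
print(len(short_name), len(long_name))
print(short_name != long_name)
```

The obvious trade-off is that a digest can't be turned back into the
original filename, so the repository would have to record the mapping
from digest to tracked path somewhere else.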

--Jens

