feature or bug?
Jens Alfke
jens at mooseyard.com
Sat Jan 5 00:19:34 CST 2008
On 3 Jan '08, at 8:56 PM, Matt Mackall wrote:
> Meh. I hate 'em too. But they exist. The new "store" format largely
> exists to deal with that, but then people go and make unreasonably
> long filenames.
Not as "unreasonable" as you may think. Characters in most Asian
scripts take three bytes each when encoded in UTF-8. If the
filesystem's limit on filenames is 255 *bytes* (as it is in ext2/3 and
xfs), then that's only 85 *characters* in such a script. That means
the limit gets hit a lot earlier in those languages. (I've run into
this issue myself in my job, initially reported by Korean QA engineers.)
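
To make the arithmetic concrete, here is a minimal Python sketch. The
helper names (utf8_len, max_chars) are made up for illustration; the
255-byte figure is the ext2/3 and xfs limit mentioned above.

```python
# Illustrative only: how many characters fit in a 255-byte filename limit.
LIMIT_BYTES = 255  # per-name limit cited above for ext2/3 and xfs


def utf8_len(name: str) -> int:
    """Return the number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def max_chars(sample_char: str, limit: int = LIMIT_BYTES) -> int:
    """Return how many copies of sample_char fit within limit bytes."""
    return limit // utf8_len(sample_char)


print(max_chars("a"))   # ASCII letter, 1 byte each: prints 255
print(max_chars("한"))  # Hangul syllable, 3 bytes each: prints 85
```
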
I'm disappointed to see people jumping to value judgments about things
like filename length or filename equivalence rules. The implication is
that behaviors you personally dislike should be ignored or treated as
user errors (e.g. "use shorter filenames!") instead of being handled
correctly. Please remember that people are using Mercurial and other
VCSs for storing things other than program source code. Designers are
storing client projects, and there are programs that use a VCS under
the hood for wikis or for synchronizing folders of user documents. 85
characters may be unreasonable for a C or Python filename, but it's
not at all for general-purpose documents.
The fact is that if this can corrupt a Mercurial repository [as
reported in the email that started this thread], it's a problem
Mercurial needs to deal with. In the innards of a repository,
Mercurial creates files whose names are based on the filenames under
version control. This means that the data structure of the repository
itself now depends on the details of how the local filesystem handles
filenames, which I think is where the corruption comes from.
So while I agree with Mark's point that, short of using a virtual
filesystem, you can't abstract away all the issues that might arise in
a *working directory*, I don't think it follows that it's impossible
to make the *repository* itself robust. The previously-suggested
approach of using names based on a digest of the filename seems like a
good one; I don't know why Mark blew it off with "go knock yourself
out".
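
A rough sketch of what the digest idea could look like, in Python. This
is not Mercurial's actual store encoding: the function name store_name,
the choice of SHA-1, and the ".i" suffix are all assumptions made for
illustration. The point is only that the on-disk name has a fixed
length and is plain ASCII, whatever the tracked filename looks like.

```python
import hashlib


def store_name(tracked_path: str) -> str:
    """Hypothetical mapping from a tracked path to a fixed-length store name.

    The result depends only on the bytes of the tracked path, so it is
    unaffected by the local filesystem's length limits or case rules.
    """
    digest = hashlib.sha1(tracked_path.encode("utf-8")).hexdigest()
    return digest + ".i"  # suffix chosen for illustration only


long_name = "문서" * 200 + ".txt"           # far beyond 255 bytes in UTF-8
print(len(long_name.encode("utf-8")))      # size of the original name in bytes
print(len(store_name(long_name)))          # always 40 hex digits plus suffix
print(store_name("README") == store_name("readme"))  # distinct paths differ
```

A scheme like this would still need a separate record mapping each
digest back to the real filename, since the digest cannot be reversed.
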
--Jens