hgwebdir, Apache, CGI and character encodings (and the Wiki)
Paul Boddie
paul.boddie at biotek.uio.no
Tue Nov 10 11:34:38 CST 2009
Hello,
I've recently been deploying another hgwebdir instance and I noticed
that the non-ASCII characters in the commit messages were very broken.
At first, I thought that my system's bizarre locale arrangements
(combined with Python's bizarre logic around encoding detection, and
maybe vim's defaults) had conspired to store nonsense in the actual
commit messages, but it turned out that the messages are intact and it
is hgwebdir which is not managing to communicate the encoding to the
browser correctly.
What appears to happen is that hgwebdir (I'm using 1.0.2, but the same
seems to be true of 1.3.1) emits the "Content-Type" header in the
mercurial.hgweb.request.wsgirequest.httphdr method, but
omits any "charset" qualifier. Apache, I presume, then decides to
embellish the header, adding locale-related information to it,
indicating that, at least for my system, the page uses ASCII as its
encoding.
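To make the failure concrete, here is a small Python sketch of what happens when UTF-8 bytes are decoded under the wrong label. The sample text is invented. The assumption that a browser receiving "charset=US-ASCII" falls back to windows-1252 describes common browser behaviour, not anything hgwebdir or Apache itself does:

```python
# Sample commit message text (invented) containing two non-ASCII letters.
message = "bl\u00e5b\u00e6r"

# hgwebdir writes the text out as UTF-8 bytes.
raw = message.encode("utf-8")

# A strict ASCII decoder rejects these bytes outright...
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# ...so a browser told "charset=US-ASCII" typically decodes as windows-1252
# instead, turning each two-byte UTF-8 sequence into two unrelated characters.
garbled = raw.decode("cp1252")

# Decoding with the charset that actually matches the bytes recovers the text.
correct = raw.decode("utf-8")
```

Here `garbled` comes out as "blÃ¥bÃ¦r" instead of "blåbær", which is the kind of breakage seen in the commit messages.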
However, it does seem to be the case that the commit messages are
emitted as UTF-8 by hgwebdir, even without setting the HGENCODING
environment variable. A simple fix for this appears to be a modification
to the method mentioned above, as follows:
- headers.append(('Content-Type', type))
+ headers.append(('Content-Type', "%s; charset=UTF-8" % type))
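The one-line patch labels every response as UTF-8. If httphdr is also used for non-text responses (raw file downloads, for example; this is an assumption, not something checked against the 1.0.2 source), a slightly more cautious variant might add a charset only to text/* types and leave any existing charset alone. `with_charset` is a hypothetical helper, not part of Mercurial:

```python
def with_charset(content_type, charset="UTF-8"):
    """Return a Content-Type value with an explicit charset parameter.

    Hypothetical helper: only text/* types get a charset added, and a value
    that already names a charset is returned unchanged.
    """
    mimetype, _, params = content_type.partition(";")
    existing = [p.strip().lower() for p in params.split(";") if p.strip()]
    if any(p.startswith("charset=") for p in existing):
        return content_type
    if mimetype.strip().lower().startswith("text/"):
        return "%s; charset=%s" % (content_type, charset)
    return content_type


# Mirrors the shape of the patched line in httphdr.
headers = []
headers.append(("Content-Type", with_charset("text/html")))
```

Inside httphdr the patched line would then read `headers.append(('Content-Type', with_charset(type)))`.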
I'm not certain that this is the proper fix, given that I don't really
know enough about what hgwebdir or Mercurial are doing internally, but
it fixed my problem. (Maybe HGENCODING gets propagated to a "ctype"
variable somewhere, whose value is then passed to the above method,
but I can't really tell after a couple of minutes looking.)
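Following that speculation, one way to keep the header honest is to take the charset label and the body encoding from the same setting, so that the two cannot disagree. This is a plain WSGI application, not hgwebdir code; reading HGENCODING directly from the environment and defaulting to UTF-8 are assumptions made purely for illustration:

```python
import os
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Assumption: a single setting decides both the encoding used for the
    # body and the charset announced in the header. HGENCODING is read here
    # only for illustration, with UTF-8 as the fallback.
    encoding = os.environ.get("HGENCODING", "UTF-8")
    text = "Commit message: bl\u00e5b\u00e6r\n"
    # "replace" substitutes "?" for characters the encoding cannot represent,
    # rather than raising an error.
    body = text.encode(encoding, "replace")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=%s" % encoding),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the application directly, without a server, to inspect the headers.
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
chunks = app(environ, start_response)
```

Because the label and the bytes come from one variable, a client that honours the charset parameter always decodes the body correctly.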
On another matter, I noticed that people had been discussing the Wiki
pages, with a particular mention of the hgwebdir pages and related
material. I have no problem going in and tidying these pages up if
no-one objects, and I have a more general comment to make on
Wiki-related matters. I see a lot of skepticism around Wiki solutions in
many of the projects I've been involved with, but I disagree with the
opinion that Wikis must necessarily be untidy or contain low-quality
information. No tool magically produces great documentation. Despite
what many champions of "gold-plated" content management systems would
have you believe, however much you are able to remix existing content,
tag it, track it, or present it in numerous formats or combinations,
such tools don't really help people write and edit good, readable text.
What Wiki solutions sometimes suffer from is the "uncertain editor"
syndrome: people writing stuff like "I'm not sure about this, but it
seemed to work", leading to some kind of dialogue where you actually
want to present a coherent statement. The solution to this is not to
advocate super-advanced workflows (which in many situations guarantee
that no-one will contribute), but to have active editing by people who
are confident doing so and don't mind being corrected from time to time.
As I wrote above, if people don't mind, I can do this kind of editing in
those areas where I probably know enough not to discard useful content.
If you see me making corrections and don't like them, feel free to
revert and/or improve my efforts.
Paul