hgwebdir, Apache, CGI and character encodings (and the Wiki)

Paul Boddie paul.boddie at biotek.uio.no
Tue Nov 10 11:34:38 CST 2009


Hello,

I've recently been deploying another hgwebdir instance and I noticed 
that the non-ASCII characters in the commit messages were very broken. 
At first, I thought that my system's bizarre locale arrangements 
(combined with Python's bizarre logic around encoding detection, and 
maybe vim's defaults) had conspired to store nonsense in the actual 
commit messages, but it turned out that the messages are intact and it 
is hgwebdir which is not managing to communicate the encoding to the 
browser correctly.

What appears to happen is that hgwebdir (I'm using 1.0.2, but 1.3.1 
seems to behave the same way) emits the "Content-type" header in the 
mercurial.hgweb.request.wsgirequest.httphdr method, but omits any 
"charset" qualifier. Apache, I presume, then decides to embellish the 
header, adding locale-related information to it, indicating that, at 
least for my system, the page uses ASCII as its encoding.
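
To illustrate the kind of breakage a mismatched charset label causes, here is a minimal sketch. It assumes the page bytes are UTF-8 and that the receiving side decodes them either strictly as ASCII or as Latin-1 (a common browser fallback); the sample text is hypothetical and not taken from any real commit message.

```python
# Hypothetical sample: a single non-ASCII character, as it might appear
# in a commit message.
message = u"\u00e9"                      # LATIN SMALL LETTER E WITH ACUTE

# Assumption: the server sends the text encoded as UTF-8.
wire_bytes = message.encode("utf-8")     # two bytes: 0xC3 0xA9

# A strict ASCII decoder cannot represent these bytes at all.
try:
    wire_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# A Latin-1 fallback "succeeds", but yields two wrong characters.
garbled = wire_bytes.decode("latin-1")

# Decoding with the charset actually used restores the original text.
restored = wire_bytes.decode("utf-8")
```

The point is only that the stored text can be perfectly intact while the displayed text is wrong, which matches what I saw: the commit messages themselves were fine.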

However, it does seem to be the case that the commit messages are 
emitted as UTF-8 by hgwebdir, even without setting the HGENCODING 
environment variable. A simple fix for this appears to be a modification 
to the method mentioned above, as follows:

-            headers.append(('Content-Type', type))
+            headers.append(('Content-Type', "%s; charset=UTF-8" % type))

I'm uncertain that this is a proper fix, given that I don't really know 
enough about what hgwebdir or Mercurial are doing internally, but this 
fixed my problem. (Maybe the HGENCODING value gets propagated to a 
"ctype" variable somewhere, whose value is then passed to the above 
method, but I can't really tell after a couple of minutes of looking.)
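
A slightly more cautious variant of the same idea is sketched below. This is not Mercurial code: with_charset is a hypothetical helper, and the choice to touch only "text/" types and to leave any existing charset alone is my own assumption about what would be safe, not something the hgweb sources confirm.

```python
def with_charset(content_type, charset="UTF-8"):
    """Return content_type with a charset parameter appended when appropriate.

    Hypothetical helper: leaves the value untouched if a charset is already
    present or if the media type is not a text type (e.g. image/png).
    """
    parts = content_type.split(";")
    media_type = parts[0].strip().lower()
    params = [p.strip().lower() for p in parts[1:]]

    # Respect a charset that some other layer has already set.
    if any(p.startswith("charset=") for p in params):
        return content_type

    # Assumption: only text types should be labelled with a charset.
    if not media_type.startswith("text/"):
        return content_type

    return "%s; charset=%s" % (content_type, charset)


# Usage mirroring the one-line change above ("headers" is a plain list here).
headers = []
headers.append(("Content-Type", with_charset("text/html")))
```

Compared with the one-line patch, this avoids labelling binary downloads with a charset, at the cost of guessing which types count as text.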

On another matter, I noticed that people had been discussing the Wiki 
pages, with a particular mention of the hgwebdir pages and related 
material. I have no problem going in and tidying these pages up if 
no-one objects, and I have a more general comment to make on 
Wiki-related matters. I see a lot of skepticism around Wiki solutions in 
many of the projects I've been involved with, but I disagree with the 
opinion that Wikis must necessarily be untidy or contain low-quality 
information. No tool magically helps to make great documentation, and 
despite what many champions of "gold-plated" content management systems 
would have you believe, no matter how much you are able to remix 
existing content, tag it, track it, present it in numerous formats or 
combinations, such tools don't really help people write and edit good, 
readable text.

What Wiki solutions sometimes suffer from is the "uncertain editor" 
syndrome: people writing stuff like "I'm not sure about this, but it 
seemed to work", which turns the page into a kind of dialogue where 
you actually want a coherent statement. The solution to this is not to 
advocate super-advanced workflow (which in many situations guarantees 
that no-one will contribute), but to have active editing by people who 
are confident doing so and don't mind being corrected from time to time.

As I wrote above, if people don't mind, I can do this kind of editing in 
those areas where I probably know enough not to discard useful content. 
If you see me making corrections and don't like them, feel free to 
revert and/or improve my efforts.

Paul

