a GUI front-end author's problems parsing mercurial output

Bryan O'Sullivan bos at serpentine.com
Sat May 20 14:13:57 CDT 2006


On Fri, 2006-05-19 at 11:24 -0700, Elliott Hughes wrote:
> hi. i'm the main author of the "SCM" front-end to Bazaar, BitKeeper,  
> CVS, Mercurial, and Subversion (http://software.jessies.org/scm/).

Welcome aboard :-)
  
> (the most obvious example being "hg log -v", which works, is  
> necessary for my purposes, but which isn't mentioned in the "hg log -- 
> help" output.)

Yes, someone else pointed exactly this out to me the other day.  We
think of options like "-v" as "global options", since they are available
on all commands, but leaving them out of per-command help output just
because we know they're global doesn't help users.

> * as far as i can tell, the character encoding is nowhere defined. i  
> haven't tested yet, but hopefully it's all UTF-8 internally? it would  
> be beneficial to tool authors if this were documented.

At the moment, it's actually all 8-bit bytes, with no encoding of any
kind.  We know we need to fix that.  The plan is to store committer
names, file names, and changeset comments internally as UTF-8, and
convert them appropriately for the user's system.

> * is there a reason why Mercurial doesn't use ISO date format (http:// 
> www.cl.cam.ac.uk/~mgk25/iso-time.html)?

I think it "just does".  However, you can use an output template to
control the format used.  This will also help you with file names that
contain spaces, and of course the output format will stay fixed.

I've attached an example that should be easy to parse.  It spits out a
record for each changeset in RFC822 header style.  Each record consists
of a series of "name: value" pairs.  Committer names and changeset
descriptions are XML-escaped, with "<br/>" as the newline separator.
Values can span multiple lines, but if they do, the second and
subsequent lines are indented.  File names are spit out one per line.
Dates are printed in ISO 8601 format.  Oh, and records are separated by
empty lines.

It's very easy to use this, and the output format will stay stable.
Just save the attachment as ~/easyparse.tmpl, then run
"hg log --style ~/easyparse.tmpl".
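
For a rough idea of what the consuming side could look like, here is a
minimal Python sketch that reads the format described above.  It is not
part of the attachment, and the names in it (parse_log, finish,
LIST_FIELDS) are made up for illustration.  It assumes that records are
separated by empty lines, that continuation lines start with a single
tab, and that the "tags", "modified", "added" and "removed" fields hold
one item per line while every other field is escaped text.

```python
import html

# Fields that hold one item per line (assumed from the template's labels).
LIST_FIELDS = {"tags", "modified", "added", "removed"}


def finish(raw):
    """Turn the collected lines of one record into final values."""
    record = {}
    for name, lines in raw.items():
        if name in LIST_FIELDS:
            record[name] = lines
        else:
            # Drop the "<br/>" line markers, then undo the XML escaping.
            joined = "\n".join(lines).replace("<br/>", "")
            record[name] = html.unescape(joined)
    return record


def parse_log(text):
    """Parse the whole log output into a list of dicts, one per changeset."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        if not line:
            # An empty line ends the current record.
            if current:
                records.append(finish(current))
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Indented line: continuation of the previous field.
            current[last].append(line[1:])
        else:
            name, _, value = line.partition(":")
            if value[:1] in (" ", "\t"):
                value = value[1:]  # drop the single separator character
            last = name
            current[last] = [value] if value else []
    if current:
        records.append(finish(current))
    return records
```

Feeding it the text produced by "hg log --style ~/easyparse.tmpl" (for
example, captured through the subprocess module) should give one
dictionary per changeset, with file lists as Python lists and the
author and description as plain unescaped strings.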

> to end on a more positive note, as a long-time BitKeeper user i'm  
> impressed by Mercurial's inclusion of the "files:" header lines.

Well, hopefully you'll be even more impressed by the easily-parsed style
file :-)

	<b
-------------- next part --------------
changeset = 'rev: {rev}\nchangeset: {node}\n{tags}author: {author|escape|addbreaks|tabindent}\ndate: {date|isodate}\ndesc: {desc|strip|escape|addbreaks|tabindent}\n{files}{file_adds}{file_dels}\n'
start_tags = 'tags:'
tag = '\t{tag}\n'
start_files = 'modified:'
file = '\t{file}\n'
start_file_adds = 'added:'
file_add = '\t{file_add}\n'
start_file_dels = 'removed:'
file_del = '\t{file_del}\n'

