internal fragmentation

a personal journal of hacking, science, and technology

Unifying the Mercurial wire protocol

Thu, 15 Jul 2010 17:18 by mpm in Uncategorized

One Crazy Summer

The Mercurial ‘wire protocol’ is the set of commands that Mercurial uses to discover changesets to push and pull and actually exchange changesets with remote hosts. It was developed surprisingly rapidly, given that everything else in the project was also brand new:

  • April 20 2005: first Mercurial release
  • April 21 2005: static http pull support
  • May 10 2005: changeset bundle format created
  • May 11 2005: hgweb merged
  • May 12 2005: added smart http protocol to hgweb
  • May 25 2005: hg serve stand-alone server added
  • June 12 2005: crazy tunnel-http-over-ssh push hack
  • July 5 2005: ssh smart server support
  • July 6 2005: real ssh push

And that gets us up to the 0.6b release, less than three months after the 0.1 release.

Steady As She Goes

Because ssh push was good enough for most of our users at the time, not much happened with our wire protocol until about a year later when we finally got around to working on http push:

  • June 10 2006: add capability detection to protocol
  • June 15 2006: new push approach for ssh
  • June 20 2006: real http/https push
  • July 14 2006: add support for --uncompressed clones for fast networks

And again, the wire protocol hasn’t changed much since then. A branchmap command was added in May 2009 to improve named branch support, and pushkey was added a couple of weeks ago to support bookmarks. We’ve also been careful to keep the protocol backwards compatible: Mercurial 0.5 can interoperate with even the most recent versions.

Preparing for the New Frontier

But there are a number of big projects on the horizon that will require extending the wire protocol, including:

  • a new, faster discovery algorithm
  • improved compression with parent deltas
  • support for shallow clones
  • support for lightweight copies

Which means it’s high time that the wire protocol got cleaned up! Currently, adding a new command to the protocol means adding code in four different places: the ssh client code, the ssh server code, the http client code, and the http server code. And those interfaces aren’t exactly pretty. Here’s what the old ssh and http server support for lookup looks like:

ssh:
    def do_lookup(self):
        arg, key = self.getarg()
        assert arg == 'key'
        try:
            r = hex(self.repo.lookup(key))
            success = 1
        except Exception, inst:
            r = str(inst)
            success = 0
        self.respond("%s %s\n" % (success, r))

http:
def lookup(repo, req):
    try:
        r = hex(repo.lookup(req.form['key'][0]))
        success = 1
    except Exception, inst:
        r = str(inst)
        success = 0
    resp = "%s %s\n" % (success, r)
    req.respond(HTTP_OK, HGTYPE, length=len(resp))
    yield resp

Fortunately, we’ve been careful to keep the ssh and http ‘calling convention’ very similar. The bulk of the code is the same, but there are some basic differences in how arguments are retrieved and how results are returned. The client side needs the same kind of changes. Adding a new command meant writing one protocol implementation, then copying and pasting it into the other protocol and tweaking it.

Unification

So a long-standing wishlist item has been to unify these implementations, which is what I’ve been working on this week. Now the above command is implemented just once like this:

def lookup(repo, proto, key):
    try:
        r = hex(repo.lookup(key))
        success = 1
    except Exception, inst:
        r = str(inst)
        success = 0
    return "%s %s\n" % (success, r)

Note that the output is now delivered via return and the argument is now passed in as a parameter. A central dispatcher now handles unpacking arguments and delivering results, using a table of every function’s parameter list (including support for variable-length argument lists!). Each function also takes a ‘proto’ argument, which gives it access to an abstracted interface for the various http- or ssh-specific methods, like doing bulk data transfers. Similarly, the client-side support is also unified, and both the client and server support now live in one file: wireproto.py. This will hopefully make future extensions to the wire protocol much less painful.
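To make the dispatcher idea concrete, here is a minimal sketch of how a command table and a transport-neutral dispatcher could fit together. It is not the actual wireproto.py code: the names commands, command, dispatch, FakeRepo, DictProto and getargs are all assumptions made up for illustration, and it is written for Python 3 rather than the Python 2 of the snippets above.

```python
from binascii import hexlify

# Hypothetical command table: command name -> (handler, list of argument names).
commands = {}

def command(name, argspec=""):
    """Register a handler together with its space-separated argument names."""
    def register(func):
        commands[name] = (func, argspec.split())
        return func
    return register

class FakeRepo:
    """Stand-in repository, for illustration only."""
    def __init__(self, nodes):
        self._nodes = nodes

    def lookup(self, key):
        try:
            return self._nodes[key]
        except KeyError:
            raise LookupError("unknown revision '%s'" % key)

class DictProto:
    """Stand-in transport: arguments arrive as an already-parsed dict.

    A real ssh or http transport would read them from a pipe or a request.
    """
    def __init__(self, args):
        self._args = args

    def getargs(self, names):
        return [self._args[n] for n in names]

@command("lookup", "key")
def lookup(repo, proto, key):
    # Same shape as the unified handler: arguments in, result string out.
    try:
        r = hexlify(repo.lookup(key)).decode("ascii")
        success = 1
    except Exception as inst:
        r = str(inst)
        success = 0
    return "%s %s\n" % (success, r)

def dispatch(repo, proto, name):
    """Look up the handler, let the transport unpack its arguments, call it."""
    func, argnames = commands[name]
    args = proto.getargs(argnames)
    return func(repo, proto, *args)
```

In this sketch the only transport-specific piece is the getargs method on the proto object, so supporting another transport would mean writing another small proto class rather than another copy of every command.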

The tricky part, of course, with a big change like this is how to break it down into pieces. Simply trying to refactor it all in one go might be possible, but it’s much safer to do it step by step, testing each step along the way. So I started by making a new ssh argument parser, hooking that into a generic command dispatcher, and hooking that dispatcher into the existing ssh infrastructure. Now I could move commands out of the ssh framework and into the generic framework one by one. Once I’d moved a bunch of the simpler ssh commands, I went and tackled merging them on the http side with a similar approach. Overall, the process took a total of 18 changesets.
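One way to picture the one-by-one migration is a server that checks the generic table first and falls back to its old per-command methods for anything not yet moved. This is only a sketch under that assumption, not the code from the actual changesets: LegacySSHServer, serve_one and the canned responses are hypothetical.

```python
class LegacySSHServer:
    """Hypothetical server that still has old-style do_<command> methods."""

    def __init__(self, generic_commands):
        # Commands already migrated to the generic framework: name -> callable.
        self.generic = generic_commands
        self.sent = []

    def respond(self, data):
        # Stand-in for writing to the ssh pipe.
        self.sent.append(data)

    def do_heads(self):
        # A command that has not been migrated yet.
        self.respond("legacy heads\n")

    def serve_one(self, name):
        if name in self.generic:
            # Migrated command: the handler returns its result.
            self.respond(self.generic[name]())
        else:
            # Not migrated yet: fall back to the legacy method.
            getattr(self, "do_" + name)()
```

With a fallback like this, each changeset can move a single command into the generic table and delete its do_ method, and the server keeps working after every step.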

3 Comments

  1. […] This post was mentioned on Twitter by Ollivier Robert, Matt Mackall. Matt Mackall said: A pair of #Mercurial blog posts today: http://www.selenic.com/blog/?p=650 http://www.selenic.com/blog/?p=647 […]

    Pingback by Tweets that mention Unifying the Mercurial wire protocol « internal fragmentation -- Topsy.com — Fri, 16 Jul 2010 @ 02:00

  2. Did you already consider support for batching requests? For discovery, for instance, it would be very handy to be able to batch a request for remote’s current heads with a request for a yes/no bitmap for a sample. We can use optional parameters for this, but batched separate requests and responses would be more flexible.

    Comment by Peter Arrenbrecht — Fri, 16 Jul 2010 @ 02:47

  3. Peter, I haven’t looked at batching yet. It’s doable, but it’s slightly problematic on the client side which is using an imperative model: pack arguments, call, get results, then parse. Unserializing the client methods would mean either breaking them into two parts or wrapping some crazy coroutine around them.

    Comment by mpm — Fri, 16 Jul 2010 @ 07:55
