Proposal for detecting history rewriting on shared repos
Gregory Szorc
gregory.szorc at gmail.com
Wed Feb 12 18:20:28 CST 2014
The share extension and its workflow are very fragile. If history is
rewritten in the original repository, there's a good chance shared clones
of that repo will get corrupted. While there is a giant warning about this
in the output of `hg help share`, Mercurial currently offers little to no
help detecting or recovering from it.
At Mozilla, our automation was apparently deleting the original source
repo and re-cloning/pulling it. Clones from the original repo still
seemed to work. But under the right circumstances those clones were
getting corrupted - complaining about missing parents, presumably due to
revlogs not matching exactly.
I propose that Mercurial be a bit more robust at detecting known issues
with the share extension.
I propose that we introduce a .hg/storeid (or similar) file whose
content is a randomly-generated value. Let's say a UUID. We call this
file/content the "store ID."
The store ID is created at clone time or when a client opens a local
repo that doesn't have a store ID.
The store ID is *not* transferred between repos at clone time. Instead,
each repo has its own distinct store ID. Even local filesystem clones get
separate store IDs.
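
A minimal sketch of the "create on first open" behavior described above.
This is not Mercurial code: the function name `ensure_store_id`, the
constant `STOREID_NAME`, and the choice of a hex-encoded UUID4 are all
assumptions made for illustration.

```python
import os
import tempfile
import uuid

# Hypothetical file name, following the ".hg/storeid (or similar)" wording.
STOREID_NAME = "storeid"


def ensure_store_id(hg_dir):
    """Return the store ID under hg_dir, generating one if the file is absent."""
    path = os.path.join(hg_dir, STOREID_NAME)
    try:
        with open(path, "r", encoding="ascii") as fh:
            return fh.read().strip()
    except FileNotFoundError:
        pass

    # Nothing is copied from any other repo: every repo draws a fresh value.
    store_id = uuid.uuid4().hex

    # Write to a temporary file and rename it into place so that a
    # concurrent reader never observes a partially written ID.
    fd, tmp_path = tempfile.mkstemp(dir=hg_dir, prefix=STOREID_NAME + ".")
    with os.fdopen(fd, "w", encoding="ascii") as fh:
        fh.write(store_id + "\n")
    os.replace(tmp_path, path)
    return store_id
```

Because the value is generated locally and never copied, two clones of the
same repo end up with different IDs, while reopening one repo returns the
ID it already has.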
The store ID is changed whenever the store experiences revlog rewriting
that isn't a transaction rollback. An unchanged store ID is thus a
guarantee that any previously seen revisions in all committed revlogs in
the store still exist. If the store ID changes, all bets about preexisting
revlog content are off.
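
Changing the ID on rewrite could look roughly like the sketch below. The
helper name `rotate_store_id` and the idea of calling it once a rewriting
operation (something like a strip) has completed are assumptions; the
proposal does not say where the hook would live.

```python
import os
import uuid


def rotate_store_id(hg_dir):
    """Replace the store ID after a history rewrite and return the new value."""
    new_id = uuid.uuid4().hex
    path = os.path.join(hg_dir, "storeid")  # hypothetical file name
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="ascii") as fh:
        fh.write(new_id + "\n")
    # Rename over the old file so readers see either the old or the new ID,
    # never a truncated one.
    os.replace(tmp_path, path)
    return new_id
```

A transaction rollback would deliberately not call this, since it only
discards uncommitted data.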
The intended use case for the store ID is for shared repos to detect
incompatible revlog/store changes (but it may not be limited to that -
anyone have other ideas?). When a repo with .hg/sharedpath is opened,
Mercurial will compare the store ID for both repositories. If there is a
mismatch, the client will abort immediately (presumably with an error
message explaining what happened). This will enable clients to
proactively detect bad practices and stop the world from exploding.
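
The open-time check might be sketched as follows. Assumptions here: the
shared repo keeps its own `.hg/storeid` holding the source's ID as
recorded at share time, `.hg/sharedpath` holds the path to the source
repo's `.hg` directory, and both ID files exist. The names
`StoreIdMismatch` and `check_shared_repo` are made up for this example.

```python
import os


class StoreIdMismatch(Exception):
    """Raised when a shared repo and its source disagree on the store ID."""


def read_store_id(hg_dir):
    """Read the (hypothetical) storeid file from an .hg directory."""
    with open(os.path.join(hg_dir, "storeid"), "r", encoding="ascii") as fh:
        return fh.read().strip()


def check_shared_repo(hg_dir):
    """Abort if the share at hg_dir no longer matches its source store."""
    sharedpath_file = os.path.join(hg_dir, "sharedpath")
    if not os.path.exists(sharedpath_file):
        return  # not a shared repo, nothing to compare

    with open(sharedpath_file, "r") as fh:
        source_hg_dir = fh.read().rstrip("\n")

    local_id = read_store_id(hg_dir)
    source_id = read_store_id(source_hg_dir)
    if local_id != source_id:
        raise StoreIdMismatch(
            "store ID mismatch: the source repository at %s was rewritten "
            "or replaced since this share was created" % source_hg_dir
        )
```

This is where the two extra small file reads mentioned in the proposal
would come from.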
The store ID is backwards compatible. Old clients will simply ignore the
existence of this file.
There is a slight performance penalty to this feature, as opening a
shared repo will need to read the contents of 2 new files. But those
files should be small and serviced from the page cache, so I don't see a
major problem here.
The one edge case I can think of is how to deal with opening a shared repo
where neither repo has a store ID. I reckon we should let that
proceed silently, or perhaps with a one-time notification. The reasoning
is that this is what old clients do and it's been good enough so far, so
why change it?
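
One way to keep that edge case explicit is to classify the pair of IDs
before deciding what to do. The function name and the result strings
below are illustrative only. The case where exactly one side has an ID is
not covered by this proposal, so the sketch reports it separately rather
than choosing a policy.

```python
from typing import Optional


def classify_store_ids(local_id: Optional[str], source_id: Optional[str]) -> str:
    """Classify the store IDs of a shared repo and its source.

    None means the corresponding storeid file does not exist.
    """
    if local_id is None and source_id is None:
        # Neither repo has an ID: behave like old clients and proceed.
        return "legacy"
    if local_id is None or source_id is None:
        # Only one side has an ID: left to the caller (assumption).
        return "partial"
    return "match" if local_id == source_id else "mismatch"
```

A caller would proceed on `"match"` and `"legacy"` (possibly printing a
one-time notice for the latter) and abort on `"mismatch"`.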
If adopted, this proposal could be implemented in 2 major phases. Phase
1 involves creating store IDs and consulting them in the shared repo
open code path. Phase 2 involves adding code to modify the store ID when
revlog rewriting occurs. There are immediate benefits to just phase 1.
There are incremental benefits to making the feature/detection more
robust through phase 2.
Thoughts?