Regular repository corruption -- help needed.

Alexander Krauss krauss at in.tum.de
Wed Dec 19 17:50:39 CST 2012


Dear list,

In the Isabelle project (http://isabelle.in.tum.de) we have been using
Mercurial without problems since 2008, but since this summer we have
been experiencing regular corruption of our central push/pull area.

I am looking for help on how to investigate this issue, which happens
sporadically but often enough to be really worrying, since we must
re-clone the whole repository when it happens -- a stop-the-world
administrative operation.

The setup:

- The central repository sits on an NFS mount, which is accessed from
   a number of machines. (I know that this is not nice, but it is not
   so easy to change at the moment.)

- Developers usually push via ssh, connecting to one of the machines
   which has access to the NFS mount, i.e.:

      hg push ssh://somemachine//nfs/central/repos

   but today I also saw the issue occur on a plain local push.

- Before the push, the repository is ok, and afterwards it is
   corrupted:

     $ hg log
     abort: integrity check failed on 00changelog.i:50603!

   hg verify displays a "first damaged changeset" n.  Here,
   n is a revision that was already present before the push, not just a
   newly pushed revision.
   We must then re-clone up to revision n - 1.

- For analysis, I can provide tarballs (130M each) of

  (a) the corrupted repository:
      http://www21.in.tum.de/~krauss/isabelle-corrupt-2012-12-19.tar.gz
  (b) the (intact) origin of the push:
      http://www21.in.tum.de/~krauss/isabelle-push-origin-2012-12-19.tar.gz

   Unfortunately, I do not have the original intact state of the
   push destination anymore.

- Because of NFS, concurrent operations may be part of the
   problem. However, I am fairly sure that there were no concurrent
   push or other write attempts, although some automated tools do
   regularly pull from this repository.


- Some more info:

  - hg version: 2.4, Python 2.7.3, Linux 3.6.10 (some SuSE version)
  - We have an older repository format:

   $ cat /nfs/central/repos/.hg/requires
   revlogv1
   fncache
   store

  - Active extensions from ~/.hgrc:

   [extensions]
   extdiff =
   transplant =
   color =
   hgext.graphlog =
   hgext.record =
   hgext.convert =
   mq =
   share =
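
For reference, the re-clone step mentioned above looks roughly like the
following sketch. The function name, the DRY_RUN switch and the paths in
the usage line are illustrative assumptions, not our actual tooling; n
stands for the "first damaged changeset" number reported by hg verify.

```shell
# Sketch only: build a fresh repository from the damaged one, stopping at
# revision n-1, where n is the first damaged changeset.
# reclone_up_to and DRY_RUN are hypothetical names used for illustration.
reclone_up_to() {
    src=$1            # damaged repository
    dest=$2           # new destination, must not exist yet
    first_damaged=$3  # revision number n from "hg verify"
    last_good=$((first_damaged - 1))

    # With DRY_RUN=1 the commands are printed instead of executed.
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
    }

    # Clone without a working copy. Note that --rev copies only the given
    # revision and its ancestors; other heads would need further --rev
    # arguments. Afterwards, check the integrity of the result.
    run hg clone --noupdate --rev "$last_good" "$src" "$dest" &&
        run hg --repository "$dest" verify
}
```

A dry run such as

      DRY_RUN=1 reclone_up_to /nfs/central/repos /nfs/central/repos.new N

(with N replaced by the number from hg verify, and repos.new being a
made-up destination) only prints the two hg commands, so they can be
reviewed before anything is touched.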


I appreciate any help on how to get to the source of the problem.

We are also looking into moving to a hosting service like Bitbucket,
to eliminate potential NFS issues, but nevertheless, I think this
issue is worth pursuing on its own.

Alex

