Created on 2007-05-01.16:26:21 by ThomasAH, last changed 2008-05-07.00:29:21 by mpm.
| msg6016 (view) |
Author: mpm |
Date: 2008-05-07.00:29:19 |
|
Changing subject/priority.
|
| msg4751 (view) |
Author: ThomasAH |
Date: 2007-12-30.18:50:30 |
|
Assigning to mpm to coordinate further fixing.
|
| msg3185 (view) |
Author: ThomasAH |
Date: 2007-06-09.09:21:36 |
|
With current crew tip (b51a8138292a) I get the errors reported in msg3043 and
msg3047 (see superseded issue585).
|
| msg3052 (view) |
Author: mpm |
Date: 2007-05-02.13:58:37 |
|
While this is a good patch, it's by no means complete.
Pulls can take an arbitrarily long time. If we begin a pull, start reading
changelogs, manifests, and files, and another user does rollback (by failed
commit or otherwise) somewhere in the middle, it will potentially hit files that
pull is reading.
The only way to be sure this hasn't happened is to make sure the tip at the
start of the pull still exists in the revlog at the end.
Actually, even this isn't foolproof. A rollback that removes a commit followed
by an identical commit can cause a pull that appears to be complete but is
missing some file revisions.
|
| msg3051 (view) |
Author: alexis |
Date: 2007-05-02.07:12:57 |
|
We may indeed have some problems if somebody does a pull from a repo that is in
the middle of a rollback, but this is not the problem here - the rollback
happens in the local repo.
As a fix for issue132, we've started rereading the changelog and the manifest
when we grab the repo lock. To avoid unnecessary work, we check whether it
looks like these revlogs have changed. Right now this check consists of doing a
stat and comparing st_dev, st_ino, st_mtime and st_ctime with the values we
saved earlier. If they all match, we skip the reloading (this is revlog.load in
mercurial/revlog.py).
With Brendan's patch, the commit and the rollback end up happening in the same
second, and so we're unable detect any change by looking just at mtime and ctime.
Changing revlog.load to also compare st_size (see below) fixes this case and the
2-integrity-errors case from crew-stable. I haven't seen the 4-integrity-errors
version.
BTW, another way to make this easier to reproduce is to change the sleeps to
something less than a second (you'll need a non-ancient version of coreutils).
Changing the hook sleep to 0.3 and the other one to 0.2 makes it work for me
60-70% of the time (which is expected).
diff -r 15289406f89c mercurial/revlog.py
--- a/mercurial/revlog.py Tue May 01 13:23:32 2007 -0700
+++ b/mercurial/revlog.py Wed May 02 04:07:01 2007 -0300
@@ -347,7 +347,8 @@ class revlog(object):
if (oldst and st.st_dev == oldst.st_dev
and st.st_ino == oldst.st_ino
and st.st_mtime == oldst.st_mtime
- and st.st_ctime == oldst.st_ctime):
+ and st.st_ctime == oldst.st_ctime
+ and st.st_size == oldst.st_size):
return
self.indexstat = st
if len(i) > 0:
|
| msg3050 (view) |
Author: mpm |
Date: 2007-05-01.22:20:59 |
|
The diagnosis is that rollback races with pull. As pull is a reader, it takes no
locks, so it's hard to prevent.
One solution is to have pull check that the changelog tip didn't vanish at the
end, and if it did, abort the changegroup.
|
| msg3049 (view) |
Author: mpm |
Date: 2007-05-01.22:15:13 |
|
Brendan managed to make it immediately reproduceable with this patch:
diff -r e33ad7cea15f tests/test-clone-pull-corruption
--- a/tests/test-clone-pull-corruption Wed Apr 25 13:14:01 2007 -0700
+++ b/tests/test-clone-pull-corruption Tue May 01 14:13:51 2007 -0700
@@ -15,8 +15,9 @@ hg ci -m 'change foo'
# Add a hook to wait 5 seconds and then abort the commit
cd ../corrupted
+mkfifo pwatch
echo '[hooks]' >> .hg/hgrc
-echo 'pretxncommit = sleep 5; exit 1' >> .hg/hgrc
+echo 'pretxncommit = echo foo > pwatch; while ! test -s log; do true; done;
exit 1' >> .hg/hgrc
# start a commit...
touch bar
@@ -24,8 +25,8 @@ hg ci -m 'add bar' &
hg ci -m 'add bar' &
# ... and start a pull while the commit is still running
-sleep 1
-hg pull ../source 2>/dev/null
+cat pwatch > /dev/null
+hg pull ../source > log 2> /dev/null &
# see what happened
wait
|
| msg3048 (view) |
Author: mpm |
Date: 2007-05-01.20:51:28 |
|
Can you try adding a sleep before the verify? If it is a race here, this might
make it disappear.
Also, can you try on tmpfs? This should rule out a large set of filesystem issues.
|
| msg3047 (view) |
Author: ThomasAH |
Date: 2007-05-01.20:43:08 |
|
I still get failures with 47ef2d228375 (which holds lock on verify):
ERROR: test-clone-pull-corruption output changed
--- Expected output
+++ Test output
@@ -7,10 +7,15 @@
adding changesets
adding manifests
adding file changes
-added 1 changesets with 1 changes to 1 files
-(run 'hg update' to get a working copy)
+added 1 changesets with 1 changes to 1 files (+1 heads)
+(run 'hg heads' to see heads, 'hg merge' to merge)
checking changesets
+incorrect link (2) for changeset revision 1
+unpacking changeset 6fa9e873119c: list index out of range
checking manifests
+bad manifest link (2) at revision 1
crosschecking files in changesets and manifests
checking files
+foo:4c6c04b2bf00 points to unexpected changeset 2
1 files, 2 changesets, 2 total revisions
+4 integrity errors encountered!
ERROR: test-clone-pull-corruption failed with error code 256
|
| msg3046 (view) |
Author: mpm |
Date: 2007-05-01.19:12:01 |
|
Perhaps rollback is racing with verify? Verify should hold the write lock to
ensure the repo is quiescent when we're checking it.
--- a/mercurial/verify.py Sun Apr 29 12:33:24 2007 -0500
+++ b/mercurial/verify.py Tue May 01 14:13:40 2007 -0500
@@ -16,6 +16,8 @@ def verify(repo):
errors = [0]
warnings = [0]
neededmanifests = {}
+
+ lock = repo.lock()
def err(msg):
repo.ui.warn(msg + "\n")
|
| msg3045 (view) |
Author: ThomasAH |
Date: 2007-05-01.18:58:22 |
|
<mpm> Failure happened on 2.6.16-xen SMP with Python 2.4 here.
<mpm> 256MB of RAM in dom0.
|
| msg3044 (view) |
Author: ThomasAH |
Date: 2007-05-01.18:57:52 |
|
mpm hit this problem (4 integrity errors on ext3 on 2.6.16) when generating
artificial load in 1/6400 runs of this test.
Setting to need-eg until there are new ideas or other boxes where it fails more
than once in a million.
|
| msg3043 (view) |
Author: ThomasAH |
Date: 2007-05-01.17:58:57 |
|
The previous runs were on XFS (Linux kernel 2.6.18), the following happened with
ext3 on the same machine (this time python 2.4 instead of python 2.3, but the
other problems occured with both versions, too):
ERROR: test-clone-pull-corruption output changed
--- Expected output
+++ Test output
@@ -7,10 +7,17 @@
adding changesets
adding manifests
adding file changes
-added 1 changesets with 1 changes to 1 files
-(run 'hg update' to get a working copy)
+added 1 changesets with 1 changes to 1 files (+1 heads)
+(run 'hg heads' to see heads, 'hg merge' to merge)
checking changesets
+incorrect link (2) for changeset revision 1
+unpacking changeset 4d5da2c89859: list index out of range
checking manifests
+bad manifest link (2) at revision 1
+reading delta for manifest d8265428f0b7: list index out of range
crosschecking files in changesets and manifests
checking files
+foo: 1:4c6c04b2bf00 not in manifests
+foo:4c6c04b2bf00 points to unexpected changeset 2
1 files, 2 changesets, 2 total revisions
+6 integrity errors encountered!
ERROR: test-clone-pull-corruption failed with error code 256
|
| msg3042 (view) |
Author: ThomasAH |
Date: 2007-05-01.17:56:19 |
|
The last of the three output changes is because I changed sleep 5 in the test to
sleep 2, please ignore.
|
| msg3040 (view) |
Author: ThomasAH |
Date: 2007-05-01.16:26:17 |
|
Probably related to issue132:
Wenn running 100 tests (all or only test-clone-pull-corruption doesn't matter) I
get the following problem in crew (tested with a6436e9333db) in 0-8% of the runs:
ERROR: test-clone-pull-corruption output changed
--- Expected output
+++ Test output
@@ -7,10 +7,15 @@
adding changesets
adding manifests
adding file changes
-added 1 changesets with 1 changes to 1 files
-(run 'hg update' to get a working copy)
+added 1 changesets with 1 changes to 1 files (+1 heads)
+(run 'hg heads' to see heads, 'hg merge' to merge)
checking changesets
+incorrect link (2) for changeset revision 1
+unpacking changeset 6f47a25083d9: list index out of range
checking manifests
+bad manifest link (2) at revision 1
crosschecking files in changesets and manifests
checking files
+foo:4c6c04b2bf00 points to unexpected changeset 2
1 files, 2 changesets, 2 total revisions
+4 integrity errors encountered!
ERROR: test-clone-pull-corruption failed with error code 256
With 80d3f6f0d8e5 (crew-stable) I get different problems even less common
(between 0.1% and 1% of all runs):
ERROR: test-clone-pull-corruption output changed
--- Expected output
+++ Test output
@@ -11,6 +11,9 @@
(run 'hg update' to get a working copy)
checking changesets
checking manifests
+reading delta for manifest d8265428f0b7: list index out of range
crosschecking files in changesets and manifests
checking files
+foo: 1:4c6c04b2bf00 not in manifests
1 files, 2 changesets, 2 total revisions
+2 integrity errors encountered!
ERROR: test-clone-pull-corruption failed with error code 256
ERROR: test-clone-pull-corruption output changed
--- Expected outputĀ·
+++ Test outputĀ·
@@ -2,15 +2,10 @@
pulling from ../source
abort: pretxncommit hook exited with status 1
transaction abort!
+searching for changes
rollback completed
-searching for changes
-adding changesets
-adding manifests
-adding file changes
-added 1 changesets with 1 changes to 1 files
-(run 'hg update' to get a working copy)
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
-1 files, 2 changesets, 2 total revisions
+1 files, 1 changesets, 1 total revisions
|
|
| Date |
User |
Action |
Args |
| 2008-05-07 00:29:21 | mpm | set | priority: urgent -> bug nosy:
mpm, ThomasAH, tonfa, brendan, alexis, DomQ messages:
+ msg6016 title: Errors in test-clone-pull-corruption -> Rollback can race with pull |
| 2007-12-30 18:50:31 | ThomasAH | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis, DomQ messages:
+ msg4751 assignedto: ThomasAH -> mpm |
| 2007-12-08 02:43:29 | mpm | set | status: in-progress -> chatting nosy:
mpm, ThomasAH, tonfa, brendan, alexis, DomQ |
| 2007-10-19 10:44:28 | DomQ | set | nosy:
+ DomQ |
| 2007-06-22 20:30:42 | mpm | unlink | issue132 superseder |
| 2007-06-22 20:30:17 | mpm | link | issue132 superseder |
| 2007-06-09 09:21:36 | ThomasAH | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3185 |
| 2007-06-09 09:19:23 | ThomasAH | link | issue585 superseder |
| 2007-05-02 13:58:38 | mpm | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3052 |
| 2007-05-02 07:12:58 | alexis | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3051 |
| 2007-05-01 22:21:20 | mpm | set | status: need-eg -> in-progress nosy:
mpm, ThomasAH, tonfa, brendan, alexis |
| 2007-05-01 22:20:59 | mpm | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3050 |
| 2007-05-01 22:15:16 | mpm | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3049 |
| 2007-05-01 20:51:28 | mpm | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3048 |
| 2007-05-01 20:43:22 | ThomasAH | set | status: testing -> need-eg nosy:
mpm, ThomasAH, tonfa, brendan, alexis |
| 2007-05-01 20:43:08 | ThomasAH | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3047 |
| 2007-05-01 19:12:01 | mpm | set | status: need-eg -> testing nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3046 |
| 2007-05-01 18:58:22 | ThomasAH | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3045 |
| 2007-05-01 18:57:52 | ThomasAH | set | status: chatting -> need-eg nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3044 assignedto: ThomasAH |
| 2007-05-01 17:58:57 | ThomasAH | set | nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3043 |
| 2007-05-01 17:56:19 | ThomasAH | set | status: unread -> chatting nosy:
mpm, ThomasAH, tonfa, brendan, alexis messages:
+ msg3042 |
| 2007-05-01 16:26:21 | ThomasAH | create | |
|