Issue1320

Title Race condition pulling over HTTP during push: 'abort: 00changelog.i@...: unknown parent!'
Priority bug Status in-progress
Superseder Nosy List abuehl, djc, gozer, jglick, mpm, mzlamal, pmezard, tonfa
Assigned To tonfa Topics http_proto

Created on 2008-10-01.19:36:25 by jglick, last changed 2008-10-29.11:44:02 by djc.

Messages
msg7660 (view) Author: tonfa Date: 2008-10-23.22:18:35
server fix in crew: 9f0e52e1df77

bad news is that it has some overheads, we will be sending unneeded nodes (or
whole branches, if only a branch is not up-to-date)

using -r (but then it's more expensive for the server to compute it), or
separate repos per branch will avoid this.
msg7623 (view) Author: tonfa Date: 2008-10-21.15:25:40
Fix for the serverside:
http://www.bitbucket.org/bboissin/mercurial-crew-tonfa-testing/changeset/5efa5abd797b/

for the client side, the simple fix is to use changegroupsubset with
remote.heads() as heads.
msg7342 (view) Author: tonfa Date: 2008-10-08.12:27:03
testcase:
hg init master
cd master
echo 0 > 0
hg ci -Am 0
echo 1 > 1
hg ci -Am 1
echo 2 > 2
hg ci -Am 2
hg up 1
echo 3 > 3
hg ci -Am 3
hg merge
hg ci -m 4
hg up 0
echo 5 > 5
hg ci -Am 5
hg merge -r 2
hg ci -m 6
hg merge
hg ci -m 7
cd ..
hg clone -r 0 master client # it's important to choose the latest branch
hg clone -r 4 master server
cd master

Then you have to 'hg pull ../master' from server between the findincoming and
changegroup.
I would appreciate any help to automate that (hooks, proxy, etc).
msg7316 (view) Author: djc Date: 2008-10-06.11:36:01
I believe this has been biting the Mozilla project, as well.
msg7295 (view) Author: mpm Date: 2008-10-02.23:47:37
We've already got the protocol capability to do this right, as shown by the -r
tip trick.

To turn this into a solution we've got to teach the client to use -r tip (fix
the bug) and teach the server to optimize when the passed revision is the tip
(keep it fast).
msg7294 (view) Author: tonfa Date: 2008-10-02.23:30:00
The problem is that changegroup() takes as argument "basenode" which are the
"lowest" nodes from the server which aren't in the client.
If the DAG changes (with a new branch) between the discovery and the
changegroup() call, then the wrong changegroup will be transfered (with some
missing nodes).

A way to mitigate this is to send the "lowest" common nodes instead (but then
the server will send some useless stuff, unless the semantic of basenode changes
in the server, maybe via a capability).
msg7293 (view) Author: jglick Date: 2008-10-02.23:23:42
The repository is publicly available, so you should be able to reproduce using
the information here. Probably just need to

hg clone -r 5b3f13f2bbd5 -U http://hg.netbeans.org/main/ master
hg clone -r 676790ff94c9 -U master client
hg clone -r 692ee115553c -U master server
hg -R server serve &
hg -R client in server &
# wait for pause to start
hg -R server pull master
# wait for pause to end
msg7292 (view) Author: jglick Date: 2008-10-02.23:18:27
The same bug occurs when you 'hg pull' during the push (but not when you 'hg
pull -r tip'). Confirmed also with current crew sources.
msg7291 (view) Author: jglick Date: 2008-10-02.23:07:15
After adding a similar pause to changegroupsubset, and using hg in -r tip from
the client, I see no problem even if I push during the pause: the incoming list
ends at the old tip.

So if I can summarize so far: httprepo.changegroup is vulnerable to producing
changesets with missing parents in case it is called with the wrong bases.
msg7290 (view) Author: mpm Date: 2008-10-02.22:57:24
Yep, this makes sense. Try your test again with in -r tip, please, and see if it
goes away?
msg7289 (view) Author: jglick Date: 2008-10-02.22:40:39
The bug can also be reproduced using Hg 1.0.2 for both client and server.
msg7288 (view) Author: jglick Date: 2008-10-02.22:37:52
Hah! Finally I can reproduce. Here is how:

1. Set "server" repo to have 692ee115553c as tip, and start HTTP service.

2. Insert time.sleep(60) to beginning of httprepository.changegroup.

3. Initiate pull from the "client". It will get through all but the last HTTP
request, then pause.

4. Pull up to 5b3f13f2bbd5 in the "server" repo.

5. Wait for the client to resume. It aborts with

abort: 00changelog.i@61eac836a4a7: unknown parent!

(as before, the node mentioned in the error is actually wrong, because it is the
second parent which is unknown).
msg7287 (view) Author: jglick Date: 2008-10-02.22:22:33
The actual log is identical to that produced by the simulation in which the new
changegroup does not exist on the server at all - but the logs only reflect the
requests, not the responses. (The actual log seems to include content lengths
but the simulation does not appear to record these.)

Perhaps the final cmd=changegroup is run after the new changelog has been
written and thus gets unexpected stuff back, with some gaps? This would be
consistent with the observed behavior: that the missing changesets are those
branches whose roots are older than the old tip.
msg7286 (view) Author: jglick Date: 2008-10-02.22:15:38
For comparison, I have captured HTTP logs from two cases: when the new tip
exists on the server, and when only the old tip exists on the server. This is
using my local setup.

New tip exists (my original failed attempt to reproduce):

"GET /?cmd=heads HTTP/1.1" 200 -
"GET /?nodes=5b3f13f2bbd51c50a0c92d1751b57647d29b8069&cmd=branches HTTP/1.1" 200 -
"GET
/?nodes=abad5bf2c02145c223c026c54729db8a75773776+692ee115553c669056d036f631ed2faafde789a6&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=3c053211e51d615818b8c40e5fe16008cb25915f+f72cedf5282c1d99f17d5de5dd377cd4517a8434+5f9624d7ee3900b937c145a9c2df3c13c9f43527+1720ab541c54ff09c08157fc0c81ee4d123c1126&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=2772746bd9aadc60fd0c96a86498c2a25635819b+3be507481f44fff2db24f3f8ae738dcdf3ca9471+d884e50453271b47d0a144dfb5ad6a9d7eed8a36+5bcbaa9f084260063eb91970c2e15bd4f6cdc345+61eac836a4a781c99c9df607de2eb86bf501fc7d+107ef09635dcf2face2100551af8c64518079642+1d317a285f1b3169ec6ec958591e45590da79b3a+10a661711df64bf657b103fc93defb59f226b5fe&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=dae79924bc41cc4a83d6dc071dd32a09881566c9+aa7d20707f8caf6b26281319a9cc68a8ae73ac27+6e60f34b2ce6625e67b45a239f712c2207df8497+d434f3aeb79dff0722c02be145d8318d144e0e3b+67ba2291a73d3f8b2729ce4253f3ed9ab2870a70+5b66c8fb328dd6f1e240a28ecf0892dc33e0bea5+388002a6ffc0deea4c73fc2e6f3b4a978056b656+122bb32e53b37de68c435aaf3bd7cbf6602fe4cd+cdc4f8a1ce82f6dd58235ef298662ac287c20499+7f7fda77fa80c1e95710570d08510a0e0c218790&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=2c9472137fd4a9d5727df20d9b217efb5c6736eb+c56e848747da2319e87fa45abd39274c2268f8f8+703c9ac71b8efa81249a92007b3e677487f7f663+7fdb147d60d348caebe71df0f7c8be6bf7c8fa51+14294176953ebded8f059d988efc50770777eea2+f2842e815d1203f2666175cad711a467fb8c6576+863806dc2bb868942f86c7567af43be6dd952311+b931272d3ee9708e6ec5cef90a0d7d634a4a5498+a1b22f09c4372c01ae3fb4d1f539c9eee504a587&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=6383d321e6d9ec2d32e76e6df5c52cb6ca71cb9f+06c92eb9658163cf1c37a75075f9cd45b0419b45+1c920dd80243eb218d425f8c66a2b76d09f0140d+64dd033827227a57657edab4e1fb2aea98fb08d1+b8b73f2977cf6b235be0be8fb73c9125dd88499e+0db6812b8e2eb5fc2a55e2fcff7c4ff5ef64c32a+8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62+908eac84bcea5951ea95bb6ab59652d529ec1d15+4651aa77a1d1fa653b08562aa8c4c8838a00a2a8+327faaf85fbc794f09eb3ae49e388f7f0c3eb19b&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=5dd024e5c6c26017e13cbe557de6d90cde40c16d+4e27258cc1d526b53ccac0cde0fb088e6c17f49f+eb2d62d9c4f573b307246897c13979dd63571e1f+deb6eba4e936326c0bdc86659538c3b08d6769b4+5923965ae2ff490c29bd43197a061dbbf3f6544d+0e958cb7d52fd524c9b58f70c7a2adfbb90892a1&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=aeeebce2f5953d3647b826de59599bd4abade921+b4c0af16c47ed59432a5ce4993eca5db41c1452c+5bb4de0b9c5293f6f74eddb93735eaab72e30b09+dabaacb8e0c288aa1240520245acc45224c5d1d2+98ea797daf11b7cdd11199bc18be0e5709c5a0ef+59413eb2e377ec5e3eb931ee0aac693f305db79f&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=cb7bbce9541c29aac25629220da80adf60d3d3fa+bc338d826981c327160b3629d8b5d214b910fdb3&cmd=branches
HTTP/1.1" 200 -
"GET
/?pairs=5bcbaa9f084260063eb91970c2e15bd4f6cdc345-6e142979c5ab567ca57a3160e70fd83ac1a3b47a&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=dae79924bc41cc4a83d6dc071dd32a09881566c9-185e296b11696ff8b477be86a4fde46045429ac3&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=d434f3aeb79dff0722c02be145d8318d144e0e3b-435447e48d7d7a760c70e64bf87ac05e44c344fc&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=122bb32e53b37de68c435aaf3bd7cbf6602fe4cd-28e57d97969d69ee3f726e200aa6291804e82a74&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=2c9472137fd4a9d5727df20d9b217efb5c6736eb-c2bc404050fb7fe5aecd1ba48a3560f413581e1c&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=c56e848747da2319e87fa45abd39274c2268f8f8-dbb25b77ddc0ef24f360e0d24147dd9ee307aad6&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=f2842e815d1203f2666175cad711a467fb8c6576-28e57d97969d69ee3f726e200aa6291804e82a74&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=a1b22f09c4372c01ae3fb4d1f539c9eee504a587-dc38af424bca1fa9a49d7f392a158516bdd90f90&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=6383d321e6d9ec2d32e76e6df5c52cb6ca71cb9f-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62-a33bf42d4a7d70d6140aff1d1db8d125c62e0c85&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=327faaf85fbc794f09eb3ae49e388f7f0c3eb19b-cf311c27f93ae367429b617340a67f68df7f0e9a&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=eb2d62d9c4f573b307246897c13979dd63571e1f-b05558ed4c777d403cfd6449ad46ebe2d0352be9&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=deb6eba4e936326c0bdc86659538c3b08d6769b4-af222f6260c1fa188f36b12dee95148ff95b2047&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=aeeebce2f5953d3647b826de59599bd4abade921-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=59413eb2e377ec5e3eb931ee0aac693f305db79f-6a65498eee9aa533a009fe95c007526fec270887&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=bc338d826981c327160b3629d8b5d214b910fdb3-bf2f77f62abe8b6d14f0a37dce8d805dde318400&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=031c97aab2a42be74fb5de5411afdd55f63fc553-185e296b11696ff8b477be86a4fde46045429ac3&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=100920f6c9fd5c562e74d8020901236e576e93ff-054baa9e4c656f589045e2586490b92b7574be44&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=60dd113a00d0e58c329d27cca630cdda7d6b99ee-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 -
"GET
/?cmd=changegroup&roots=0009c898252fb0ad1c67b7b48c00485e09dd9e6b+8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62+031c97aab2a42be74fb5de5411afdd55f63fc553+c56e848747da2319e87fa45abd39274c2268f8f8+deb6eba4e936326c0bdc86659538c3b08d6769b4+5bcbaa9f084260063eb91970c2e15bd4f6cdc345+bc338d826981c327160b3629d8b5d214b910fdb3+2c9472137fd4a9d5727df20d9b217efb5c6736eb+a1b22f09c4372c01ae3fb4d1f539c9eee504a587+eb2d62d9c4f573b307246897c13979dd63571e1f+122bb32e53b37de68c435aaf3bd7cbf6602fe4cd+59413eb2e377ec5e3eb931ee0aac693f305db79f+60dd113a00d0e58c329d27cca630cdda7d6b99ee+f2842e815d1203f2666175cad711a467fb8c6576+327faaf85fbc794f09eb3ae49e388f7f0c3eb19b
HTTP/1.1" 200 -

Just the old tip exists, i.e. push has not occurred:

"GET /?cmd=heads HTTP/1.1" 200 -
"GET /?nodes=692ee115553c669056d036f631ed2faafde789a6&cmd=branches HTTP/1.1" 200 -
"GET
/?nodes=5f9624d7ee3900b937c145a9c2df3c13c9f43527+1720ab541c54ff09c08157fc0c81ee4d123c1126&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=61eac836a4a781c99c9df607de2eb86bf501fc7d+107ef09635dcf2face2100551af8c64518079642+1d317a285f1b3169ec6ec958591e45590da79b3a+10a661711df64bf657b103fc93defb59f226b5fe&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=d884e50453271b47d0a144dfb5ad6a9d7eed8a36+5bcbaa9f084260063eb91970c2e15bd4f6cdc345+67ba2291a73d3f8b2729ce4253f3ed9ab2870a70+5b66c8fb328dd6f1e240a28ecf0892dc33e0bea5+388002a6ffc0deea4c73fc2e6f3b4a978056b656+122bb32e53b37de68c435aaf3bd7cbf6602fe4cd+cdc4f8a1ce82f6dd58235ef298662ac287c20499+7f7fda77fa80c1e95710570d08510a0e0c218790&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=6e60f34b2ce6625e67b45a239f712c2207df8497+d434f3aeb79dff0722c02be145d8318d144e0e3b+703c9ac71b8efa81249a92007b3e677487f7f663+7fdb147d60d348caebe71df0f7c8be6bf7c8fa51+14294176953ebded8f059d988efc50770777eea2+f2842e815d1203f2666175cad711a467fb8c6576+863806dc2bb868942f86c7567af43be6dd952311+b931272d3ee9708e6ec5cef90a0d7d634a4a5498+a1b22f09c4372c01ae3fb4d1f539c9eee504a587&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=c56e848747da2319e87fa45abd39274c2268f8f8+6383d321e6d9ec2d32e76e6df5c52cb6ca71cb9f+06c92eb9658163cf1c37a75075f9cd45b0419b45+1c920dd80243eb218d425f8c66a2b76d09f0140d+64dd033827227a57657edab4e1fb2aea98fb08d1+b8b73f2977cf6b235be0be8fb73c9125dd88499e+0db6812b8e2eb5fc2a55e2fcff7c4ff5ef64c32a+8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62+908eac84bcea5951ea95bb6ab59652d529ec1d15+4651aa77a1d1fa653b08562aa8c4c8838a00a2a8&cmd=branches
HTTP/1.1" 200 -
"GET /?nodes=327faaf85fbc794f09eb3ae49e388f7f0c3eb19b&cmd=branches HTTP/1.1" 200 -
"GET
/?nodes=5dd024e5c6c26017e13cbe557de6d90cde40c16d+4e27258cc1d526b53ccac0cde0fb088e6c17f49f+eb2d62d9c4f573b307246897c13979dd63571e1f+deb6eba4e936326c0bdc86659538c3b08d6769b4+5923965ae2ff490c29bd43197a061dbbf3f6544d+0e958cb7d52fd524c9b58f70c7a2adfbb90892a1&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=aeeebce2f5953d3647b826de59599bd4abade921+b4c0af16c47ed59432a5ce4993eca5db41c1452c+5bb4de0b9c5293f6f74eddb93735eaab72e30b09+dabaacb8e0c288aa1240520245acc45224c5d1d2+98ea797daf11b7cdd11199bc18be0e5709c5a0ef+59413eb2e377ec5e3eb931ee0aac693f305db79f&cmd=branches
HTTP/1.1" 200 -
"GET
/?nodes=cb7bbce9541c29aac25629220da80adf60d3d3fa+bc338d826981c327160b3629d8b5d214b910fdb3&cmd=branches
HTTP/1.1" 200 -
"GET
/?pairs=5bcbaa9f084260063eb91970c2e15bd4f6cdc345-6e142979c5ab567ca57a3160e70fd83ac1a3b47a&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=122bb32e53b37de68c435aaf3bd7cbf6602fe4cd-28e57d97969d69ee3f726e200aa6291804e82a74&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=d434f3aeb79dff0722c02be145d8318d144e0e3b-435447e48d7d7a760c70e64bf87ac05e44c344fc&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=f2842e815d1203f2666175cad711a467fb8c6576-28e57d97969d69ee3f726e200aa6291804e82a74&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=a1b22f09c4372c01ae3fb4d1f539c9eee504a587-dc38af424bca1fa9a49d7f392a158516bdd90f90&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=c56e848747da2319e87fa45abd39274c2268f8f8-dbb25b77ddc0ef24f360e0d24147dd9ee307aad6&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=6383d321e6d9ec2d32e76e6df5c52cb6ca71cb9f-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62-a33bf42d4a7d70d6140aff1d1db8d125c62e0c85&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=327faaf85fbc794f09eb3ae49e388f7f0c3eb19b-cf311c27f93ae367429b617340a67f68df7f0e9a&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=eb2d62d9c4f573b307246897c13979dd63571e1f-b05558ed4c777d403cfd6449ad46ebe2d0352be9&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=deb6eba4e936326c0bdc86659538c3b08d6769b4-af222f6260c1fa188f36b12dee95148ff95b2047&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=aeeebce2f5953d3647b826de59599bd4abade921-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=59413eb2e377ec5e3eb931ee0aac693f305db79f-6a65498eee9aa533a009fe95c007526fec270887&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=bc338d826981c327160b3629d8b5d214b910fdb3-bf2f77f62abe8b6d14f0a37dce8d805dde318400&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=100920f6c9fd5c562e74d8020901236e576e93ff-054baa9e4c656f589045e2586490b92b7574be44&cmd=between
HTTP/1.1" 200 -
"GET
/?pairs=60dd113a00d0e58c329d27cca630cdda7d6b99ee-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 -
"GET
/?cmd=changegroup&roots=0009c898252fb0ad1c67b7b48c00485e09dd9e6b+8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62+deb6eba4e936326c0bdc86659538c3b08d6769b4+5bcbaa9f084260063eb91970c2e15bd4f6cdc345+bc338d826981c327160b3629d8b5d214b910fdb3+c56e848747da2319e87fa45abd39274c2268f8f8+a1b22f09c4372c01ae3fb4d1f539c9eee504a587+eb2d62d9c4f573b307246897c13979dd63571e1f+122bb32e53b37de68c435aaf3bd7cbf6602fe4cd+59413eb2e377ec5e3eb931ee0aac693f305db79f+60dd113a00d0e58c329d27cca630cdda7d6b99ee+f2842e815d1203f2666175cad711a467fb8c6576+327faaf85fbc794f09eb3ae49e388f7f0c3eb19b
HTTP/1.1" 200 -
msg7284 (view) Author: mpm Date: 2008-10-02.21:45:08
If you're using a basic CGI config, that rules out any sort of threading issues.
msg7283 (view) Author: jglick Date: 2008-10-02.21:38:38
Apache 2.2.6 using hgwebdir from 0.9.5 (yes I know we should upgrade), Solaris 11.
msg7282 (view) Author: mpm Date: 2008-10-02.21:33:03
Ok, sounds like that rules out the discovery process.
What can you tell me about your web server configuration? Apache? CGI?
msg7281 (view) Author: jglick Date: 2008-10-02.21:26:43
Note that the authorized push appears in the log immediately after the end of
the pull. So the log records

18:29:56 - failed push
18:30:00 - first request of pull
18:30:11 - last request of pull
18:30:00 - successful push

I presume the server records the timestamp when the request came in, but logs it
when the response has been sent. This is consistent with the push occurring
concurrently with the pull.
msg7280 (view) Author: jglick Date: 2008-10-02.21:22:03
There is no record of any attempted POST around this time period (except for a
401-failed POST from the same host a few seconds earlier, before retrying with
HTTPS authentication information included). So I think we can rule out the
hypothesis of a rejected pretxnchangegroup hook running around this time - it
would have appeared in the log. The problem really seems to be that the pull
started at exactly the same time as the (authorized) push.
msg7279 (view) Author: jglick Date: 2008-10-02.21:11:03
The client's clock is about 2min 40sec slow, so the pull really began around
18:30:00 according to the Hudson timestamp. The push of the changegroup:

[01/Oct/2008:11:30:00 -0700] "POST
/main?cmd=unbundle&heads=692ee115553c669056d036f631ed2faafde789a6 HTTP/1.1" 200
265 "-" "mercurial/proto-1.0"

The pull starts off with the old head, and the requests do not mention the new head:

[01/Oct/2008:11:30:00 -0700] "GET /main?cmd=heads HTTP/1.1" 200 41 "-"
"mercurial/proto-1.0"
[01/Oct/2008:11:30:01 -0700] "GET
/main?nodes=692ee115553c669056d036f631ed2faafde789a6&cmd=branches HTTP/1.1" 200
164 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:02 -0700] "GET
/main?nodes=5f9624d7ee3900b937c145a9c2df3c13c9f43527+1720ab541c54ff09c08157fc0c81ee4d123c1126&cmd=branches
HTTP/1.1" 200 328 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:02 -0700] "GET
/main?nodes=61eac836a4a781c99c9df607de2eb86bf501fc7d+107ef09635dcf2face2100551af8c64518079642+1d317a285f1b3169ec6ec958591e45590da79b3a+10a661711df64bf657b103fc93defb59f226b5fe&cmd=branches
HTTP/1.1" 200 656 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:03 -0700] "GET
/main?nodes=d884e50453271b47d0a144dfb5ad6a9d7eed8a36+5bcbaa9f084260063eb91970c2e15bd4f6cdc345+67ba2291a73d3f8b2729ce4253f3ed9ab2870a70+5b66c8fb328dd6f1e240a28ecf0892dc33e0bea5+388002a6ffc0deea4c73fc2e6f3b4a978056b656+122bb32e53b37de68c435aaf3bd7cbf6602fe4cd+cdc4f8a1ce82f6dd58235ef298662ac287c20499+7f7fda77fa80c1e95710570d08510a0e0c218790&cmd=branches
HTTP/1.1" 200 1312 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:03 -0700] "GET
/main?nodes=6e60f34b2ce6625e67b45a239f712c2207df8497+d434f3aeb79dff0722c02be145d8318d144e0e3b+703c9ac71b8efa81249a92007b3e677487f7f663+7fdb147d60d348caebe71df0f7c8be6bf7c8fa51+14294176953ebded8f059d988efc50770777eea2+f2842e815d1203f2666175cad711a467fb8c6576+863806dc2bb868942f86c7567af43be6dd952311+b931272d3ee9708e6ec5cef90a0d7d634a4a5498+a1b22f09c4372c01ae3fb4d1f539c9eee504a587&cmd=branches
HTTP/1.1" 200 1476 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:03 -0700] "GET
/main?nodes=c56e848747da2319e87fa45abd39274c2268f8f8+6383d321e6d9ec2d32e76e6df5c52cb6ca71cb9f+06c92eb9658163cf1c37a75075f9cd45b0419b45+1c920dd80243eb218d425f8c66a2b76d09f0140d+64dd033827227a57657edab4e1fb2aea98fb08d1+b8b73f2977cf6b235be0be8fb73c9125dd88499e+0db6812b8e2eb5fc2a55e2fcff7c4ff5ef64c32a+8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62+908eac84bcea5951ea95bb6ab59652d529ec1d15+4651aa77a1d1fa653b08562aa8c4c8838a00a2a8&cmd=branches
HTTP/1.1" 200 1640 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:04 -0700] "GET
/main?nodes=327faaf85fbc794f09eb3ae49e388f7f0c3eb19b&cmd=branches HTTP/1.1" 200
164 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:04 -0700] "GET
/main?nodes=5dd024e5c6c26017e13cbe557de6d90cde40c16d+4e27258cc1d526b53ccac0cde0fb088e6c17f49f+eb2d62d9c4f573b307246897c13979dd63571e1f+deb6eba4e936326c0bdc86659538c3b08d6769b4+5923965ae2ff490c29bd43197a061dbbf3f6544d+0e958cb7d52fd524c9b58f70c7a2adfbb90892a1&cmd=branches
HTTP/1.1" 200 984 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:04 -0700] "GET
/main?nodes=aeeebce2f5953d3647b826de59599bd4abade921+b4c0af16c47ed59432a5ce4993eca5db41c1452c+5bb4de0b9c5293f6f74eddb93735eaab72e30b09+dabaacb8e0c288aa1240520245acc45224c5d1d2+98ea797daf11b7cdd11199bc18be0e5709c5a0ef+59413eb2e377ec5e3eb931ee0aac693f305db79f&cmd=branches
HTTP/1.1" 200 984 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:05 -0700] "GET
/main?nodes=cb7bbce9541c29aac25629220da80adf60d3d3fa+bc338d826981c327160b3629d8b5d214b910fdb3&cmd=branches
HTTP/1.1" 200 328 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:05 -0700] "GET
/main?pairs=5bcbaa9f084260063eb91970c2e15bd4f6cdc345-6e142979c5ab567ca57a3160e70fd83ac1a3b47a&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:06 -0700] "GET
/main?pairs=122bb32e53b37de68c435aaf3bd7cbf6602fe4cd-28e57d97969d69ee3f726e200aa6291804e82a74&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:06 -0700] "GET
/main?pairs=d434f3aeb79dff0722c02be145d8318d144e0e3b-435447e48d7d7a760c70e64bf87ac05e44c344fc&cmd=between
HTTP/1.1" 200 123 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:06 -0700] "GET
/main?pairs=f2842e815d1203f2666175cad711a467fb8c6576-28e57d97969d69ee3f726e200aa6291804e82a74&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:07 -0700] "GET
/main?pairs=a1b22f09c4372c01ae3fb4d1f539c9eee504a587-dc38af424bca1fa9a49d7f392a158516bdd90f90&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:07 -0700] "GET
/main?pairs=c56e848747da2319e87fa45abd39274c2268f8f8-dbb25b77ddc0ef24f360e0d24147dd9ee307aad6&cmd=between
HTTP/1.1" 200 82 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:07 -0700] "GET
/main?pairs=6383d321e6d9ec2d32e76e6df5c52cb6ca71cb9f-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 82 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:08 -0700] "GET
/main?pairs=8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62-a33bf42d4a7d70d6140aff1d1db8d125c62e0c85&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:08 -0700] "GET
/main?pairs=327faaf85fbc794f09eb3ae49e388f7f0c3eb19b-cf311c27f93ae367429b617340a67f68df7f0e9a&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:08 -0700] "GET
/main?pairs=eb2d62d9c4f573b307246897c13979dd63571e1f-b05558ed4c777d403cfd6449ad46ebe2d0352be9&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:09 -0700] "GET
/main?pairs=deb6eba4e936326c0bdc86659538c3b08d6769b4-af222f6260c1fa188f36b12dee95148ff95b2047&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:09 -0700] "GET
/main?pairs=aeeebce2f5953d3647b826de59599bd4abade921-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 82 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:09 -0700] "GET
/main?pairs=59413eb2e377ec5e3eb931ee0aac693f305db79f-6a65498eee9aa533a009fe95c007526fec270887&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:10 -0700] "GET
/main?pairs=bc338d826981c327160b3629d8b5d214b910fdb3-bf2f77f62abe8b6d14f0a37dce8d805dde318400&cmd=between
HTTP/1.1" 200 1 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:10 -0700] "GET
/main?pairs=100920f6c9fd5c562e74d8020901236e576e93ff-054baa9e4c656f589045e2586490b92b7574be44&cmd=between
HTTP/1.1" 200 41 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:11 -0700] "GET
/main?pairs=60dd113a00d0e58c329d27cca630cdda7d6b99ee-5b8870a6d50bbd7b38d51c75b5c39a0e9a61e7e9&cmd=between
HTTP/1.1" 200 41 "-" "mercurial/proto-1.0"
[01/Oct/2008:11:30:11 -0700] "GET
/main?cmd=changegroup&roots=0009c898252fb0ad1c67b7b48c00485e09dd9e6b+8af88f7a8d7c2ced4c7fd3ef439c03bc618c2f62+deb6eba4e936326c0bdc86659538c3b08d6769b4+5bcbaa9f084260063eb91970c2e15bd4f6cdc345+bc338d826981c327160b3629d8b5d214b910fdb3+c56e848747da2319e87fa45abd39274c2268f8f8+a1b22f09c4372c01ae3fb4d1f539c9eee504a587+eb2d62d9c4f573b307246897c13979dd63571e1f+122bb32e53b37de68c435aaf3bd7cbf6602fe4cd+59413eb2e377ec5e3eb931ee0aac693f305db79f+60dd113a00d0e58c329d27cca630cdda7d6b99ee+f2842e815d1203f2666175cad711a467fb8c6576+327faaf85fbc794f09eb3ae49e388f7f0c3eb19b
HTTP/1.1" 200 216934 "-" "mercurial/proto-1.0"
msg7278 (view) Author: jglick Date: 2008-10-02.20:48:47
(Of course I meant pretxnchangegroup hooks, not pretxncommit.) As I said,
rollbacks by these hooks are likely rare these days, but that could be the problem.

Michal do we have any logs that would indicate pretxnchangegroup-induced
rollbacks with timestamps? http_error.log seems to show nothing of the sort
since January. I am still looking to see if http.log has any useful information
from that period.

I'm still not sure I see how such a rollback could result in the kind of gap
seen here.
msg7277 (view) Author: mpm Date: 2008-10-02.20:03:05
Yes, I agree with your race-condition assessment. The discovery algorithm should
be nicely deterministic given the same input topologies.

Mercurial depends on a filesystem property I'll call 'causality'. If client X
writes to file B and then A, client Y should not see writes to A before writes
to B are visible. This should hold for all local filesystems in any sensible
operating system, and also for clients running on the same host using NFS, which
sounds like what you're describing?

pretxncommit: yes, this might be the culprit. This hook gets called at the very
end of a transaction, after the changelog has been updated which makes all of
the changes visible to all readers. Otherwise the 'hook' wouldn't be able to see
the new changesets to inspect them. So when the hook is running, every other
reader that has access to the repo will also be able to see the change.

If a rollback happens in the middle of the graph discovery process, the client
may get an incorrect picture of what the server DAG looks like.
msg7276 (view) Author: jglick Date: 2008-10-02.19:19:42
Unfortunately the problem is not so easily reproducible. I have created a clone
of the repo up to 5b3f13f2bbd5 to simulate the server, and run 'hg serve' using
Hg 0.9.5 on that; created a clone up to 676790ff94c9 to simulate the client, and
run 'hg in --bundle ... http://localhost:8000/' using Hg 1.0 on that. The
command succeeds, and dae79924bc41 is included in the results as it should be.

The inability to reproduce, the sporadic nature of the error in the past, and
the fact that the changegroup containing the missing changesets was being pushed
around the same time as the pull (according to possibly skewed clocks), makes me
think the problem is a race condition.

NFS is a plausible culprit. Note that

http://hgbook.red-bean.com/hgbookch4.html#x8-810004.5

claims that

"Mercurial uses locks to ensure that only one process can write to a repository
at a time (the locking mechanism is safe even over filesystems that are
notoriously hostile to locking, such as NFS)."

but it is does not specify whether _reads_ are always safe when a writer is
active on an NFS-hosted repo. Your comments in issue1313 lead me to believe that
this is not safe in general. However our server repository is only accessed
directly in either read or write mode from the one machine with the HTTPS
server, which would seem to make stale caches irrelevant.

We do have a few pretxncommit hooks active on the server, e.g. forbidcrlf. While
these run pretty quickly, there is a window during which a changegroup could be
pushed but then rejected. Such rejections should be rare, but would it be
possible for a pull during such a window to result in an error like this? I know
a pull at the wrong time could result in the client getting changesets which
shortly thereafter disappear from the server, but that is quite different from
receiving a long list of changesets with a few gaps in the middle, and all of
the changesets which appear relevant to this case were accepted by the server.
msg7275 (view) Author: mpm Date: 2008-10-02.15:46:11
Re changeset ordering on pushing: the sequence for adding changesets is to
append all the file data, then all the manifest data, and finally all the
changeset data, which makes the new changesets visible to readers. Changesets
(and all other data) are all naturally stored causally sorted so parents always
precede their children. So it's hard to imagine a situation where a reader would
ever read a repo and 'miss' entries without some interesting filesystem bug
coming into play.

More likely is that you've found a strange corner case in the graph discovery
code used by pull ("searching for changes") that's somehow forgetting to ask for
a parent. It's possible that adding individual changesets upstream would make
this problem vanish.

If you can reproduce the behaviour, perhaps we can try to isolate it a bit:

- make local filesystem clones of both the target and its upstream to take NFS,
http, etc., out of the equation
- try pulling again with --debug to get a trace of the discovery process
msg7274 (view) Author: mzlamal Date: 2008-10-02.12:02:43
It could be somehow related but the only common thing here is that we are using NFS.
We have only one server running Solaris 11 (on one zone) and only this server is
using said NFS where the repositories are stored.
The pushes are being done through the https (apache 2 is on server).
msg7273 (view) Author: tonfa Date: 2008-10-02.09:43:39
Could it be related to issue1313? How do you push to the NFS space ?

(adding mpm in the nosy list)
msg7272 (view) Author: mzlamal Date: 2008-10-02.09:36:13
Server is on NFS.
There is no load balancing, it is only one zone.
msg7271 (view) Author: tonfa Date: 2008-10-02.01:34:20
Is the server on nfs ? Do you use load balancing with several servers ?
msg7269 (view) Author: jglick Date: 2008-10-01.22:37:22
Attaching a compressed copy of the bundle file with missing ranges, in case it
is useful somehow. But I guess the real question is why it got written with
missing ranges to begin with.
msg7268 (view) Author: jglick Date: 2008-10-01.22:24:42
The bundle then contains a second gap (revealed by patching bundlerepo.py to
treat a missing parent a nonfatal warning): after the normally fatal read of
2772746bd9aa, 2c9472137fd4 and aa7d20707f8c are skipped over, before resuming
with 3be507481f44 and continuing without further interruptions to 5b3f13f2bbd5,
presumably the tip of main. Again the first skipped changeset in this gap has
parent 18191cbac7cc, made the day before and present in the local clone.
msg7267 (view) Author: jglick Date: 2008-10-01.22:19:37
Note that the first changeset to be missing, 031c97aab2a4, had as its parent the
day-old changeset 185e296b1169 which was an ancestor of the local tip 676790ff94c9.
msg7266 (view) Author: jglick Date: 2008-10-01.22:15:34
I have gathered some more information about this particular example - I was
fortunate to have a login for the machine in question, and no later builds have run.

The workspace was set to use http://hg.netbeans.org/main/ as its parent (thus
pull path). Shortly after Oct 1, 2008 18:27:19 UTC (the timestamp recorded for
the start of the build), with local tip 676790ff94c9, it tried to pull from
main. While I do not know for sure what was in main at the time it was pulling,
5b3f13f2bbd5 (18:29:38) seems most likely; the previous tip on the server looks
to have been 692ee115553c, recorded at 18:23:22 and probably pushed immediately.
5b3f13f2bbd5 was a merge of a team workspace, so would have been pushing in ten
other changesets from the team branch repository (and it records 692ee115553c as
a parent).

I was able to recover the bundle file that it pulled into. (804544 bytes so I
will not try to attach it for now.) If I make a clone -r 676790ff94c9 of main
and try to pull from this bundle, I indeed get the same error. I can confirm
that the error message in bundlerepo.py is wrong - p2, not p1, is the missing
parent. When I improve the error to show local vars and replace 'p1' with 'p', I get

00changelog.i@dae79924bc41: unknown parent in bundle; node=2772746bd9aa
p1=61eac836a4a7 p2=dae79924bc41 n=104754 size=239 start=26747

Logging from a clone that has this changeset, it looks normal enough:

changeset:   104757:2772746bd9aa
parent:      104748:61eac836a4a7
parent:      104756:dae79924bc41

Looks like the bundle file has the merge changeset but not its parent.

'hg in -r 5b3f13f2bbd5' does not produce any error now (on either my local test
workspace or the actual Hudson server).

Is it possible that changesets would be written out of order to the server
during an HTTPS push, such that a call to incoming at just the wrong time could
see a merge changeset but not its parent? Or that the code which writes a bundle
file to disk (run in this case immediately before rereading it) would sometimes
get changesets out of order?

I can also log the local vars for each changeset being read, and change the code
to continue with just a warning if a parent is missing; then I get

node=eb2d62d9c4f5 p1=b05558ed4c77 p2=000000000000 n=104687 size=257 start=90
node=1c920dd80243 p1=676790ff94c9 p2=eb2d62d9c4f5 n=104688 size=142 start=431
node=deb6eba4e936 p1=af222f6260c1 p2=000000000000 n=104689 size=402 start=657
node=1175d96caede p1=889e773a7105 p2=deb6eba4e936 n=104690 size=142 start=1143
node=c5fd7dfc1c15 p1=1175d96caede p2=000000000000 n=104691 size=235 start=1369
node=64dd03382722 p1=c5fd7dfc1c15 p2=000000000000 n=104692 size=201 start=1688
node=b8b73f2977cf p1=1c920dd80243 p2=64dd03382722 n=104693 size=142 start=1973
node=0db6812b8e2e p1=1c920dd80243 p2=000000000000 n=104694 size=337 start=2199
node=14294176953e p1=b8b73f2977cf p2=0db6812b8e2e n=104695 size=142 start=2620
node=f2842e815d12 p1=28e57d97969d p2=000000000000 n=104696 size=270 start=2846
node=57c79f588a8f p1=14294176953e p2=f2842e815d12 n=104697 size=143 start=3200
node=4651aa77a1d1 p1=57c79f588a8f p2=000000000000 n=104698 size=226 start=3427
node=327faaf85fbc p1=cf311c27f93a p2=000000000000 n=104699 size=475 start=3737
node=b931272d3ee9 p1=4651aa77a1d1 p2=327faaf85fbc n=104700 size=142 start=4296
node=a1b22f09c437 p1=dc38af424bca p2=000000000000 n=104701 size=333 start=4522
node=cb7bbce9541c p1=b931272d3ee9 p2=a1b22f09c437 n=104702 size=160 start=4939
node=bc338d826981 p1=bf2f77f62abe p2=000000000000 n=104703 size=255 start=5183
node=14c62faaf2e1 p1=cb7bbce9541c p2=bc338d826981 n=104704 size=143 start=5522
node=b4c0af16c47e p1=14c62faaf2e1 p2=000000000000 n=104705 size=396 start=5749
node=60dd113a00d0 p1=c585b061d0fe p2=000000000000 n=104706 size=2274 start=6229
node=aeeebce2f595 p1=60dd113a00d0 p2=000000000000 n=104707 size=1229 start=8587
node=95dc097fd42d p1=aeeebce2f595 p2=b4c0af16c47e n=104708 size=252 start=9900
node=98ea797daf11 p1=95dc097fd42d p2=000000000000 n=104709 size=1134 start=10236
node=5bb4de0b9c52 p1=98ea797daf11 p2=000000000000 n=104710 size=405 start=11454
node=dabaacb8e0c2 p1=1c920dd80243 p2=000000000000 n=104711 size=329 start=11943
node=5923965ae2ff p1=5bb4de0b9c52 p2=dabaacb8e0c2 n=104712 size=142 start=12356
node=59413eb2e377 p1=6a65498eee9a p2=000000000000 n=104713 size=323 start=12582
node=f8f7ac2a28fb p1=98ea797daf11 p2=59413eb2e377 n=104714 size=154 start=12989
node=0e958cb7d52f p1=f8f7ac2a28fb p2=000000000000 n=104715 size=117 start=13227
node=d52d3cfcfd72 p1=5923965ae2ff p2=0e958cb7d52f n=104716 size=142 start=13428
node=009730953135 p1=d52d3cfcfd72 p2=000000000000 n=104717 size=535 start=13654
node=908eac84bcea p1=009730953135 p2=000000000000 n=104718 size=233 start=14273
node=8af88f7a8d7c p1=a33bf42d4a7d p2=000000000000 n=104719 size=197 start=14590
node=863806dc2bb8 p1=8af88f7a8d7c p2=908eac84bcea n=104720 size=228 start=14871
node=c56e848747da p1=74219ff0b562 p2=000000000000 n=104721 size=205 start=15183
node=bb48023ecf98 p1=6e142979c5ab p2=c56e848747da n=104722 size=148 start=15472
node=6e60f34b2ce6 p1=bb48023ecf98 p2=000000000000 n=104723 size=284 start=15704
node=0009c898252f p1=054baa9e4c65 p2=000000000000 n=104724 size=259 start=16072
node=100920f6c9fd p1=0009c898252f p2=000000000000 n=104725 size=433 start=16415
node=42618a2f2a85 p1=100920f6c9fd p2=000000000000 n=104726 size=393 start=16932
node=d434f3aeb79d p1=42618a2f2a85 p2=000000000000 n=104727 size=241 start=17409
node=53770f0e7e60 p1=6e60f34b2ce6 p2=d434f3aeb79d n=104728 size=147 start=17734
node=d884e5045327 p1=53770f0e7e60 p2=000000000000 n=104729 size=455 start=17965
node=388002a6ffc0 p1=d884e5045327 p2=863806dc2bb8 n=104730 size=197 start=18504
node=122bb32e53b3 p1=28e57d97969d p2=000000000000 n=104731 size=474 start=18785
node=cdc4f8a1ce82 p1=388002a6ffc0 p2=122bb32e53b3 n=104732 size=142 start=19343
node=1d317a285f1b p1=cdc4f8a1ce82 p2=000000000000 n=104733 size=397 start=19569
node=7f7fda77fa80 p1=cb7bbce9541c p2=000000000000 n=104734 size=225 start=20050
node=10a661711df6 p1=cdc4f8a1ce82 p2=7f7fda77fa80 n=104735 size=160 start=20359
node=5dd024e5c6c2 p1=1d317a285f1b p2=10a661711df6 n=104736 size=82 start=20603
node=4e27258cc1d5 p1=98ea797daf11 p2=000000000000 n=104737 size=287 start=20769
node=06c92eb96581 p1=5dd024e5c6c2 p2=4e27258cc1d5 n=104738 size=142 start=21140
node=6383d321e6d9 p1=aeeebce2f595 p2=000000000000 n=104739 size=554 start=21366
node=703c9ac71b8e p1=6383d321e6d9 p2=06c92eb96581 n=104740 size=264 start=22004
node=4f4e0fbae331 p1=b8b73f2977cf p2=000000000000 n=104741 size=223 start=22352
node=7fdb147d60d3 p1=4f4e0fbae331 p2=000000000000 n=104742 size=217 start=22659
node=72fd4230554e p1=703c9ac71b8e p2=7fdb147d60d3 n=104743 size=142 start=22960
node=67ba2291a73d p1=72fd4230554e p2=000000000000 n=104744 size=284 start=23186
node=5b66c8fb328d p1=57c79f588a8f p2=000000000000 n=104745 size=231 start=23554
node=107ef09635dc p1=67ba2291a73d p2=5b66c8fb328d n=104746 size=143 start=23869
node=5bcbaa9f0842 p1=6e142979c5ab p2=000000000000 n=104747 size=264 start=24096
node=61eac836a4a7 p1=d884e5045327 p2=5bcbaa9f0842 n=104748 size=147 start=24444
node=5f9624d7ee39 p1=61eac836a4a7 p2=107ef09635dc n=104749 size=197 start=24675
node=1720ab541c54 p1=5dd024e5c6c2 p2=000000000000 n=104750 size=352 start=24956
node=e5f7d19b3955 p1=5f9624d7ee39 p2=1720ab541c54 n=104751 size=143 start=25392
node=470d25c6e76c p1=e5f7d19b3955 p2=000000000000 n=104752 size=338 start=25619
node=692ee115553c p1=470d25c6e76c p2=000000000000 n=104753 size=622 start=26041
node=2772746bd9aa p1=61eac836a4a7 p2=dae79924bc41 n=104754 size=239 start=26747
did not find parent dae79924bc41
node=3be507481f44 p1=61eac836a4a7 p2=aa7d20707f8c n=104754 size=165 start=27070
did not find parent aa7d20707f8c
node=3c053211e51d p1=2772746bd9aa p2=3be507481f44 n=104754 size=82 start=27319
did not find parent 2772746bd9aa
did not find parent 3be507481f44
node=f72cedf5282c p1=61eac836a4a7 p2=000000000000 n=104754 size=418 start=27485
node=abad5bf2c021 p1=3c053211e51d p2=f72cedf5282c n=104755 size=148 start=27987
did not find parent 3c053211e51d
node=5b3f13f2bbd5 p1=abad5bf2c021 p2=692ee115553c n=104755 size=339 start=28219
did not find parent abad5bf2c021

If I look for the 'node' fields in a full clone of main, they proceed in logical
order up through 692ee115553c, then suddenly skip over three changesets
(031c97aab2a4, b18757413284, and dae79924bc41) and go straight to 2772746bd9aa.
msg7265 (view) Author: jglick Date: 2008-10-01.19:36:23
We have a number of Hg repositories (all served over HTTP(S)) and various Hudson
continuous builders routinely polling them for changes. Hudson's Hg plugin uses
the following idiom:

hg incoming --quiet --bundle hg.bundle --template "...."
hg pull -u hg.bundle

(The build's changelog is computed from the template output, while the use of
hg.bundle ensures that the changes are pulled over the network just once.)

This works fine most of the time. Once in a while, for no apparent reason, the
incoming call dies with e.g.

abort: 00changelog.i@61eac836a4a7: unknown parent!

If you look up #61eac836a4a7 on the repository being served, it exists and is a
normal-looking changeset (in this case a merge done by 'hg fetch').

The error is sporadic - generally the next build of the same job proceeds normally.

The error message appears to come from bundlerepo.bundlerevlog's constructor:

            for p in (p1, p2):
                if not p in self.nodemap:
                    raise revlog.LookupError(p1, self.indexfile,
                                             _("unknown parent"))

First of all, the error message looks wrong to me; I would assume that p, not
p1, is the node which should be passed to the message (compare similar code in
revlog.py).

The loop seems to assume that parent revisions precede children in the iterator.
Is this a safe assumption?
History
Date User Action Args
2008-10-29 11:44:02djcsetnosy: mpm, tonfa, pmezard, jglick, djc, abuehl, mzlamal, gozer
assignedto: tonfa
2008-10-29 11:43:54djcsetstatus: chatting -> in-progress
nosy: mpm, tonfa, pmezard, jglick, djc, abuehl, mzlamal, gozer
2008-10-23 22:18:35tonfasetnosy: mpm, tonfa, pmezard, jglick, djc, abuehl, mzlamal, gozer
messages: + msg7660
2008-10-21 15:25:40tonfasetnosy: mpm, tonfa, pmezard, jglick, djc, abuehl, mzlamal, gozer
messages: + msg7623
2008-10-18 19:00:52mpmsetnosy: + gozer
2008-10-18 19:00:28mpmlinkissue1313 superseder
2008-10-09 19:44:47pmezardsetnosy: + pmezard
2008-10-08 12:28:49tonfasetfiles: - hg.bundle.bz2
nosy: mpm, tonfa, jglick, djc, abuehl, mzlamal
2008-10-08 12:27:04tonfasetnosy: mpm, tonfa, jglick, djc, abuehl, mzlamal
messages: + msg7342
2008-10-06 11:36:01djcsetnosy: mpm, tonfa, jglick, djc, abuehl, mzlamal
messages: + msg7316
2008-10-06 11:35:45djcsetnosy: + djc
2008-10-02 23:47:37mpmsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7295
2008-10-02 23:30:00tonfasetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7294
2008-10-02 23:23:42jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7293
2008-10-02 23:18:31jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7292
title: Failures pulling from HTTP repo: 'abort: 00changelog.i@...: unknown parent!' -> Race condition pulling over HTTP during push: 'abort: 00changelog.i@...: unknown parent!'
2008-10-02 23:07:15jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7291
2008-10-02 22:57:24mpmsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7290
2008-10-02 22:40:39jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7289
2008-10-02 22:37:54jglicksettopic: + http_proto
nosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7288
title: Failures in 'hg in --bundle': 'abort: 00changelog.i@...: unknown parent!' -> Failures pulling from HTTP repo: 'abort: 00changelog.i@...: unknown parent!'
2008-10-02 22:22:34jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7287
2008-10-02 22:15:38jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7286
2008-10-02 21:45:08mpmsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7284
2008-10-02 21:38:38jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7283
2008-10-02 21:33:04mpmsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7282
2008-10-02 21:26:43jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7281
2008-10-02 21:22:03jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7280
2008-10-02 21:11:08jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7279
2008-10-02 20:48:50jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7278
2008-10-02 20:03:08mpmsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7277
2008-10-02 19:19:46jglicksetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7276
2008-10-02 15:46:12mpmsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7275
2008-10-02 12:02:44mzlamalsetnosy: mpm, tonfa, jglick, abuehl, mzlamal
messages: + msg7274
2008-10-02 09:43:39tonfasetnosy: + mpm
messages: + msg7273
2008-10-02 09:36:13mzlamalsetstatus: unread -> chatting
nosy: tonfa, jglick, abuehl, mzlamal
messages: + msg7272
2008-10-02 01:34:21tonfasetstatus: chatting -> unread
nosy: + tonfa
messages: + msg7271
2008-10-01 22:37:22jglicksetfiles: + hg.bundle.bz2
nosy: jglick, abuehl, mzlamal
messages: + msg7269
2008-10-01 22:24:42jglicksetnosy: jglick, abuehl, mzlamal
messages: + msg7268
2008-10-01 22:19:51jglicksetnosy: + mzlamal
2008-10-01 22:19:37jglicksetmessages: + msg7267
2008-10-01 22:15:37jglicksetstatus: unread -> chatting
messages: + msg7266
2008-10-01 20:07:39abuehlsetnosy: + abuehl
2008-10-01 19:36:25jglickcreate