ancestor() multiple-valued (long)

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Tue Jan 20 01:32:49 CST 2009


On Mon, Jan 19, 2009 at 2:22 AM, Pierre Asselin <pa at panix.com> wrote:
> While trying to figure out the exact semantics of 'merge' I think
> I found a problem.
>
> The nodes in the revision graph are partially ordered by the relation
> "x<y iff x is a descendent of y".  I see a widespread assumption in the
> code and the docs that any two nodes have a least upper bound under this
> ordering, but this is not true.  Consider the following revision graph:
>
>                   +--A1---+
>                  /         \
>           +--A--+           +------AB1
>          /       \         /
>         /         +--A2-- / ---+
>    fork+                 /      \
>         \         +--B1-+        +-AB2
>          \       /              /
>           +--B--+              /
>                  \            /
>                   +--B2------+
>
> The common ancestors of AB1, AB2 are the "tuning fork" [fork, A, B].
> A and B are minimal common ancestors (no other common ancestors descend
> from them) but neither is a least common ancestor (descendent of every
> other common ancestor).  What happens when you merge AB1 and AB2 ?
> The code in ancestor.py picks among A and B nondeterministically,
> whichever is first to come out of a 'generation' dictionary.
>
> The script below creates two repositories 'ambiguous-one' and
> 'ambiguous-two' with the above revision graph, such that the
> 'debugancestor AB1 AB2' differ.  Please confirm, I'm not sure how
> reproducible this is but I get it everytime:
>
>    $ hg -R ambiguous-one log --template '{node|short}: {tags}\n' -r A -r B
>    2a959d4069c5: A
>    b6bb7e77431c: B
>    $ hg -R ambiguous-two log --template '{node|short}: {tags}\n' -r A -r B
>    2a959d4069c5: A
>    b6bb7e77431c: B
>    $ hg -R ambiguous-one debugancestor AB1 AB2
>    1:2a959d4069c5a536e1668ae68a1fcb862e0f37e4
>    $ hg -R ambiguous-two debugancestor AB1 AB2
>    1:b6bb7e77431c8e7453b7ac61070163f282713814
>
> This would only be a semi-bug, except that 'hg merge' makes multiple calls
> to ancestor.ancestor() and gets inconsistent results.  If I attempt the
> merge in ambiguous-two I get:
>
>    $ cd ambiguous-two
>    $ hg --debug merge
>    merge: warning: conflicts during merge
>    resolving manifests
>     overwrite None partial False
>>>>  ancestor b6bb7e77431c local 48111963ff2f+ remote 7df35e5fb308
>      searching for copies back to rev 3
>     testfile: versions differ -> m
>    picked tool 'filemerge' for testfile (binary False symlink False)
>    merging testfile
>>>> my testfile at 48111963ff2f+ other testfile at 7df35e5fb308 ancestor testfile at 2a959d4069c5
>    merging testfile failed!
>    0 files updated, 0 files merged, 0 files removed, 1 files unresolved
>    There are unresolved merges, you can redo the full merge using:
>      hg update -C 8
>      hg merge 5
>
> Now IMHO we have a bug.  There is a superficial resemblance to issue
> 1327 but this is different:  the trigger of 1327 is not present and,
> unlike 1327, the merge base chosen in the end *is* a common ancestor of
> the two changesets being merged.  Still, the merge base could vary from
> file to file.  It's not fatal, but it just doesn't seem right.

If I read this output correctly, I would still be wary of calling it a
bug. Choosing a different, but equally valid ancestor for the
tool-based merge when the internal merge fails is not necessarily a
bug, as long as both merge attempts use their respective chosen
ancestors consistently.

>
> (If your revision graph looks like the above your organization has
> communication problems, but those are beyond Mercurial's jurisdiction;
> plus, you never know what you'll get when you pull.  I think this is a
> real bug.)
>
> In the example above, neither A nor B is the right merge base for AB1
> and AB2.  The best one would be the merge of A and B, if it existed.
> That said, both A and B are probably good enough, so an acceptable
> band-aid would be to break ties in an arbitrary but consistent manner,
> preferably independent of accidents like the local changeset order
> in the repo.
>
> For example, in ancestor.ancestor(), intersect matching generations until
> the result is nonempty.  If the intersection has more than one element,
> arbitrarily pick the one with the smallest changeset hash.  Easy to
> say, but I don't know enough Mercurial (or python?) to disentangle the
> various class.ancestor()'s that call into ancestor.py and how to pass
> the changeset info down that pipe.
>
> The documentation of the various .ancestor()'s should also be changed
> to say "return a minimal common ancestor of nodes a and b or None if
> there is no such ancestor;  if many such ancestors exist, choose one
> arbitrarily but consistently across calls."

We might also choose to just augment the docs to say that if many
exist, one is chosen in a not necessarily predictable fashion. Then we
should look through the callers to see if any currently might run into
trouble because of this.

My gut feeling, though, tells me you're right and we should fix the
behaviour. An API should not hold such surprises. And if for whatever
reason it needs to, maybe the name should be changed to something like
randomclosestancestor().

In any case, thanks for the detailed analysis! And please note that
this is just my opinion.
-parren


>
> Test script follows.
>
> --------------
> #! /bin/sh
> set -x
>
> rm -rf ambiguous-one ambiguous-two
>
> hg init ambiguous-one
> cd ambiguous-one
>
> for i in 1 2 3 4; do echo base $i >> testfile; done
> hg add testfile
> hg commit -m fork
> hg tag -l fork
>
> for I in A B; do
>    hg update --clean --rev fork
>    sed -i -e 2achange\ $I testfile
>    hg commit -m $I
>    hg tag -l $I
>    for J in 1 2; do
>        hg update --clean --rev $I
>        sed -i -e 3asubchange\ $I$J testfile
>        hg commit -m $I$J
>        hg tag -l $I$J
>    done
> done
>
> for J in 1 2; do
>    hg update --clean A$J
>    hg merge --rev B$J
>    # "fix" conflicts
>    sed -i \
>        -e '/^<<<<<<</d' -e '/^>>>>>>>/d' \
>        -e '/^=======/d' -e '/^|||||||/d' \
>        testfile \
>    ;
>    hg commit -m AB$J
>    hg tag -l AB$J
> done
>
> cd ..
> hg init ambiguous-two
> for rev in B AB1 AB2; do
>    hg --repository ambiguous-one push --force --rev $rev ambiguous-two
> done
> #
> # Hack, copy the tags for convenience.
> cp ambiguous-one/.hg/localtags ambiguous-two/.hg
> #
> # Put the new repo on the same rev as the first.
> (cd ambiguous-two && hg update --rev AB2)
>
> set +x
> echo
> echo '#######################################################################'
> echo Consistency check: ambiguous-one and ambiguous-two have same changesets
> for action in incoming outgoing; do
>    cmd="hg --repository ambiguous-one $action ambiguous-two"
>    echo $cmd
>    eval $cmd
> done
>
> echo
> echo Ancestor of AB1 and AB2:
> for repo in ambiguous-one ambiguous-two; do
>    echo -n $repo:\
>    hg --repository $repo debugancestor AB1 AB2
> done
>
> exit 0
>
>
>
>
> --
> pa at panix dot com
>
> _______________________________________________
> Mercurial mailing list
> Mercurial at selenic.com
> http://selenic.com/mailman/listinfo/mercurial
>


More information about the Mercurial mailing list