Strategies for push/merge problem?

Mon Jul 14 16:29:15 CDT 2008

On Mon, 2008-07-14 at 13:14 -0700, Greg Lindahl wrote:
> On Mon, Jul 14, 2008 at 11:38:55AM -0500, Matt Mackall wrote:
> 
> > If we have n people trying to update the counter
> > themselves, they'll all have to first get the counter, update it, try to
> > put it back, and if it's changed in the meantime, they have to start
> > over.
> 
> This is a fine example of the case where there are genuine conflicts
> over changes.

No, this example was explicitly laid out to be a case where there are no
conflicts. The operation is '+', not '3-way merge'; there is no such
thing as a conflict. There is only resource contention. But I see that
since I defined it in terms of modifying a single file, people got
confused.
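To make the '+' case concrete, here is a rough Python model of the counter scenario. The function name is made up for illustration and nothing here comes from Mercurial. Each client reads the counter, adds one, and writes back with a compare-and-swap, retrying if the value moved. It assumes the worst case, where every waiting client reads in lockstep each round. Nobody ever conflicts and the final count is always right, yet total write attempts grow as n(n+1)/2.

```python
from __future__ import annotations


def simulate_optimistic_increments(n_clients: int) -> tuple[int, int]:
    """Hypothetical model: n clients each add 1 via read + compare-and-swap.

    Assumption (worst case): every client still waiting reads the counter at
    the same moment each round, then they all try to write back in turn.
    Returns (final counter value, total write attempts).
    """
    counter = 0
    attempts = 0
    waiting = n_clients
    while waiting:
        snapshot = counter  # everyone waiting reads the same value
        successes = 0
        for _ in range(waiting):
            attempts += 1
            if counter == snapshot:  # compare-and-swap: valid only if unchanged
                counter = snapshot + 1
                successes += 1
            # otherwise the value moved under us: retry next round
        waiting -= successes
    return counter, attempts
```

For example, `simulate_optimistic_increments(4)` returns `(4, 10)`: four correct increments, but ten attempts to get them in.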

So let me start over. Imagine an empty repo. Each user x wants to add a
unique file x to the tip revision of the repo. Same discussion applies.

> If the merge tool is smart enough to resolve the conflict without
> human help, then why should Mercurial require extra hoop jumping for a
> false conflict?

There's real contention on push over which changeset is the tip. Merging
is merely a side-effect of that. If you want to automate the process of
pulling, merging, pushing, and potentially retrying, go right ahead. But
forgive me if I'm not terribly interested in making such unbounded
behavior the default.
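If you do automate pull/merge/push/retry, the loop should at least be bounded. Below is a toy Python model of the empty-repo version: each user adds a unique file, so merges never conflict, and a push is rejected only when the user's base is no longer the central tip. `CentralRepo`, `push_all`, the lockstep rounds and the `max_retries` parameter are all hypothetical, and "merge" is simplified to moving the user's base to the new tip.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CentralRepo:
    # Hypothetical model: linear history; the tip is the newest changeset.
    changesets: list[str] = field(default_factory=list)

    def tip(self) -> int:
        return len(self.changesets)

    def push(self, base: int, changeset: str) -> bool:
        # Reject pushes not based on the current tip (they would add a head).
        if base != self.tip():
            return False
        self.changesets.append(changeset)
        return True


def push_all(repo: CentralRepo, users: list[str], max_retries: int) -> tuple[list[str], int]:
    """Each user pushes one changeset adding a unique file.

    Assumes worst-case lockstep rounds: every waiting user pulls at the same
    moment. Returns (users who gave up, total push attempts).
    """
    base = {u: repo.tip() for u in users}
    retries = {u: 0 for u in users}
    pending, gave_up, attempts = list(users), [], 0
    while pending:
        next_round = []
        for u in pending:
            attempts += 1
            if repo.push(base[u], f"add file {u}"):
                continue
            retries[u] += 1
            if retries[u] > max_retries:
                gave_up.append(u)  # bounded: stop instead of looping forever
            else:
                next_round.append(u)
        # Pull + merge: unique files never conflict, so "merging" here is
        # just moving each waiting user's base to the new tip.
        tip = repo.tip()
        for u in next_round:
            base[u] = tip
        pending = next_round
    return gave_up, attempts
```

With four users and `max_retries=1`, two changesets land and the other two users give up after 7 push attempts in total; with a generous budget all four land after 10 attempts. Either way the cost is contention, not conflict.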

> There is waiting: the developer has to wait for the single person
> doing pulling to get around to pulling new changes. ("can you please
> pull my bugfix?" "hey, you didn't pull from me yet, and this is an
> important fix!")

Compare: 

"I wasted the whole day waiting to check my code in to the central
server"

"I took 3 seconds to check my code in locally, then I sent out a pull
request and moved right along to my next project"

> There's also a merge problem: the people best equipped to merge real
> conflicts are the developers who changed the code, not the single
> person whose job it is to pull. ("hey Fred and Bob, I don't understand
> these 2 conflicting changes, can you please come by my cube and tell
> me how I should fix it?")

That's one way to do it. Another is "hey Fred and Bob, your code doesn't
merge cleanly automatically, moving on, tell me when you've sorted it
out." There's a guy I know named Linus who does precisely this for 50+
people every morning after breakfast. In the next couple weeks, he'll
probably merge on the order of 8000 changesets from 1000 contributors.
It works.
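The integrator's side of the pull model is cheap to sketch too. In this toy Python version (the `integrate` function is hypothetical), a pull request is reduced to the set of files it touches, and "merges cleanly" is approximated as "touches no file already changed in this batch". Real merge tools work on hunks, so treat that rule as a crude stand-in. Anything that doesn't merge is deferred back to its authors rather than fixed by the integrator.

```python
from __future__ import annotations


def integrate(requests: dict[str, set[str]]) -> tuple[list[str], list[str]]:
    """Hypothetical integrator: merge pull requests in order, defer unclean ones.

    Crude assumption: a request merges cleanly iff it touches no file that an
    already-merged request in this batch touched.
    Returns (merged, deferred) contributor names.
    """
    touched: set[str] = set()
    merged: list[str] = []
    deferred: list[str] = []
    for who, files in requests.items():
        if files & touched:
            # Not a clean merge: bounce it back ("tell me when you've sorted it out").
            deferred.append(who)
        else:
            touched |= files
            merged.append(who)
    return merged, deferred
```

For instance, `integrate({"fred": {"net.c"}, "bob": {"net.c", "fs.c"}, "alice": {"mm.c"}})` returns `(["fred", "alice"], ["bob"])`: the integrator keeps moving, and Bob sorts out his overlap with Fred on his own time.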

> > ps: The Linux kernel used to do something very similar to the scenario
> > above when counting network packets, etc. It all worked fine until
> > someone showed up with a multi-million dollar machine with 1024 CPUs and
> > networking interfaces. With all 1024 CPUs trying to update the same
> > counter, not much actual work got done. Switching to a pull model
> > naturally made the problem vanish.
> 
> It could also have been solved by pushing groups of counts, e.g. push
> an update every ncpu/1024 seconds.

And that way sucks. The problem is still "uncoordinated clients
experience contention". Your way is "let's push the problem off a bit by
updating a thousand times slower", mine is "let's avoid contention
entirely by coordination". The latter is known as "scalable".
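The per-CPU version of the counter is the same idea at small scale: each CPU updates only its own slot, so writers never contend, and whoever wants the total pulls and sums the slots. This is a Python model in the spirit of per-CPU counters, not kernel code; `PerCpuCounter` and its methods are made up for the example.

```python
class PerCpuCounter:
    """Hypothetical model of a per-CPU (pull-model) counter."""

    def __init__(self, ncpus: int) -> None:
        self.slots = [0] * ncpus  # one private slot per CPU

    def add(self, cpu: int, n: int = 1) -> None:
        # Touches only this CPU's slot: no shared value, no retry loop.
        self.slots[cpu] += n

    def read(self) -> int:
        # The reader coordinates: it pulls every slot and sums them.
        return sum(self.slots)


counter = PerCpuCounter(1024)
for cpu in range(1024):
    counter.add(cpu)
print(counter.read())  # 1024
```

The cost moves to the reader, which does O(ncpus) work per read and, in a real concurrent setting, may see a slightly stale total; for statistics counters that trade is usually fine.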

> There's more than one way to do it,
> and not every system has to be engineered for millions of updaters.

Sure, but both Mercurial and the Linux kernel -do-.

-- 
Mathematics is the supreme nostalgia of our time.