[PATCH] 1321 (partial): run Python pretxnchangegroup hooks before disk flush

Fri Jan 16 14:14:15 CST 2009

On Fri, 2009-01-16 at 00:42 -0600, Matt Mackall wrote:
> On Thu, 2009-01-15 at 11:22 -0500, Jesse Glick wrote:
> > Doug Philips wrote:
> > >> 1321 (partial): run Python pretxnchangegroup hooks before disk flush.
> > > 
> > > Other issues aside, this would imply that an in memory python hook could not "call out" (subprocess, etc) to an external program?
> > 
> > It could. But if for some reason it happened to (perhaps indirectly)
> > call out to the 'hg' executable, that instance of Hg would not see the
> > pending changesets.
> > 
> > Having an in-process hook which winds up calling the external hg
> > executable seems perverse to me - you will lose the power & speed of
> > the in-process hook, so you might as well have used an external hook
> > to begin with - but it is possible someone is doing it. If so, it
> > would be straightforward to have an option to run the in-process
> > hooks after the flush as in current releases.
> > 
> > Another option would be to run pretxnchangegroup hooks exactly as they
> > are now, but introduce a new hook such as pretxnflushchangegroup,
> > documented to run before the flush and aborting in case you tried to
> > pass an external hook (unless and until this is implemented somehow).
> 
> I'm convinced we need to do something here.
> 
> I'd probably prefer a new hook with a less unfortunate name. I'd rather
> not introduce the new python-only hook distinction either.
> 
> I might also be convinced to consider a way to make shell hg commands
> work with the not-quite-committed changegroup, perhaps via a global flag
> like --pending. Or maybe even an environment variable.
> 
> The environment variable would be something like
> HGPENDING=path/to/new/changelog.i
> 
> We'd set this in the hook environment and when this was present, hg
> inside the hook would use it rather than the default changelog.i path
> (provided it was in a sensible place).
> 
> This would still need a new hook name, and we could provide a way for
> in-process hooks to see the updated repo similar to what you've done.

Ok, looking at this more closely, I think we want to go the following
route:

- move changelog finalization after the pretxnchangegroup hook
- implement a scheme for letting hg processes use pending transaction
  data, probably through a hook environment variable
- let in-process hooks access this data too (more or less default
  behavior)

I think it should be possible to do this without exposing any
incompatibilities (except in the silly case where the hook is running an
earlier version of hg).
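
To make the ordering concrete, here is a toy model of the route above,
written for Python 3 rather than the Python 2 of the patch. ToyChangelog
and run_transaction are made-up names for illustration and the "index" is
just a flat file of entries, so treat this as a sketch of the idea and
not as Mercurial code. The hook sees the pending entries through a side
file while the real index stays untouched until the hook has passed.

```python
import os
import shutil


class ToyChangelog:
    """Hypothetical stand-in for a changelog index with delayed writes."""

    def __init__(self, indexfile):
        self.indexfile = indexfile
        self.delaybuf = []       # entries added by the open transaction
        self.pendingfile = None  # side file written for hooks, if any

    def add(self, entry):
        self.delaybuf.append(entry)

    def writepending(self):
        """Return a file holding the committed plus the pending entries."""
        if self.delaybuf:
            pending = self.indexfile + ".a"
            if self.pendingfile is None:
                # first call: start from a copy of the real index
                shutil.copyfile(self.indexfile, pending)
            with open(pending, "ab") as fp:
                fp.write(b"".join(self.delaybuf))
            self.delaybuf = []
            self.pendingfile = pending
        return self.pendingfile or self.indexfile

    def finalize(self):
        """Make the pending entries part of the real index."""
        if self.pendingfile:
            os.replace(self.pendingfile, self.indexfile)
            self.pendingfile = None
        elif self.delaybuf:
            with open(self.indexfile, "ab") as fp:
                fp.write(b"".join(self.delaybuf))
            self.delaybuf = []

    def abort(self):
        """Throw the pending entries away."""
        if self.pendingfile:
            os.remove(self.pendingfile)
            self.pendingfile = None
        self.delaybuf = []


def run_transaction(cl, hook):
    """Run the hook against pending data, then finalize or abort."""
    env = {"HG_PENDING": cl.writepending()}
    if hook(env):
        cl.finalize()
        return True
    cl.abort()
    return False
```

A hook that rejects the changes leaves the real index exactly as it was,
because nothing is renamed into place until the hook has returned.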

What needs to be done:

- add a pendingfile() function to changelog to dump an index file for
  pending changes (relatively painless)

- add a way to pass PENDING=pendingfile() to hook() with lazy evaluation
  (writing the temporary index may be expensive, so we don't want to do
  it unless necessary)

- add a check for HG_PENDING in localrepository.__init__

- add a method to changelog to load an alternate index file
  (perhaps by creating a temporary revlog object with it and copying the
  index)
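
The lazy-evaluation item can be sketched on its own in a few lines of
Python 3. build_hook_env and expensive_pending are hypothetical names and
the path is a placeholder; the only point is that a callable argument is
not invoked until an external hook environment is actually built.

```python
def build_hook_env(args):
    """Build the environment for an external hook, resolving lazy values."""
    env = {}
    for key, value in args.items():
        if callable(value):
            # only now do we pay for producing the value
            value = value()
        env["HG_" + key.upper()] = str(value)
    return env


calls = []


def expensive_pending():
    """Stand-in for writing the temporary index; records that it ran."""
    calls.append("written")
    return "/path/to/00changelog.i.a"  # placeholder path, not a real file


args = {"node": "abc123", "pending": expensive_pending}
env = build_hook_env(args)
```

If no external hook is configured, nothing calls build_hook_env and the
temporary index never gets written.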

Here's a very lightly tested proof of concept:

diff -r 44b3f7bbe2f3 mercurial/changelog.py
--- a/mercurial/changelog.py	Thu Jan 15 01:38:52 2009 +0100
+++ b/mercurial/changelog.py	Fri Jan 16 14:12:22 2009 -0600
@@ -86,6 +86,31 @@
         self._delaybuf = []
         self._delayname = None
 
+    def readpending(self, file):
+        r = revlog.revlog(self.opener, file)
+        self.index = r.index
+        self.nodemap = r.nodemap
+        self._chunkcache = r._chunkcache
+
+    def writepending(self):
+        "return a file containing the unfinalized state for pretxnchangegroup"
+        if self._delaybuf:
+            # make a temporary copy of the index
+            fp1 = self._realopener(self.indexfile)
+            fp2 = self._realopener(self.indexfile + ".a", "a")
+            fp2.write(fp1.read())
+            # add pending data
+            fp2.write("".join(self._delaybuf))
+            fp2.close()
+            # switch modes so finalize can simply rename
+            del self._delaybuf
+            self._delayname = self.indexfile
+
+        if self._delayname:
+            return self._delayname + ".a"
+
+        return self.indexfile
+
     def finalize(self, tr):
         "finalize index updates"
         self.opener = self._realopener
diff -r 44b3f7bbe2f3 mercurial/hook.py
--- a/mercurial/hook.py	Thu Jan 15 01:38:52 2009 +0100
+++ b/mercurial/hook.py	Fri Jan 16 14:12:22 2009 -0600
@@ -70,7 +70,13 @@
 
 def _exthook(ui, repo, name, cmd, args, throw):
     ui.note(_("running hook %s: %s\n") % (name, cmd))
-    env = dict([('HG_' + k.upper(), v) for k, v in args.iteritems()])
+
+    env = {}
+    for k, v in args.iteritems():
+        if callable(v):
+            v = v()
+        env['HG_' + k.upper()] = v
+
     if repo:
         cwd = repo.root
     else:
diff -r 44b3f7bbe2f3 mercurial/localrepo.py
--- a/mercurial/localrepo.py	Thu Jan 15 01:38:52 2009 +0100
+++ b/mercurial/localrepo.py	Fri Jan 16 14:12:22 2009 -0600
@@ -88,6 +88,13 @@
     def __getattr__(self, name):
         if name == 'changelog':
             self.changelog = changelog.changelog(self.sopener)
+            if 'HG_PENDING' in os.environ:
+                p = os.environ['HG_PENDING']
+                if p.startswith(self.path):
+                    print "using pending"
+                    self.changelog.readpending('00changelog.i.a')
+                else:
+                    print "Nope", p, self.path
             self.sopener.defversion = self.changelog.version
             return self.changelog
         if name == 'manifest':
@@ -2036,9 +2043,6 @@
                 revisions += len(fl) - o
                 files += 1
 
-            # make changelog see real files again
-            cl.finalize(trp)
-
             newheads = len(self.changelog.heads())
             heads = ""
             if oldheads and newheads != oldheads:
@@ -2051,7 +2055,10 @@
             if changesets > 0:
                 self.hook('pretxnchangegroup', throw=True,
                           node=hex(self.changelog.node(cor+1)), source=srctype,
-                          url=url)
+                          url=url, pending=self.changelog.writepending)
+
+            # make changelog see real files again
+            cl.finalize(trp)
 
             tr.close()
         finally:
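
One note on the "sensible place" check: the proof of concept tests the
path with startswith, and a plain prefix test would also accept a sibling
directory such as /srv/repo2 when the repo lives in /srv/repo. A stricter
containment test might look like the Python 3 sketch below.
pending_applies is a hypothetical helper and the paths are made up for
the example.

```python
import os


def pending_applies(pending_path, repo_path):
    """Return True if pending_path lies inside repo_path."""
    pending = os.path.realpath(pending_path)
    repo = os.path.realpath(repo_path)
    try:
        # commonpath compares whole path components, not raw characters
        return os.path.commonpath([pending, repo]) == repo
    except ValueError:
        # e.g. paths on different drives on Windows
        return False


inside = pending_applies("/srv/repo/.hg/store/00changelog.i.a", "/srv/repo")
sibling = pending_applies("/srv/repo2/.hg/store/00changelog.i.a", "/srv/repo")
```

Here the first call is meant to accept the path and the second to reject
it, even though both strings start with "/srv/repo".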


-- 
http://selenic.com : development and support for Mercurial and Linux