Ignoring (certain) symlinks

Greg Ward greg-hg at gerg.ca
Mon Jul 6 13:54:41 CDT 2009


I'm converting a large-ish source tree, with its hairy build system
intact, from CVS to Hg.  My amusing diversion for today was to try to
create a useful .hgignore file.  It's not as easy as it sounds.  Here
are some stats:

* files under Hg control (hg manif | wc -l): 17,616
* files created by the normal build process
  (with no .hgignore, do a full build and run hg st -nu | wc -l): 96,484
* number of unknown files that can be easily ignored
  (*.class, *.o, lib*.a): ~25,000

That leaves ~71,000 unknown files that are not easily captured by
filename patterns.  However, ~68,500 of those unknown files are easily
distinguished by code: they are relative symlinks to Java source
files.  I.e. the link matches *.java and readlink() returns a string
matching "../*/*.java".
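
For illustration, here is a minimal sketch of that test as standalone
Python.  It is independent of Mercurial, and the function name
is_java_source_symlink is hypothetical.  It only shows the two
conditions described above: the link's own name matches *.java, and
the string stored in the link matches "../*/*.java".

```python
import fnmatch
import os


def is_java_source_symlink(path):
    """Return True if path is a symlink named *.java whose stored
    target string matches ../*/*.java (hypothetical helper)."""
    # Cheap name check first, so most files never reach readlink().
    if not fnmatch.fnmatchcase(os.path.basename(path), "*.java"):
        return False
    # islink() uses lstat(), so it does not follow the link.
    if not os.path.islink(path):
        return False
    # readlink() returns the raw target string, e.g. "../src/Foo.java".
    # Note that fnmatch's "*" also matches "/", so deeper targets such
    # as "../a/b/Foo.java" would match this pattern too.
    target = os.readlink(path)
    return fnmatch.fnmatchcase(target, "../*/*.java")
```

Running this over every unknown file costs one lstat() and one
readlink() per *.java candidate, which is exactly the kind of
per-file work a filename pattern cannot express.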

There's no easy way to do this in .hgignore, since unfortunately a
great many of those 68,500 symlinks are jumbled in with regular source
files.  (Also our Java package-space is wildly inconsistent, so
there's no easy way to distinguish things at that level either.)

I toyed with the idea of writing an extension for this, but got a bit
lost in the thickets of dirstate.walk() and friends.

So, just to see how hard it is in principle, I hacked the inner guts
of dirstate.walk() (please hold your noses, this stinks):

--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -531,6 +531,8 @@
                             wadd(nf)
                         if nf in dmap and matchfn(nf):
                             results[nf] = None
+                    elif kind == lnkkind and nf.endswith(".java"):
+                        pass
                     elif kind == regkind or kind == lnkkind:
                         if nf in dmap:
                             if matchfn(nf):

Blecch.  That's evil.  The good news is that it drops my number of
unknown files to 27,900 *and* cuts the time to run "hg status" by
several seconds.  Throw in 3 or 4 trivial patterns (e.g. *.class) and
it's down to 3,000 unknown files.  I can deal with that many by adding
more patterns to .hgignore.

Can anyone think of a non-evil way to do this?  My extension idea was
to wrap ignore.ignore() to return a function that checks for *.java
symlinks before passing control to the original ignore func.  But that
means a second stat() call on every unknown file, since
dirstate.walk() has a stat object but does not pass it to ignore().
That stat() call is also hard, since ignore() gets a repo-relative
path, which means a repo.wjoin() is necessary.  Ugh.  ;-(
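
A rough sketch of that wrapping idea, written as plain Python rather
than as a real extension.  The names wrap_ignore and orig_ignore are
hypothetical, orig_ignore is assumed to be any callable that takes a
repo-relative path and returns true for ignored files, and
os.path.join(root, ...) stands in for repo.wjoin().  It is not wired
into Mercurial's actual extension machinery.

```python
import os
import stat


def wrap_ignore(orig_ignore, root):
    """Return an ignore function that also ignores *.java symlinks.

    orig_ignore: assumed callable(relpath) -> bool (the original ignore func).
    root: working directory root; joining against it stands in for
          repo.wjoin(), since the path received is repo-relative.
    """
    def ignore(relpath):
        if relpath.endswith(".java"):
            try:
                # This is the extra stat call: the caller already has a
                # stat result but does not pass it down to us.
                st = os.lstat(os.path.join(root, relpath))
            except OSError:
                st = None
            if st is not None and stat.S_ISLNK(st.st_mode):
                return True
        # Everything else falls through to the original patterns.
        return orig_ignore(relpath)

    return ignore
```

Restricting the lstat() to names ending in .java keeps the extra
system calls down to the *.java candidates, but it still repeats
work the walk has already done for each of them.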

Thanks --

Greg

