Ignoring (certain) symlinks
Greg Ward
greg-hg at gerg.ca
Mon Jul 6 13:54:41 CDT 2009
I'm converting a large-ish source tree, with its hairy build system
intact, from CVS to Hg. My amusing diversion for today was to try to
create a useful .hgignore file. It's not as easy as it sounds. Here
are some stats:
* files under Hg control (hg manif | wc -l): 17,616
* files created by the normal build process
(with no .hgignore, do a full build and run hg st -nu | wc -l): 96,484
* number of unknown files that can be easily ignored
(*.class, *.o, lib*.a): ~25,000
That leaves ~71,000 unknown files that are not easily captured by
filename patterns. However, ~68,500 of those unknown files are easily
distinguished by code: they are relative symlinks to Java source
files. I.e. the link matches *.java and readlink() returns a string
matching "../*/*.java".
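The test above is simple to express in plain Python. This is a minimal sketch that assumes POSIX symlinks, and `is_java_symlink` is a hypothetical helper name. It uses fnmatch, whose `*` also matches "/", so deeper targets such as "../a/b/Foo.java" are accepted as well.

```python
import fnmatch
import os


def is_java_symlink(path):
    """Return True if *path* is a relative symlink to a Java source file.

    Criterion as described in the post: the link name matches *.java and
    readlink() returns a string matching "../*/*.java".
    """
    # Cheap name check first, so non-Java files never touch the filesystem.
    if not fnmatch.fnmatchcase(os.path.basename(path), "*.java"):
        return False
    # islink() uses lstat(), so it also works for dangling links.
    if not os.path.islink(path):
        return False
    # Compare the raw link text, not the resolved path.
    return fnmatch.fnmatchcase(os.readlink(path), "../*/*.java")
```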
There's no easy way to do this in .hgignore, since unfortunately a
great many of those 68,500 symlinks are jumbled in with regular source
files. (Also our Java package-space is wildly inconsistent, so
there's no easy way to distinguish things at that level either.)
I toyed with the idea of writing an extension for this, but got a bit
lost in the thickets of dirstate.walk() and friends.
So, just to see how hard it is in principle, I hacked the inner guts
of dirstate.walk() (please hold your noses, this stinks):
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -531,6 +531,8 @@
wadd(nf)
if nf in dmap and matchfn(nf):
results[nf] = None
+ elif kind == lnkkind and nf.endswith(".java"):
+ pass
elif kind == regkind or kind == lnkkind:
if nf in dmap:
if matchfn(nf):
Blecch. That's evil. The good news is that it drops my number of
unknown files to 27,900 *and* cuts the time to run "hg status" by
several seconds. Throw in 3 or 4 trivial patterns (e.g. *.class) and
it's down to 3,000 unknown files. I can deal with that many by adding
more patterns to .hgignore.
Can anyone think of a non-evil way to do this? My extension idea was
to wrap ignore.ignore() to return a function that checks for *.java
symlinks before passing control to the original ignore func. But that
means a second stat() call on every unknown file, since
dirstate.walk() has a stat object but does not pass it to ignore().
That stat() call is also hard, since ignore() gets a repo-relative
path, which means a repo.wjoin() is necessary. Ugh. ;-(
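The wrapper idea from the paragraph above might look roughly like the following. It is a sketch in plain Python and is not hooked into Mercurial. `wrap_ignore` and `orig_ignore` are hypothetical names: `orig_ignore` stands in for whatever function ignore.ignore() returns, and os.path.join() plays the role of repo.wjoin(). The extra lstat() the post mentions is still present here.

```python
import fnmatch
import os
import stat


def wrap_ignore(root, orig_ignore):
    """Return an ignore function that drops *.java symlinks up front.

    *orig_ignore* (hypothetical) takes a repo-relative path and returns
    True if that path should be ignored.
    """
    def ignore(relpath):
        if fnmatch.fnmatchcase(relpath, "*.java"):
            # ignore() only gets a repo-relative path, so it has to be
            # joined to the working-directory root (repo.wjoin()'s job).
            abspath = os.path.join(root, relpath)
            try:
                # The second stat call: the walker's stat result is not
                # available here.
                st = os.lstat(abspath)
            except OSError:
                return orig_ignore(relpath)
            if stat.S_ISLNK(st.st_mode) and fnmatch.fnmatchcase(
                    os.readlink(abspath), "../*/*.java"):
                return True
        # Everything else falls through to the original matcher.
        return orig_ignore(relpath)

    return ignore
```

Because the name is checked first, the extra lstat() is paid only for *.java paths rather than for every unknown file. Those files are still stat'ed twice, though.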
Thanks --
Greg