Issue886

Title: hg status -ui shows unknowns; just -u doesn't
Priority: bug
Status: resolved
Superseder:
Nosy List: alexis, djc, evanp, jglick, mde, mpm, pmezard
Assigned To:
Topics: documentation, patch

Created on 2007-12-20.04:45:55 by evanp, last changed 2008-03-22.18:44:03 by mpm.

Files
File name: ignore.patch
Uploaded: alexis, 2008-01-28.15:34:30
Type: text/plain
Messages
msg5279 (view) Author: alexis Date: 2008-02-14.23:57:06
Changesets b41f0d6a74fc and a1ebd5cd7e55 in crew-stable should fix this.

We still need to fix doc/hgignore.5.txt
msg5002 (view) Author: alexis Date: 2008-01-28.15:52:12
As you noticed, we really want to use the ignore patterns to avoid walking
some directories, effectively ignoring all files anywhere inside that tree,
so I think it's consistent to say that a file is ignored if it matches an
ignore pattern or if its dirname is ignored.

We give a name without a trailing / to the ignore function, so regexps should
not end with a slash.  IIRC glob patterns won't care about this (we first
normalize them and then append something like (?:/|$) when converting to
regexps).
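
A minimal sketch of the glob handling described above, assuming a simplified glob syntax (only `*` and `?`) and Python's `re` module. The function `glob_to_regex` is hypothetical and is not Mercurial's actual conversion code; only the `(?:/|$)` suffix comes from the message itself.

```python
import re

def glob_to_regex(glob):
    """Translate a simplified glob (only * and ?) into an unrooted regex.

    Hypothetical translation for illustration; the real conversion
    handles more syntax than this.
    """
    glob = glob.rstrip('/')          # normalize: drop any trailing slash
    out = []
    for ch in glob:
        if ch == '*':
            out.append('[^/]*')      # * stays within one path component
        elif ch == '?':
            out.append('[^/]')
        else:
            out.append(re.escape(ch))
    # Start at a component boundary and end at one: the (?:/|$) suffix
    # lets the pattern match a directory name and everything beneath it.
    return '(?:^|/)' + ''.join(out) + '(?:/|$)'

pattern = re.compile(glob_to_regex('*.c'))
for name in ['a', 'a.c', 'x.c', 'x.c/b', 'a.cpp']:
    print(name, bool(pattern.search(name)))
```

Under these assumptions `a.c`, `x.c` and `x.c/b` match while `a` and `a.cpp` do not, and a glob written as `build/` produces the same regex as `build`. A hand-written regexp that ends in a slash gets no such normalization, so it cannot match a directory name passed without the trailing slash.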
msg5001 (view) Author: jglick Date: 2008-01-28.15:41:13
If we decide that ignore patterns should match directories - i.e. that a file
will be considered ignored if just some ancestor matches an ignored pattern even
though the file's full path does not - then we should also document whether the
directory name must end in /, must not end in /, or may end in / but need not.

Obviously the recommended syntax ought to make it efficient to skip over all
ignored directories (not statwalking them). E.g. in my case, the repo I work on
has dozens of ignore patterns as regexps, two of which each match hundreds of
ignorable subdirs, some of which may in turn contain hundreds of files which are
build products in a deep directory structure. Clearly these build dirs need to
be pruned at the root for operations like 'hg stat' to be fast.
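
One way to picture pruning at the root is the sketch below, which uses Python's `os.walk` and skips ignored directories before descending into them. The function `walk_unignored` and the `ignore` predicate are hypothetical names for illustration; this is not Mercurial's walker.

```python
import os

def walk_unignored(root, ignore):
    """Yield repo-relative file paths, never descending into ignored dirs.

    `ignore` is a hypothetical predicate on slash-separated relative
    paths without a trailing slash.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        prefix = '' if rel == '.' else rel.replace(os.sep, '/') + '/'
        # Assigning to dirnames[:] tells os.walk which subdirs to skip,
        # so an ignored build tree costs one check, not one per file.
        dirnames[:] = sorted(d for d in dirnames if not ignore(prefix + d))
        for f in sorted(filenames):
            if not ignore(prefix + f):
                yield prefix + f

if __name__ == '__main__':
    import re
    import tempfile
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'build', 'deep'))
        os.makedirs(os.path.join(root, 'src'))
        for p in ['build/deep/out.o', 'src/main.c']:
            open(os.path.join(root, *p.split('/')), 'w').close()
        ignore = re.compile(r'^build$').search
        print(list(walk_unignored(root, ignore)))
```

The pattern `^build$` has no trailing slash and matches only the directory name, yet nothing under `build` is ever visited or tested.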
msg5000 (view) Author: alexis Date: 2008-01-28.15:34:30
Yes, we should document that ignore patterns also match directories.

I'm attaching a patch that:

- adds a helper function to also call the ignore function on the parent
  directories

- avoids walking ignored directories even if you use "hg status ignored-dir"

- takes care, when you run "hg status ignored-file", to avoid putting
  ignored-file in the list of unknown files.

(I still have to update the output of test-status, which is "fixed" by this
patch, and add some tests.)

This should make us more consistent, but won't help with evanp's original
case of wanting to ignore files in the repo root, but not in subdirs.
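
The helper described in the first bullet might look roughly like the sketch below. It is an illustration only: `ignored_with_parents` is a hypothetical name, the code is not taken from ignore.patch, and patterns are tested with Python's `re.search` on names without a trailing slash.

```python
import re

def ignored_with_parents(path, ignore):
    """Return True if path or any ancestor directory is ignored.

    `ignore` is any callable taking a slash-separated path without a
    trailing slash. Hypothetical helper, not the code from ignore.patch.
    """
    parts = path.split('/')
    # Test "a", then "a/b", then "a/b/c": shortest prefix first, so an
    # ignored top-level directory is found after a single call.
    for i in range(1, len(parts) + 1):
        if ignore('/'.join(parts[:i])):
            return True
    return False

root_only = re.compile(r'^[^/]*$').search                  # evanp's pattern
results = re.compile(r'^xtest/instance/results$').search   # jglick's pattern

print(ignored_with_parents('x/b', root_only))
print(ignored_with_parents('xtest/instance/results/index.html', results))
print(ignored_with_parents('xtest/other/index.html', results))
```

The first call returns True because the directory name `x` itself matches `^[^/]*$`, which is why this approach is consistent but still ignores everything below the root for that pattern.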
msg4995 (view) Author: jglick Date: 2008-01-28.02:31:32
Another possibly related bug: in the NB repo we have in .hgignore

^xtest/instance/results$

There is a whole subtree of files here after a build. Curiously, this works
unless you name a particular file in that subtree:

$ hg stat
$ hg stat xtest/instance/results/index.html 
? xtest/instance/results/index.html

Changing the ignore to say

^xtest/instance/results/

or

^xtest/instance/results/.*

or even

^xtest/instance/results/index\.html$

does not fix the results of 'hg stat xtest/instance/results/index.html'.

I would like some clear statement of exactly what the semantics of an .hgignore
entry are supposed to be. With that I might be able to work on patches to fix the
current confused behavior.
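
To make the four variants concrete, the sketch below tests each of them against the directory name and against the file's full path. It assumes Python's `re.search` as a stand-in for the ignore check and a directory name passed without a trailing slash; it says nothing about how hg itself combines the two results.

```python
import re

patterns = [
    r'^xtest/instance/results$',
    r'^xtest/instance/results/',
    r'^xtest/instance/results/.*',
    r'^xtest/instance/results/index\.html$',
]
directory = 'xtest/instance/results'            # no trailing slash
filename = 'xtest/instance/results/index.html'

# For each pattern, print whether it matches the directory name and
# whether it matches the file's full path.
for pat in patterns:
    rx = re.compile(pat)
    print(pat, bool(rx.search(directory)), bool(rx.search(filename)))
```

Only the first pattern matches the bare directory name, and only the other three match the file's full path. Since those three do match the path, the `?` reported for the explicitly named file is unlikely to come from the regexps themselves; it points at how explicitly named files are handled.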
msg4679 (view) Author: evanp Date: 2007-12-24.18:33:29
Yes, I think documentation is necessary. I've always written my rooted regexps
as "^path/stuff/" with a trailing slash (presumably causing an extra level of
recursion and therefore reduced performance) because I didn't realize matching
the parent directory was sufficient, or indeed did anything at all.
msg4676 (view) Author: pmezard Date: 2007-12-24.12:42:29
So, should we document that "a file is ignored if its path or the path of any
of its subcomponents is matched by a pattern in .hgignore"?

hgignore.5 does not really mention directories but assuming the following
repository:

$ hg st
? a
? a.c
? x.c/b
? x.c/b.c

$ cat >.hgignore <<EOF
syntax: glob
*.c
EOF

$ hg st
? .hgignore
? a

$ hg st -i
I a.c
I x.c/b
I x.c/b.c

Reading the man page, it's not clear that x.c/b should be ignored.

The current issue comes from assuming that if the ignore() match function
matches a path, it matches any of its subpaths. Which can be wrong with rooted
regexps.
msg4663 (view) Author: mde Date: 2007-12-20.21:27:52
For the record, my use case is trying to store $HOME in hg.  So "sparseness" in
status is important (viz having the right regex).  The same applies to storing
say /etc, which a lot of people talk about doing these days.  In $HOME, you very
likely don't want dirs like Mail or .mozilla under version control.

The FAQ entry on this subject (on the wiki) doesn't go into any depth on how to
ignore all the extra top level files and dirs, while still giving accurate
status on unknown files/dirs down in controlled subdirs.
msg4661 (view) Author: mde Date: 2007-12-20.21:02:55
$ hg st -u .

does appear to work.  I'll stick with it for now.

It also surprised me a bit to see the various "hg st" issues apply to the whole
WD when no arg is given.  So I'm learning to use '.', which is fine.  Just
getting up to speed.  Thanks for a great tool!
msg4660 (view) Author: mpm Date: 2007-12-20.20:56:42
"What appears to be happening is that your ignore pattern is
ignoring the x/ directory. As it ought to. Adding in the -ui flags is revealing
a bug where we still traverse the ignored directory and don't ignore its contents."
msg4659 (view) Author: mde Date: 2007-12-20.20:51:48
Upon deeper inspection, bzr just doesn't recurse into "unknown" dirs, so it just
appeared to do the Right Thing (sorry for the cliche).

IMO, the Right Thing (extending evanp's test case) would be for hg to report
that c is missing, after x and x/b get added.

$ hg add x
adding x/b
$ cd x
$ touch c
$ hg st -u
<oops, nothing reported, but should be "? x/c">
$ hg st -A
A x/b
? x/c
I .hgignore
I a

The last command (-A) looks like the Right Thing: the regex appears to work, and
hg knew to include it as "not tracked".  I just wonder why the -u isn't
reporting anything.
msg4658 (view) Author: mpm Date: 2007-12-20.20:17:25
What are you claiming is the Right Thing?
msg4657 (view) Author: mde Date: 2007-12-20.20:12:02
Not that you need to be *just like* bzr here, but I'll add this tidbit for
reference...

bzr has an "ignore/ignored" pair of commands.  For the said RE, it does the
Right Thing.  And "ignored" gives a nice dump of which pattern matched which
file (my test case is slightly different but you get the idea):

$ bzr ignored
.hg                                                RE:^[^/]*$
.hgignore                                          RE:^[^/]*$
.hgignore.swp                                      *.sw[nop]
A                                                  RE:^[^/]*$
C                                                  RE:^[^/]*$
z                                                  RE:^[^/]*$
msg4654 (view) Author: mpm Date: 2007-12-20.08:03:38
Interesting. What appears to be happening is that your ignore pattern is
ignoring the x/ directory. As it ought to. Adding in the -ui flags is revealing
a bug where we still traverse the ignored directory and don't ignore its contents.
msg4652 (view) Author: evanp Date: 2007-12-20.04:45:54
The following is hg version 9d6ad26fab10 (tip) from http://selenic.com/hg,
although I originally noticed it in 0.9.5:

$ hg init
$ touch a
$ mkdir x
$ touch x/b
$ hg status
? a
? x/b
$ echo -e 'syntax: regexp\n^[^/]*$' >.hgignore
$ hg status
$ hg status -u
$ hg status -i
I .hgignore
I a
$ hg status -ui
? x/b
I .hgignore
I a

Looks like there's some sort of bug preventing unknown files from being included
in status output, but only for certain option combinations.

(The intended effect of the .hgignore pattern was to ignore files in the
repository root, but not those in any subdirectory.)
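
A minimal sketch of what the pattern matches, assuming names are tested with Python's `re` module and that directory names are passed without a trailing slash. The helper `is_ignored` is hypothetical and is not Mercurial's API.

```python
import re

# The pattern from the report: any name that contains no "/" at all.
ROOT_ONLY = re.compile(r'^[^/]*$')

def is_ignored(name):
    """Hypothetical stand-in for the ignore check; not Mercurial's API."""
    return ROOT_ONLY.search(name) is not None

for name in ['a', '.hgignore', 'x', 'x/b']:
    print(name, is_ignored(name))
```

The names `a`, `.hgignore` and `x` all match, while `x/b` does not. Because the directory name `x` matches just like a root-level file, any code path that skips ignored directories will never reach `x/b`.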
History
Date User Action Args
2008-03-22 18:44:03 mpm set status: testing -> resolved
    nosy: mpm, alexis, pmezard, evanp, jglick, mde, djc
2008-03-10 07:37:17 djc set topic: + documentation
    nosy: mpm, alexis, pmezard, evanp, jglick, mde, djc
2008-03-10 07:36:58 djc set nosy: + djc
2008-02-14 23:57:06 alexis set status: chatting -> testing
    nosy: mpm, alexis, pmezard, evanp, jglick, mde
    messages: + msg5279
2008-02-11 13:13:18 djc set topic: + patch
    nosy: mpm, alexis, pmezard, evanp, jglick, mde
2008-01-28 15:52:12 alexis set nosy: mpm, alexis, pmezard, evanp, jglick, mde
    messages: + msg5002
2008-01-28 15:41:13 jglick set nosy: mpm, alexis, pmezard, evanp, jglick, mde
    messages: + msg5001
2008-01-28 15:34:31 alexis set files: + ignore.patch
    nosy: + alexis
    messages: + msg5000
2008-01-28 02:31:33 jglick set nosy: mpm, pmezard, evanp, jglick, mde
    messages: + msg4995
2008-01-21 19:12:07 mpm link issue938 superseder
2008-01-21 19:12:00 mpm set nosy: + jglick
2007-12-24 18:33:30 evanp set nosy: mpm, pmezard, evanp, mde
    messages: + msg4679
2007-12-24 12:42:30 pmezard set nosy: + pmezard
    messages: + msg4676
2007-12-20 21:27:53 mde set nosy: mpm, evanp, mde
    messages: + msg4663
2007-12-20 21:02:56 mde set nosy: mpm, evanp, mde
    messages: + msg4661
2007-12-20 20:56:44 mpm set nosy: mpm, evanp, mde
    messages: + msg4660
2007-12-20 20:51:50 mde set nosy: mpm, evanp, mde
    messages: + msg4659
2007-12-20 20:17:25 mpm set nosy: mpm, evanp, mde
    messages: + msg4658
2007-12-20 20:12:03 mde set nosy: + mde
    messages: + msg4657
2007-12-20 08:03:40 mpm set status: unread -> chatting
    nosy: + mpm
    messages: + msg4654
    title: hg status -ui shows unkowns; just -u doesn't -> hg status -ui shows unknowns; just -u doesn't
2007-12-20 04:45:55 evanp create