Issue951

Title: Valid regular expression breaks .hgignore
Priority: bug
Status: resolved
Superseder:
Nosy List: ThomasAH, bos, dlipin, jglick, jskrivanek, lcantey, mpm, pmezard, rhc, sborho
Assigned To:
Topics: patch, windows

Created on 2008-01-29.10:48:12 by jskrivanek, last changed 2008-03-22.18:43:02 by mpm.

Files
File name Uploaded Type
.hgignore jskrivanek, 2008-01-29.10:48:11 application/octet-stream
Messages
msg5689 (view) Author: mpm Date: 2008-03-22.18:43:02
A fix went in 59a9dc9562e2, marking resolved
msg5270 (view) Author: mpm Date: 2008-02-14.16:03:27
Note that you can generally work around this problem by combining lines of your
ignore file into a single regular expression, i.e.:

some/long/path/a
some/long/path/b
some/long/path/even/longer/c
some/long/path/even/longer/d

becomes:

some/long/path/(a|b|(even/longer/(c|d)))

which is both faster and smaller.
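
As a quick sanity check of the combined form, here is a minimal sketch using Python's re module. The variable names are only for illustration. Note the `/` before `(c|d)`, which is needed for the last two paths to match.

```python
import re

paths = [
    "some/long/path/a",
    "some/long/path/b",
    "some/long/path/even/longer/c",
    "some/long/path/even/longer/d",
]

# One alternative per ignore line, joined into a single pattern.
separate = re.compile("(?:%s)" % "|".join(paths))

# The hand-combined form with the shared prefix factored out.
combined = re.compile("some/long/path/(a|b|(even/longer/(c|d)))")

# Both forms accept every one of the original paths.
for p in paths:
    assert separate.match(p) and combined.match(p)

# The combined source pattern is shorter than the joined one.
print(len(separate.pattern), len(combined.pattern))
```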
msg5269 (view) Author: rhc Date: 2008-02-14.15:53:43
Thanks for the test info!

I ran that command and got a "2" output, which I am assuming means "2 bytes",
i.e., that it is a 16-bit regex engine. I think almost all of us on the project
use Darwin Port for our python and Hg installations. I'm thinking now that Port
may be building Python with the tiny regex engine by default.

Since I only installed Hg via Port, I can't really test the patch (my
apologies). I'll try rebuilding Python from source and see if they will let me
specify the 32-bit engine during build.

Thanks!
msg5268 (view) Author: mpm Date: 2008-02-14.15:45:54
I think the test is:

python -c 'import _sre; print _sre.CODESIZE'
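
For anyone checking this on Python 3, the same test can be sketched as below. `_sre` is a private CPython module rather than a public API, so treat this as a diagnostic only. A value of 2 would correspond to the 16-bit engine discussed here and 4 to the 32-bit one.

```python
import _sre  # private CPython module backing `re`; not a public API

# CODESIZE is the size in bytes of one compiled regex code unit.
codesize = _sre.CODESIZE
print(codesize)
```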
msg5267 (view) Author: ThomasAH Date: 2008-02-14.14:48:58
The patch is in the main repo:
http://www.selenic.com/hg/rev/59a9dc9562e2
msg5266 (view) Author: rhc Date: 2008-02-14.14:42:21
Guess my bug report got switched to this thread (which is fine). I am unfamiliar
with this bug tracking system, so please forgive me if I ask a silly question.
However, I don't see the patch mpm refers to, or else we would give it a try.

Also, as a further data point, please note that we see this truncation problem
using Python 2.5.1, 2.3.5, and 2.4.4 on Mac OS-X, both on Intel and PPC, under
both 10.5.x and 10.4.x. Your thread implies that it is associated with Pythons
built with 16-bit regex engines - can you give me an idea of how I find out if
Python was built with 16 vs 32 bit regex?

Thanks
Ralph
msg5262 (view) Author: mpm Date: 2008-02-13.22:13:23
I've pushed a version of my patch below to mainline in 59a9dc9562e2. But no
one's told me whether it works yet. Will set this to testing.
msg5259 (view) Author: ThomasAH Date: 2008-02-13.21:39:10
added nosy from superseded issue975
msg5126 (view) Author: mpm Date: 2008-02-06.20:16:25
On Wed, 2008-02-06 at 19:51 +0000, Lee Cantey wrote:
> Lee Cantey <lcantey@gmail.com> added the comment:
> 
> Was just pointed to this.  The Windows installers are based on Python 2.4.4.  If 
> this is a problem with the 2.4 series regex engine I can update to the 2.5 series.

We may also be able to simply set a threshold above which we break our
large regexes, regardless of whether the compiler chokes on them. Say at
50k. Something like this:

diff -r b7f44f01a632 mercurial/util.py
--- a/mercurial/util.py	Tue Feb 05 16:09:21 2008 -0600
+++ b/mercurial/util.py	Wed Feb 06 14:10:43 2008 -0600
@@ -459,6 +459,8 @@
             return
         try:
             pat = '(?:%s)' % '|'.join([regex(k, p, tail) for (k, p) in pats])
+            if len(pat) > 50000:
+                raise OverflowError()
             return re.compile(pat).match
         except OverflowError:
             # We're using a Python with a tiny regex engine and we

But I think this is only a problem with Pythons built with 16-bit regex
engines and mine is 32-bit.
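
To make the threshold idea concrete, here is a standalone sketch, not the actual util.py code. The name `build_matcher` and its `limit` parameter are hypothetical. The diff only shows the start of the `except OverflowError` branch, so recovering by splitting the pattern list in half is an assumption about one reasonable fallback.

```python
import re

MAX_PATTERN_LEN = 50000  # threshold suggested in the message above


def build_matcher(patterns, limit=MAX_PATTERN_LEN):
    """Return a predicate that is true if any pattern matches at the start."""
    if not patterns:
        return lambda s: False
    combined = "(?:%s)" % "|".join(patterns)
    try:
        if len(combined) > limit:
            # Treat an oversized pattern like a compiler overflow.
            raise OverflowError()
        match = re.compile(combined).match
        return lambda s: match(s) is not None
    except OverflowError:
        if len(patterns) < 2:
            raise  # a single pattern is too large; nothing left to split
        # Assumed fallback: split the list in two and try each half.
        mid = len(patterns) // 2
        first = build_matcher(patterns[:mid], limit)
        second = build_matcher(patterns[mid:], limit)
        return lambda s: first(s) or second(s)


# A tiny limit forces the split path for demonstration purposes.
matcher = build_matcher([r".*\.class$", r"build/"], limit=20)
print(matcher("a.class"), matcher("build/a.class"), matcher("a"))
```

The length check reuses the existing OverflowError path, so engines that never overflow still get the smaller patterns.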
msg5125 (view) Author: lcantey Date: 2008-02-06.19:51:47
Was just pointed to this.  The Windows installers are based on Python 2.4.4.  If 
this is a problem with the 2.4 series regex engine I can update to the 2.5 series.
msg5115 (view) Author: jskrivanek Date: 2008-02-06.08:36:49
Are there any plans to update the installer at http://mercurial.berkwood.com/? I
tried the installer from http://qct.sourceforge.net/Mercurial-NSI.html. I can't
reproduce this issue with Hg from that installer, but it is a real pain to
install. For example, it has a hard-coded location of python.exe.
msg5071 (view) Author: jskrivanek Date: 2008-01-31.20:51:08
Steve Borho wrote:
> I was going to suggest an 'hg version --verbose'.  The TortoiseHg
> installers at least have an 'about' dialog that lists Hg version, Python
> version, and the Gtk library versions.

I am using this one:
http://mercurial.berkwood.com/binaries/Mercurial-0.9.5-d39af2eabb8c.exe

J.
msg5068 (view) Author: sborho Date: 2008-01-31.19:56:30
I was going to suggest an 'hg version --verbose'.  The TortoiseHg
installers at least have an 'about' dialog that lists Hg version, Python
version, and the Gtk library versions.
msg5066 (view) Author: mpm Date: 2008-01-31.19:52:44
We should probably report the Python version somewhere in the trace.
msg5065 (view) Author: sborho Date: 2008-01-31.19:51:36
On Thu, 2008-01-31 at 19:42 +0000, Matt Mackall wrote:
> Matt Mackall <mpm@selenic.com> added the comment:
> 
> Let's ask Steve then.

My installers report a version number like 0.9.5+win32extras and they
are all built on Python-2.5.1.  It looks like you are using one of Lee
Cantey's packages.  I think his are built on Python-2.4.
msg5064 (view) Author: mpm Date: 2008-01-31.19:42:35
Let's ask Steve then.
msg5063 (view) Author: jskrivanek Date: 2008-01-31.19:40:00
I am using hg.exe from the Windows installer. I don't know which Python is inside.
msg5060 (view) Author: mpm Date: 2008-01-31.18:32:02
Still need to know what Python you're running.
msg5046 (view) Author: jskrivanek Date: 2008-01-31.08:34:16
Yes, this is very likely the same issue. If I modify my .hgignore, I sometimes
get the error below. I am using Hg 0.9.5 from the Windows installer and I
don't know which Python is in it.

D:\Development\hg\test1>hg stat -A
** unknown exception encountered, details follow
** report bug details to http://www.selenic.com/mercurial/bts
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 0.9.5)
Traceback (most recent call last):
  File "hg", line 14, in ?
  File "mercurial\dispatch.pyc", line 20, in run
  File "mercurial\dispatch.pyc", line 29, in dispatch
  File "mercurial\dispatch.pyc", line 45, in _runcatch
  File "mercurial\dispatch.pyc", line 348, in _dispatch
  File "mercurial\dispatch.pyc", line 401, in _runcommand
  File "mercurial\dispatch.pyc", line 357, in checkargs
  File "mercurial\dispatch.pyc", line 340, in <lambda>
  File "mercurial\commands.pyc", line 2571, in status
  File "mercurial\localrepo.pyc", line 909, in status
  File "mercurial\dirstate.pyc", line 514, in status
RuntimeError: internal error in regular expression engine
msg5040 (view) Author: mpm Date: 2008-01-30.22:55:24
Is this related to the report from issue955? I suspect we're hitting whatever's
alluded to by this:

http://mail.python.org/pipermail/python-list/2006-January/363280.html

What Python version are you running?
msg5023 (view) Author: bos Date: 2008-01-29.19:41:57
I can't reproduce this either, on Linux.
msg5021 (view) Author: pmezard Date: 2008-01-29.13:20:57
> Please, try to replace dir/a.class with build\a.class as in my original message.

Still cannot reproduce it.
I am using the slightly modified script:

----
rd /s /q t
call hg init t
cd t
cp ..\.hgignore .
echo a > a
echo a > a.class
mkdir build
echo a > build\a.class
hg st -A
----

The only differences are that I need a "call" because my "hg" is a batch file, I
use echo instead of touch, and I actually create build/a.class instead of
a/a.class (likely a typo in your example).

The last status call outputs:

----
t>hg st -A
? .hgignore
? a
I a.class
I build\a.class
----
msg5020 (view) Author: jskrivanek Date: 2008-01-29.13:00:57
Please, try to replace dir/a.class with build\a.class as in my original message.
msg5019 (view) Author: jskrivanek Date: 2008-01-29.12:58:26
It doesn't seem to be a Cygwin issue, because I can also reproduce it from the
Windows command line. I installed Hg from the Windows installer.
msg5018 (view) Author: pmezard Date: 2008-01-29.12:54:02
Probably a Cygwin issue; I cannot reproduce it on plain Windows (crew or 0.9.5).

Unfortunately, I am not a Cygwin user and I have no idea what people mean when
they say they use Mercurial under Cygwin. Could you explain that to me?

- Are you running it from the Cygwin shell?
- Are you running a binary version or from sources? Compiled natively or in
Cygwin? How did you install it?
msg5017 (view) Author: jskrivanek Date: 2008-01-29.12:33:16
I am on Windows XP with Cygwin and Hg 0.9.5. It is probably Windows-specific,
because I haven't heard about this issue on other OSes.
msg5016 (view) Author: pmezard Date: 2008-01-29.12:22:42
I cannot reproduce with crew tip or 0.9.5:

$ tree
.
|-- a
|-- a.class
`-- dir
    `-- a.class

1 directory, 3 files
$ hg st -A
? .hgignore
? a
I a.class
I dir/a.class
$ grep 'test/j2ee' .hgignore 
^j2ee.kit/test/qa-functional/src/org/netbeans/test/j2ee/multiview/\.Utils\.java\.swp$
^j2ee.kit/test/qa-functional/src/org/netbeans/test/j2ee/persistence/\.PersistenceUnitTest\.java\.swp$

Which version are you using, and on what system?
msg5015 (view) Author: jskrivanek Date: 2008-01-29.10:48:11
The patterns below cause the other patterns in .hgignore not to be taken into
account:

^j2ee.kit/test/qa-functional/src/org/netbeans/test/j2ee/multiview/\.Utils\.java\.swp$
^j2ee.kit/test/qa-functional/src/org/netbeans/test/j2ee/persistence/\.PersistenceUnitTest\.java\.swp$

To reproduce:

hg init
touch a
mkdir build
touch a/a.class
cp <attached .hgignore> .hgignore
hg stat
? .hgignore
? a
? build\a.class
Remove or comment out the two lines above (lines 243 and 244)
hg stat
? .hgignore
? a

The other strange thing is that if you comment out just one of the two lines, it
also starts to work.
History
Date User Action Args
2008-03-22 18:43:02 mpm set status: testing -> resolved
nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5689
2008-02-14 16:03:28 mpm set nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5270
2008-02-14 15:53:43 rhc set nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5269
2008-02-14 15:45:54 mpm set nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5268
2008-02-14 14:48:58 ThomasAH set nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5267
2008-02-14 14:42:22 rhc set nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5266
2008-02-13 22:13:23 mpm set status: chatting -> testing
nosy: mpm, bos, ThomasAH, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek, rhc
messages: + msg5262
2008-02-13 21:39:11 ThomasAH set nosy: + rhc, ThomasAH
messages: + msg5259
2008-02-11 13:20:19 djc set topic: + patch
nosy: mpm, bos, sborho, lcantey, pmezard, jglick, dlipin, jskrivanek
2008-02-11 04:36:42 mpm link issue975 superseder
2008-02-08 08:43:44 dlipin set nosy: + dlipin
2008-02-06 20:16:33 mpm set nosy: mpm, bos, sborho, lcantey, pmezard, jglick, jskrivanek
messages: + msg5126
2008-02-06 19:51:50 lcantey set nosy: + lcantey
messages: + msg5125
2008-02-06 08:36:49 jskrivanek set nosy: mpm, bos, sborho, pmezard, jglick, jskrivanek
messages: + msg5115
2008-01-31 20:51:08 jskrivanek set nosy: mpm, bos, sborho, pmezard, jglick, jskrivanek
messages: + msg5071
2008-01-31 19:56:30 sborho set nosy: mpm, bos, sborho, pmezard, jglick, jskrivanek
messages: + msg5068
2008-01-31 19:55:06 mpm link issue955 superseder
2008-01-31 19:52:44 mpm set nosy: mpm, bos, sborho, pmezard, jglick, jskrivanek
messages: + msg5066
2008-01-31 19:51:37 sborho set nosy: mpm, bos, sborho, pmezard, jglick, jskrivanek
messages: + msg5065
2008-01-31 19:42:35 mpm set nosy: + sborho
messages: + msg5064
2008-01-31 19:40:00 jskrivanek set nosy: mpm, bos, pmezard, jglick, jskrivanek
messages: + msg5063
2008-01-31 18:32:03 mpm set nosy: mpm, bos, pmezard, jglick, jskrivanek
messages: + msg5060
2008-01-31 08:34:17 jskrivanek set nosy: mpm, bos, pmezard, jglick, jskrivanek
messages: + msg5046
2008-01-30 22:55:24 mpm set nosy: + mpm
messages: + msg5040
2008-01-29 19:42:01 bos set topic: + windows
nosy: + bos
messages: + msg5023
2008-01-29 15:17:09 jglick set nosy: + jglick
2008-01-29 13:20:58 pmezard set messages: + msg5021
2008-01-29 13:00:58 jskrivanek set messages: + msg5020
2008-01-29 12:58:26 jskrivanek set messages: + msg5019
2008-01-29 12:54:02 pmezard set messages: + msg5018
2008-01-29 12:33:16 jskrivanek set messages: + msg5017
2008-01-29 12:22:42 pmezard set status: unread -> chatting
nosy: + pmezard
messages: + msg5016
2008-01-29 10:48:12 jskrivanek create