[PATCH] highlight: do not use guess_lexer functions. they use too much CPU time for certain inputs
Ralf Schmitt
schmir at gmail.com
Wed Apr 2 16:53:44 CDT 2008
On Wed, Apr 2, 2008 at 10:45 PM, Matt Mackall <mpm at selenic.com> wrote:
>
> On Wed, 2008-04-02 at 21:59 +0200, Ralf Schmitt wrote:
> > # HG changeset patch
> > # User ralf at brainbot.com
> > # Date 1207165818 -7200
> > # Node ID 50015149baa0dbf1b7066f0356b65f492ed78450
> > # Parent 101526031d06d184559ae797687e50661b96156e
> > highlight: do not use guess_lexer functions. they use too much CPU time
> > for certain inputs.
>
> Does certain input mean big inputs? Can we send some truncated source to
> the guesser instead?
I reported this some time ago:
http://selenic.com/pipermail/mercurial/2008-March/018029.html
The file that triggered this for me is a PHP file of around 2000 lines
(about 140 KB).
I wrote a short script to measure the time it takes to run
guess_lexer_for_filename on truncated input:
from pygments.lexers import guess_lexer_for_filename
text=open("Collection.i18n.php").read()
import time
size=512
while 1:
    stime=time.time()
    for run in range(10):
        guess_lexer_for_filename("collection.i18n.php", text[:size],
                                 encoding="utf-8")
    print (time.time()-stime)/10, size
    size+=512
It prints the following values (the first column is the average time per
call in seconds, the second column is the input size in bytes):
0.00721120834351 512
0.00744049549103 1024
0.0433429002762 1536
0.161764788628 2048
0.34955329895 2560
0.627179193497 3072
0.958257818222 3584
1.46866378784 4096
2.11897850037 4608
2.94355890751 5120
3.93533871174 5632
5.09589328766 6144
This is on a 2.4 GHz CPU.
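To get a rough feel for how fast the time grows with input size, one can fit a straight line to log(time) against log(size). The sketch below is only an illustration: it uses the measurements from 2048 bytes upward (the smaller sizes look dominated by constant overhead, which is my assumption), and the helper name loglog_slope is made up for this example. A slope of 1 would mean linear growth; a slope well above 1 means that truncating the input only helps if the cut-off is quite small.

```python
import math

# (seconds per call, input size in bytes), copied from the measurements above.
# Sizes below 2048 are left out on the assumption that constant overhead
# dominates there.
timings = [
    (0.161764788628, 2048),
    (0.34955329895, 2560),
    (0.627179193497, 3072),
    (0.958257818222, 3584),
    (1.46866378784, 4096),
    (2.11897850037, 4608),
    (2.94355890751, 5120),
    (3.93533871174, 5632),
    (5.09589328766, 6144),
]


def loglog_slope(samples):
    """Least-squares slope of log(seconds) against log(size).

    Hypothetical helper for this illustration, not part of pygments.
    """
    xs = [math.log(size) for _, size in samples]
    ys = [math.log(seconds) for seconds, _ in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


slope = loglog_slope(timings)
print("time grows roughly like size**%.1f" % slope)
```

This says nothing about why the guesser is slow on this file; it only summarizes the numbers above.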
Regards,
- Ralf