[PATCH] highlight: do not use guess_lexer functions. they use too much CPU time for certain inputs
Brendan Cully
brendan at kublai.com
Thu Apr 3 18:49:55 CDT 2008
Yikes. How about we cap the text we pass to the guesser at 1K? That'll
probably work 95% of the time, no?
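
Something like the following is what I have in mind. It is only a sketch:
guess_capped and GUESS_LIMIT are made-up names, not anything in the highlight
extension, and the guesser is passed in as a plain callable so the snippet
runs without pygments installed. In practice that callable would presumably
be pygments' guess_lexer_for_filename. Note that the slice counts characters
of whatever string type is passed in, which is only the same as bytes for a
byte string.

```python
GUESS_LIMIT = 1024  # hypothetical cap: "1K" of leading text


def guess_capped(guess, filename, text, limit=GUESS_LIMIT):
    """Call guess(filename, prefix) with at most `limit` leading characters."""
    return guess(filename, text[:limit])


if __name__ == "__main__":
    # Stand-in guesser that only reports how much text it was handed.
    def fake_guess(filename, text):
        return (filename, len(text))

    print(guess_capped(fake_guess, "collection.i18n.php", "x" * 140000))
```

In the timings quoted below, a 1024-byte prefix took about 7 ms per call.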
On Wednesday, 02 April 2008 at 23:53, Ralf Schmitt wrote:
>
>
> On Wed, Apr 2, 2008 at 10:45 PM, Matt Mackall <mpm at selenic.com> wrote:
>
>
> > On Wed, 2008-04-02 at 21:59 +0200, Ralf Schmitt wrote:
> > > # HG changeset patch
> > > # User ralf at brainbot.com
> > > # Date 1207165818 -7200
> > > # Node ID 50015149baa0dbf1b7066f0356b65f492ed78450
> > > # Parent 101526031d06d184559ae797687e50661b96156e
> > > highlight: do not use guess_lexer functions. they use too much CPU time
> > > for certain inputs.
> >
> > Does certain input mean big inputs? Can we send some truncated source to
> > the guesser instead?
>
>
> I reported this some time ago:
> http://selenic.com/pipermail/mercurial/2008-March/018029.html
> The file where this happened for me is a php file with around 2000 lines
> (140k).
>
> I wrote a short script to measure the time it takes to run
> guess_lexer_for_filename on truncated input:
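>
> import time
>
> from pygments.lexers import guess_lexer_for_filename
>
> text = open("Collection.i18n.php").read()
>
> size = 512
> while True:  # runs until interrupted
>     # time 10 guesses on the first `size` bytes of the file
>     stime = time.time()
>     for run in range(10):
>         guess_lexer_for_filename("collection.i18n.php", text[:size],
>                                  encoding="utf-8")
>     # average seconds per call, then the input size
>     print("%s %s" % ((time.time() - stime) / 10, size))
>
>     size += 512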
>
>
> It prints the following values (first column is the average time per call
> in seconds, second column is the input size in bytes):
>
> 0.00721120834351 512
> 0.00744049549103 1024
> 0.0433429002762 1536
> 0.161764788628 2048
> 0.34955329895 2560
> 0.627179193497 3072
> 0.958257818222 3584
> 1.46866378784 4096
> 2.11897850037 4608
> 2.94355890751 5120
> 3.93533871174 5632
> 5.09589328766 6144
>
> This is on a 2.4 GHz CPU.
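
For what it's worth, those numbers look roughly cubic in the input size rather
than linear. A quick check using only the timings quoted above: growth_exponent
is a made-up helper, and a two-point log-log slope is a rough estimate, not a
proper fit.

```python
import math

# (size in bytes, average seconds per call), copied from the table above
timings = [
    (512, 0.00721120834351), (1024, 0.00744049549103),
    (1536, 0.0433429002762), (2048, 0.161764788628),
    (2560, 0.34955329895), (3072, 0.627179193497),
    (3584, 0.958257818222), (4096, 1.46866378784),
    (4608, 2.11897850037), (5120, 2.94355890751),
    (5632, 3.93533871174), (6144, 5.09589328766),
]


def growth_exponent(a, b):
    """Estimate k in time ~ size**k from two (size, seconds) samples."""
    (size_a, time_a), (size_b, time_b) = a, b
    return math.log(time_b / time_a) / math.log(size_b / size_a)


by_size = dict(timings)
for small, large in [(2048, 4096), (3072, 6144)]:
    k = growth_exponent((small, by_size[small]), (large, by_size[large]))
    print("%d -> %d bytes: exponent %.2f" % (small, large, k))
```

Both doublings come out with an exponent a little above 3, so doubling the
input costs roughly eight to nine times as much.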
>
> Regards,
> - Ralf
>