[PATCH] highlight: do not use guess_lexer functions. they use too much CPU time for certain inputs

Brendan Cully brendan at kublai.com
Thu Apr 3 18:49:55 CDT 2008


Yikes. How about we cap the input to the guesser at 1K? That'll probably
work 95% of the time, no?
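
A minimal sketch of what such a cap could look like, assuming only the
first 1K of the file is handed to the guesser while the full text is
still highlighted afterwards. The helper name guess_lexer_capped, the
constant MAX_GUESS_BYTES and the guess parameter are hypothetical and
not part of the patch or of Pygments; only guess_lexer_for_filename is
the real Pygments function.

```python
MAX_GUESS_BYTES = 1024  # hypothetical constant: the 1K cap suggested above


def guess_lexer_capped(filename, text, guess=None, limit=MAX_GUESS_BYTES):
    """Guess a lexer from at most `limit` leading characters of `text`.

    `guess` is any callable taking (filename, text); it defaults to
    Pygments' guess_lexer_for_filename.
    """
    if guess is None:
        # Imported lazily so the helper can be tried without Pygments.
        from pygments.lexers import guess_lexer_for_filename as guess
    # Only the truncated prefix reaches the (potentially slow) guesser.
    return guess(filename, text[:limit])
```

The caller would still pass the complete text to the highlighter using
the lexer returned here; only the guessing step sees the truncated
input.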

On Wednesday, 02 April 2008 at 23:53, Ralf Schmitt wrote:
> 
> 
> On Wed, Apr 2, 2008 at 10:45 PM, Matt Mackall <mpm at selenic.com> wrote:
> 
> 
>     On Wed, 2008-04-02 at 21:59 +0200, Ralf Schmitt wrote:
>     > # HG changeset patch
>     > # User ralf at brainbot.com
>     > # Date 1207165818 -7200
>     > # Node ID 50015149baa0dbf1b7066f0356b65f492ed78450
>     > # Parent  101526031d06d184559ae797687e50661b96156e
>     > highlight: do not use guess_lexer functions. they use too much CPU time
>     > for certain inputs.
> 
>     Does certain input mean big inputs? Can we send some truncated source to
>     the guesser instead?
> 
>  
> I reported this some time ago:
> http://selenic.com/pipermail/mercurial/2008-March/018029.html
> The file where this happened for me is a php file with around 2000 lines
> (140k).
> 
> I wrote a short script to measure the time it takes to run
> guess_lexer_for_filename on truncated input:
> from pygments.lexers import guess_lexer_for_filename
> 
> text=open("Collection.i18n.php").read()
> 
> import time
> size=512
> while 1:
>     stime=time.time()
>     for run in range(10):
>         guess_lexer_for_filename("collection.i18n.php", text[:size], encoding="utf-8")
>     print (time.time()-stime)/10, size
>    
>     size+=512
> 
> 
> It prints the following values (first column is the average time per call in
> seconds, second column is the input size in bytes):
> 
> 0.00721120834351 512
> 0.00744049549103 1024
> 0.0433429002762 1536
> 0.161764788628 2048
> 0.34955329895 2560
> 0.627179193497 3072
> 0.958257818222 3584
> 1.46866378784 4096
> 2.11897850037 4608
> 2.94355890751 5120
> 3.93533871174 5632
> 5.09589328766 6144
> 
> This is on a 2.4 GHz CPU.
> 
> Regards,
> - Ralf
> 
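
To put a rough number on how fast the quoted timings grow, here is a
small sketch that fits a power law (time ~ size^k) to them with a
least-squares line in log-log space. The power-law model is an
assumption made for illustration, and the names TIMINGS and
growth_exponent are hypothetical; the data points are copied from the
output quoted above.

```python
import math

# (seconds per call, input size in bytes), copied from the quoted output.
TIMINGS = [
    (0.00721120834351, 512),
    (0.00744049549103, 1024),
    (0.0433429002762, 1536),
    (0.161764788628, 2048),
    (0.34955329895, 2560),
    (0.627179193497, 3072),
    (0.958257818222, 3584),
    (1.46866378784, 4096),
    (2.11897850037, 4608),
    (2.94355890751, 5120),
    (3.93533871174, 5632),
    (5.09589328766, 6144),
]


def growth_exponent(samples):
    """Return the least-squares slope of log(time) against log(size)."""
    xs = [math.log(size) for _, size in samples]
    ys = [math.log(seconds) for seconds, _ in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# The first two samples are nearly flat, so they are skipped here on the
# assumption that fixed overhead dominates at those sizes.
exponent = growth_exponent(TIMINGS[2:])
print("estimated exponent: %.2f" % exponent)
```

On these numbers the fitted exponent comes out somewhat above 3, which
suggests at least cubic growth over this range and is consistent with
the guess being cheap at 1K and costing several seconds by 6K.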