Re: Regex Speed

2007-02-21 Thread Pop User
[EMAIL PROTECTED] wrote:
> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was relatively unacceptable (upwards
> of 5 minutes for 1-2 million lines of logs). In contrast, using the
> Linux tool grep would complete the same search in a matter of seconds.
>   
It's very hard to beat grep, depending on the nature of the regex you are 
searching with. The regex engines in Python/Perl/PHP/Ruby have traded 
the speed of grep/awk for the ability to do more complex searches.

http://swtch.com/~rsc/regexp/regexp1.html

This might not be your problem, but if it is, you can always popen grep.
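
For what it's worth, here is a rough sketch of what "popen grep" could look
like from Python. It assumes a grep binary is on the PATH and uses the
standard subprocess module; the helper name grep_lines is made up for
illustration, and error handling is minimal.

```python
import subprocess


def grep_lines(pattern, path):
    """Return the lines of `path` that match `pattern`, using external grep."""
    # -E selects extended regex syntax, -e marks the pattern explicitly, and
    # "--" ends option parsing so an odd file name is not read as a flag.
    result = subprocess.run(
        ["grep", "-E", "-e", pattern, "--", path],
        capture_output=True,
        text=True,
    )
    # grep exits 0 when something matched, 1 when nothing matched,
    # and with a larger value on a real error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Something like grep_lines("ERROR|WARN", "app.log") would then hand back the
matching lines as a list (the file name here is only an example).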

It would be nice if there were a Thompson NFA re module.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Speed

2007-02-21 Thread Pop User
John Machin wrote:
> Or a Glushkov NFA simulated by bit parallelism re module ... see
> http://citeseer.ist.psu.edu/551772.html
> (which Russ Cox (author of the paper you cited) seems not to have
> read).
>   
NR-grep looks interesting, I'll read that. Thanks.
> Cox uses a "pathological regex" (regex = "a?" * 29 + "a" * 29, in
> Python code) to make his point: grep uses a Thompson gadget and takes
> linear time, while Python perl and friends use backtracking and go off
> the planet.
>
>   
It might be pathological, but based on the original poster's timings his 
situation seems related.
My main point was that it's quite possible he isn't going to get faster 
than grep regardless of the language he uses, and if grep wins, use it.  
I frequently do. 
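
If anyone wants to see the effect for themselves, the sketch below builds the
pattern quoted above for a given n and times Python's re on a string of n
"a" characters. The function names are my own, and the values of n are kept
small on purpose: the time should grow quickly as n goes up, so raise it with
care.

```python
import re
import time


def pathological(n):
    """Build the quoted pattern ("a?" * n + "a" * n) and its input text."""
    return "a?" * n + "a" * n, "a" * n


def time_match(n):
    """Return (matched, seconds) for matching the pattern against "a" * n."""
    pattern, text = pathological(n)
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18):
        matched, seconds = time_match(n)
        print(n, matched, f"{seconds:.4f}s")
```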

> Getting back to the "It would be nice ..." bit: yes, it would be nice
> to have even more smarts in re, but who's going to do it? It's not a
> "rainy Sunday afternoon" job :
One of these days.  :)





Re: Efficient way of testing for substring being one of a set?

2008-04-03 Thread Pop User
[EMAIL PROTECTED] wrote:
 > Dennis Benzinger:
 >> You could use the Aho-Corasick algorithm > Aho-Corasick_algorithm>.
 >> I don't know if there's a Python implementation yet.
 >
 > http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
 >

http://nicolas.lehuen.com/download/pytst/ can do it as well.
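
For readers who want the idea without installing anything, here is a minimal
pure-Python sketch of an Aho-Corasick style matcher. It is not the code from
either package linked above, the names build_automaton and find_all are my
own, and it is written for clarity, not speed.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto, failure, and output tables for the given keywords."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # fail[state] -> state to fall back to on a mismatch
    out = [set()]   # out[state] -> keywords that end at this state
    for word in keywords:
        if not word:
            continue  # skip empty keywords
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # derive their failure link from their parent's link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

To test whether any of the substrings occurs at all, checking
bool(find_all(text, automaton)) is enough; the text is scanned once no matter
how many keywords are in the set.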


Re: python vs. grep

2008-05-07 Thread Pop User

Anton Slesarev wrote:
> But I have some problem with writing performance grep analog.

I don't think you can ever catch grep.  Searching is its only purpose in 
life and it's very good at it.  You may be able to come closer; this 
thread relates:


http://groups.google.com/group/comp.lang.python/browse_thread/thread/2f564523f476840a/d9476da5d7a9e466

This relates to the speed of re.  If you don't need regex, don't use re. 
 If you do need re, an alternate re library might be useful, but you 
aren't going to catch grep.
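
As a rough illustration of the "don't use re" case: when the search term is a
fixed string, the plain in operator does the job with no regex involved. The
helper name find_lines and the encoding choice are assumptions of mine, not
anything from the original poster's code.

```python
def find_lines(needle, path):
    """Yield (line_number, line) for each line containing the literal needle."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```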





Re: freebsd and multiprocessing

2010-03-02 Thread Pop User
On 3/2/2010 12:59 PM, Tim Arnold wrote:
> 
> I'll write some test programs using multiprocessing and see how they
> go before committing to rewrite my current code. I've also been
> looking at 'parallel python' although it may have the same issues.
> http://www.parallelpython.com/
> 

parallelpython works for me on FreeBSD 6.2.
