[EMAIL PROTECTED] (Ilpo Nyyssönen) wrote:
> Of course it caches those when running. The point is that it needs to
> recompile every time you have restarted the program. With short lived
> command line programs this really can be a problem.
Are you speculating that it might be a problem, or saying that you have
seen it be a problem in a real-life program?
I just generated a bunch of moderately simple regexes from a dictionary
wordlist. Looks something like:
Roy-Smiths-Computer:play$ head exps
a.*a[0-9]{34}
a.*ah[0-9]{34}
a.*ahed[0-9]{34}
a.*ahing[0-9]{34}
a.*ahs[0-9]{34}
a.*al[0-9]{34}
a.*alii[0-9]{34}
a.*aliis[0-9]{34}
a.*als[0-9]{34}
a.*ardvark[0-9]{34}
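Something along these lines will generate patterns like that from a
wordlist on stdin (the exact recipe here is just an illustration, not
the script I actually used):

    import sys

    # For each dictionary word, emit first letter + ".*" + the rest,
    # followed by a fixed numeric tail.
    for word in sys.stdin:
        word = word.strip()
        if word:
            print(word[0] + ".*" + word[1:] + "[0-9]{34}")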
Then I ran them through a little script that does:
    import re, sys
    for exp in sys.stdin:
        regex = re.compile(exp.strip())   # strip the newline off first
and timed it for various numbers of lines. On my G4 PowerBook (1 GHz
PowerPC), I'm compiling about 1000 regexes per second:
Roy-Smiths-Computer:play$ time head -5000 < exps | ./regex.py
real 0m5.208s
user 0m4.690s
sys 0m0.090s
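If you'd rather not go through the shell's time builtin, a
self-contained version of the same measurement looks like this
(timings will obviously vary by machine and pattern complexity):

    import re, sys, time

    exps = [line.strip() for line in sys.stdin if line.strip()]
    start = time.time()
    for exp in exps:
        re.compile(exp)
    elapsed = time.time() - start
    print("%d patterns in %.2fs (%.0f per second)"
          % (len(exps), elapsed, len(exps) / elapsed))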
So, my guess is that unless you're compiling hundreds of regexes each time
you start up, the one-time compilation cost is probably not significant; at
roughly 1000 compiles per second, even a few hundred patterns add well
under a second to startup.
> And yes, I have read the source of sre.py and I have made an ugly
> module that digs the compiled data and pickles it to a file and then
> in next startup it reads that file and puts the stuff back to the
> cache.
That's exactly what I would have done if I really needed to improve startup
speed. In fact, I did something like that many moons ago, in a previous
life. See R. Smith, "A finite state machine algorithm for finding
restriction sites and other pattern matching applications", CABIOS, Vol. 4,
No. 4, 1988. In that case, I had about 1200 patterns I was searching for
(and doing it on hardware running about 1% of the speed of my current
laptop).
BTW, why did you have to dig out the compiled data before pickling it?
Could you not have just pickled whatever re.compile() returned?
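As a quick sanity check, pattern objects do pickle out of the box, at
least in CPython; but as far as I can tell, the pickle stores just the
pattern string and flags, so unpickling recompiles from scratch:

    import pickle, re

    pattern = re.compile(r"a.*ardvark[0-9]{34}")
    restored = pickle.loads(pickle.dumps(pattern))  # round-trips fine
    assert restored.pattern == pattern.pattern
    # Caveat: unpickling calls the regex compiler again under the hood,
    # so this alone wouldn't save any startup time.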