Re: Can I beat perl at grep-like processing speed?

2006-12-29 Thread Tim Smith


you may not be able to beat perl's regex speed, but you can take some steps to 
speed up your python program using map and filter.

here's a modified python program that does your search faster:

#!/usr/bin/env python
import re
r = re.compile(r'destroy', re.IGNORECASE)

def stripit(x):
  return x.rstrip("\r\n")

print "\n".join( map(stripit, filter(r.search, file('bigfile'))) )

#time comparison on my machine
real  0m0.218s
user  0m0.210s
sys   0m0.010s

real  0m0.464s
user  0m0.450s
sys   0m0.010s

#original time comparison on my machine

real  0m0.224s
user  0m0.220s
sys   0m0.010s

real  0m0.508s
user  0m0.510s
sys   0m0.000s

also, if you replace the regex with a test like lambda x: 
x.lower().find("destroy") != -1, you will get really close to perl's speed 
(it's possible perl even takes this shortcut internally when given such a 
simple regex).

#here's the times when doing the search this way
real  0m0.221s
user  0m0.210s
sys   0m0.010s

real  0m0.277s
user  0m0.280s
sys   0m0.000s

 -- Tim

-- On 12/29/06 "js " <[EMAIL PROTECTED]> wrote:

> Just my curiosity.
> Can python beats perl at speed of grep-like processing?
> 
> $ wget http://www.gutenberg.org/files/7999/7999-h.zip
> $ unzip 7999-h.zip
> $ cd 7999-h
> $ cat *.htm > bigfile
> $ du -h bigfile
> 8.2M  bigfile
> 
> -- grep.pl --
> #!/usr/local/bin/perl
> open(F, 'bigfile') or die;
> 
> while (<F>) {
>   s/[\n\r]+$//;
>   print "$_\n" if m/destroy/oi;
> }
> -- END --
> -- grep.py --
> #!/usr/bin/env python
> import re
> r = re.compile(r'destroy', re.IGNORECASE)
> 
> for s in file('bigfile'):
>   if r.search(s): print s.rstrip("\r\n")
> -- END --
> 
> $ time perl grep.pl  > pl.out; time python grep.py > py.out
> real  0m0.168s
> user  0m0.149s
> sys   0m0.015s
> 
> real  0m0.450s
> user  0m0.374s
> sys   0m0.068s
> # I used python2.5 and perl 5.8.6
> -- 
> http://mail.python.org/mailman/listinfo/python-list



Re: PEP 3107 Function Annotations for review and comment

2006-12-30 Thread Tim Smith

here's a potentially nifty way of adding decorators to input args for python:

  def a(int(arg1), arg2, tuple(arg3)):
#arg1 is an int (or was converted to an int)
#arg2's type is not known (ie this decoration is optional)
#arg3 is a tuple (may have been a list coming in, but is now a tuple)
pass

this would add optional conversion of input arguments to the desired types 
(you could drop the parens, making it look more like standard type syntax, but 
I put them there to indicate that the int() function will be called on every 
arg1 coming in, and so on)

this would also add the ability to write your own conversion functions to 
handle type checking as arguments come into a function

this should add little to no overhead (as you are likely already doing this 
manually, like so, if desired):
  def a(arg1, arg2, arg3):
arg1 = int(arg1)
arg3 = tuple(arg3)
pass

addendum:
  any type conversion function should throw ValueError on failure (this would 
allow python to catch the error and throw a new exception, InputArgError or 
something similar)

so this:
  def a(int(arg1), arg2, tuple(arg3)):
pass

would more or less translate to this:
  def a(arg1, arg2, arg3):
try:
  arg1 = int(arg1)
  arg3 = tuple(arg3)
except ValueError:
  raise InputArgError("what went wrong")
pass

it would likely be desirable to create some extra builtin functions like 
convdict, convlist, convtuple that, if the input is already that type, return 
the input unmodified (as opposed to calling the dict, list, or tuple 
constructor, which creates a whole new object, copying all the data)
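a convlist along those lines might look like this (convlist is the 
hypothetical builtin named above, not an existing function):

```python
def convlist(x):
    # hypothetical helper: return the input as-is if it's already a list,
    # otherwise convert it (which copies the data into a new list)
    if isinstance(x, list):
        return x
    return list(x)
```

note that convlist(some_list) hands back the very same object, while 
list(some_list) would always build a copy.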

another nice side effect of this is that it adds the ability to call by value 
instead of by reference:
  def a(list(b)):
pass #call by value

  def a(convlist(b)):
pass #call by reference (unless input type wasn't list)
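you can see the same value/reference split in today's python by doing the 
copy by hand (a sketch of the behavior the proposed syntax would give; the 
function names here are mine):

```python
def takes_copy(b):
    b = list(b)    # list() always builds a new list: effectively call by value
    b.append(99)   # the caller's list is untouched
    return b

def takes_reference(b):
    b.append(99)   # no copy made: the caller sees this mutation
    return b

# nums = [1, 2]
# takes_copy(nums)       -> returns [1, 2, 99], nums is still [1, 2]
# takes_reference(nums)  -> nums is now [1, 2, 99]
```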

 -- Tim



-- On 12/30/06 "John Roth" <[EMAIL PROTECTED]> wrote:

> BJörn Lindqvist wrote:
> > On 12/29/06, Tony Lownds <[EMAIL PROTECTED]> wrote:
> > > Rationale
> > > =
> > >
> > > Because Python's 2.x series lacks a standard way of annotating a
> > > function's parameters and return values (e.g., with information about
> > > what type a function's return value should be), a variety of tools
> > > and libraries have appeared to fill this gap [#tailexamp]_.  Some
> > > utilise the decorators introduced in "PEP 318", while others parse a
> > > function's docstring, looking for annotations there.
> > >
> > > This PEP aims to provide a single, standard way of specifying this
> > > information, reducing the confusion caused by the wide variation in
> > > mechanism and syntax that has existed until this point.
> >
> > I think this rationale is very lacking and to weak for such a big
> > change to Python. I definitely like to see it expanded.
> >
> > The reference links to two small libraries implementing type checking
> > using decorators and doc strings. None of which to seem to be very
> > popular in the Python community. Surely, those two libraries *alone*
> > can't be enough of a motivation for this? To me, it is far from
> > self-evident what purpose function annotations would serve.
> >
> > I also wonder why a very obtrusive syntax addition is needed when it
> > clearly is possible to annotate functions in today's Python. Why is
> > syntax better than just adding a function annotation decorator to the
> > standard library?
> >
> > @annotate(a = int, b = dict, c = int)
> > def foo(a, b, c = 5):
> > ...
> >
> > Are decorators too ugly?
> >
> > --
> > mvh Björn
> 
> The problem I have with it is that it doesn't solve the problem
> I've got, and I can see some user requests to use it rather than
> the metadata solution I've got now in Python FIT. Neither do
> decorators, by the way.
> 
> So, what are the problems I see?
> 
> First, it only handles functions/methods. Python FIT needs
> metadata on properties and assignable/readable attributes
> of all kinds. So in no sense is it a replacement. Parenthetically,
> neither is the decorator facility, and for exactly the same reason.
> 
> Second, it has the potential to make reading the function
> header difficult. In the languages I'm familiar with, static type
> declarations are a very few, somewhat well chosen words.
> In this proposal, it can be a general expression. In Python
> FIT, that could well turn into a full blown dictionary with
> multiple keys.
> 
> Third, it's half of a proposal. Type checking isn't the only use
> for metadata about functions/methods, classes, properties
> and other objects, and the notion that there are only going to
> be a small number of non-intersecting libraries out there is
> an abdication of responsibility to think this thing through.
> 
> I should note that there are quite a few packages out there
> that use some form of annotation, be they comments
> (like Ned Bachelder's coverage analyzer and the two
> lint packages I'm aware of), docstrings, decorators or
> auxilliary dictionarys (like Python FIT, and a possible
> Pyth