re module non-greedy matches broken

2005-04-03 Thread lothar
re:
4.2.1 Regular Expression Syntax
http://docs.python.org/lib/re-syntax.html

  *?, +?, ??
  Adding "?" after the qualifier makes it perform the match in non-greedy or
minimal fashion; as few characters as possible will be matched.

the regular expression module fails to perform non-greedy matches as
described in the documentation: more than "as few characters as possible"
are matched.

this is a bug and it needs to be fixed.

examples follow.

[EMAIL PROTECTED] /ntd/vl
$ cat vwre.py
#! /usr/bin/env python

import re

vwre = re.compile("V.*?W")
vwlre = re.compile("V.*?WL")

if __name__ == "__main__":

  newdoc = "V1WVVV2WWW"
  vwli = re.findall(vwre, newdoc)
  print "vwli[], expect", ['V1W', 'V2W']
  print "vwli[], return", vwli

  newdoc = "V1WLV2WV3WV4WLV5WV6WL"
  vwlli = re.findall(vwlre, newdoc)
  print "vwlli[], expect", ['V1WL', 'V4WL', 'V6WL']
  print "vwlli[], return", vwlli

[EMAIL PROTECTED] /ntd/vl
$ python vwre.py
vwli[], expect ['V1W', 'V2W']
vwli[], return ['V1W', 'VVV2W']
vwlli[], expect ['V1WL', 'V4WL', 'V6WL']
vwlli[], return ['V1WL', 'V2WV3WV4WL', 'V5WV6WL']

[EMAIL PROTECTED] /ntd/vl
$ python -V
Python 2.3.3


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: re module non-greedy matches broken

2005-04-03 Thread lothar
this response is nothing but a description of the behavior i reported.

as to whether this behaviour was intended, one would have to ask the module
writer about that.
because of the statement in the documentation, which places no qualification
on how the scan for the shortest possible match is to be done, my guess is
that this problem was overlooked.

to produce a non-greedy (minimal length) match it is required that the start
of the non-greedy part of the match repeatedly be moved right with the last
match of the left-hand part of the pattern (preceding the .*?).

why would someone want a non-greedy (minimal length) match that was not
always non-greedy (minimal length)?



"André Malo" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
* lothar wrote:

> re:
> 4.2.1 Regular Expression Syntax
> http://docs.python.org/lib/re-syntax.html
>
>   *?, +?, ??
>   Adding "?" after the qualifier makes it perform the match in non-greedy
>   or minimal fashion; as few characters as possible will be matched.
>
> the regular expression module fails to perform non-greedy matches as
> described in the documentation: more than "as few characters as possible"
> are matched.
>
> this is a bug and it needs to be fixed.

The documentation is just incomplete. Non-greedy regexps still start
matching at the leftmost position. So instead of the longest of the leftmost
you get the shortest of the leftmost. One may consider this a documentation
bug, yes.
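
A minimal sketch of the leftmost-then-shortest behaviour described above, written for a current Python 3 interpreter (the thread itself used Python 2.3). The variable names are illustrative and do not come from the original posts.

```python
import re

doc = "V1WVVV2WWW"
pattern = re.compile(r"V.*?W")

# search() stops at the leftmost position where a match can start ...
first = pattern.search(doc)

# ... and findall()/finditer() resume scanning right after the end of each
# match. At index 3 the engine commits to the 'V' it finds there and lets
# .*? grow only as far as needed to reach the next 'W'.
matches = pattern.findall(doc)
spans = [m.span() for m in pattern.finditer(doc)]

print(first.group(0))  # V1W
print(matches)         # ['V1W', 'VVV2W']
print(spans)           # [(0, 3), (3, 8)]
```

The second result is the shortest match that starts at index 3, not the shortest match anywhere in the remaining text, which is why 'VVV2W' comes back instead of 'V2W'.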

nd
--
# André Malo, <http://www.perlig.de/> #





Re: re module non-greedy matches broken

2005-04-04 Thread lothar
how then, do i specify a non-greedy regex
  <1st-pat>*?

that is, such that non-greedy part *?
excludes a match of <1st-pat>

in other words, how do i write regexes for my examples?

what book or books on regexes or with a good section on regexes would you
recommend?
Hopcroft and Ullman?


"André Malo" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> * "lothar" <[EMAIL PROTECTED]> wrote:
>
> > this response is nothing but a description of the behavior i reported.
>
> Then you have not read my response carefully enough.
>
> > as to whether this behaviour was intended, one would have to ask the
> > module writer about that.
>
> No, I've responded with a view on regexes, not on the module. That is the
> way _regexes_ work. Non-greedy regexes do not match the minimal-length at
> all, they are just ... non-greedy (technically the backtracking just stacks
> the longest instead of the shortest). They *may* match the shortest match,
> but it's a special case. Therefore I've stated that the documentation is
> incomplete.
>
> Actually your expectations go a bit beyond the documentation. From a
> certain point of view (matches always start most left) the matches you're
> seeing *are* the minimal-length matches.
>
> > because of the statement in the documentation, which places no
> > qualification
>
> that's the point.
>
> > on how the scan for the shortest possible match is to be done, my guess
> > is that this problem was overlooked.
>
> In the docs, yes. But buy yourself a regex book and learn for yourself ;-)
> The first thing you should learn about regexes is that the source of pain
> of most regex implementations is the documentation, which is very likely
> to be wrong.
>
> Finally let me ask a question:
>
> import re
> x = re.compile('<.*?>')
> print x.search('..').group(0)
>
> What would you expect to be printed out?  or ? Why?
>
> nd






Re: re module non-greedy matches broken

2005-04-04 Thread lothar
with respect to the documentation, the module is broken.

the module does not necessarily deliver a "minimal length" match for a
non-greedy pattern.


"Fredrik Lundh" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> "lothar" wrote:
>
> > this is a bug and it needs to be fixed.
>
> it's not a bug, and it's not going to be "fixed".  search, findall,
> finditer, sub, etc. all scan the target string from left to right, and
> process the first location (or all locations) where the pattern matches.





Re: re module non-greedy matches broken

2005-04-04 Thread lothar
no - in the non-greedy regex
  <1st-pat>*?

<1st-pat>,  and  are arbitrarily complex patterns.

with character classes and negative character classes you do not need
non-greediness anyway.


"John Ridley" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>
> --- lothar <[EMAIL PROTECTED]> wrote:
> > how then, do i specify a non-greedy regex
> >   <1st-pat>*?
> >
> > that is, such that non-greedy part *?
> > excludes a match of <1st-pat>
> >
> > in other words, how do i write regexes for my examples?
>
> Not sure if I completely understand your explanation, but does this get
> any closer to what you're looking for?
>
> >>> vwre = re.compile("V[^V]*?W")
> >>> newdoc = "V1WVVV2WWW"
> >>> re.findall(vwre, newdoc)
> ['V1W', 'V2W']
>
> That is: , then  as few times as possible, then 
>
>
> John Ridley
>





Re: re module non-greedy matches broken

2005-04-05 Thread lothar
a non-greedy match - as implicitly defined in the documentation - is a match
in which there is no proper substring in the return which could also match
the regex.

you are skirting the issue as to why a matcher should not be able to return
a non-greedy match.

there is no theoretical reason why it can not be done.
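
A rough sketch of how the "no proper substring also matches" definition above could be implemented on top of the existing engine. The helper name minimal_matches is hypothetical, and the sketch assumes that the engine's first match at a given start position is also the shortest one at that position, which holds for simple lazy patterns like the ones in this thread but is not guaranteed in general.

```python
import re

def minimal_matches(pattern, text):
    """Return matches of `pattern` that contain no other match inside them.

    Assumption: for a fixed start position the engine's first match is the
    shortest one at that position (true for V.*?W, not for every pattern).
    """
    # A capturing group inside a lookahead yields one candidate per start
    # position, including candidates that overlap each other.
    probe = re.compile("(?=(%s))" % pattern)
    spans = [m.span(1) for m in probe.finditer(text)]

    kept = []
    for s, e in spans:
        # Discard a candidate if some other candidate lies entirely inside it.
        contains_other = any(
            (s2, e2) != (s, e) and s <= s2 and e2 <= e for s2, e2 in spans
        )
        if not contains_other:
            kept.append(text[s:e])
    return kept

print(minimal_matches(r"V.*?W", "V1WVVV2WWW"))
# ['V1W', 'V2W']
print(minimal_matches(r"V.*?WL", "V1WLV2WV3WV4WLV5WV6WL"))
# ['V1WL', 'V4WL', 'V6WL']
```

This tries every start position instead of only the leftmost one, so it does more work than a plain findall; it is meant to illustrate the definition, not to replace the module's scanning strategy.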



"André Malo" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> * "lothar" <[EMAIL PROTECTED]> wrote:
>
> > no - in the non-greedy regex
> >   <1st-pat>*?
> >
> > <1st-pat>,  and  are arbitrarily complex patterns.
>
> The "not" is the problem. Regex patterns are expressed positive by
> definition (meaning, you can say, what you expect, but not what you
> don't expect). In other words, regexps were invented to define (uh...
> regular) sets, nothing more (especially you can't define "non-sets"). So
> the usual way is to define the set you've called '*?' and describe
> it as regex. Modern regular expression engines (which are no longer
> regular by the way ;-) allow shortcuts like negative lookahead assertions
> and the like.
>
> I want to make clear that it isn't that nobody _wants_ to give advice on
> how to express your pattern in general. The point is that there's no
> real syntax for it. It depends on what your <1st-pat> and  look
> like. Chances are that it's not even expressible in one regex (depends on
> the complexity and kind of the set they define).
> Each pattern you write is special to the particular use case.
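
A sketch of the negative lookahead shortcut mentioned above, applied to the two examples from the start of the thread. This is one possible formulation for the case where <1st-pat> is the literal V, and not necessarily the pattern that was posted in reply.

```python
import re

# (?:(?!V).)*? consumes one character at a time, but only while that
# character does not begin another 'V'. The gap between the opening 'V'
# and the closing part can therefore never contain a second 'V'.
vw = re.compile(r"V(?:(?!V).)*?W")
vwl = re.compile(r"V(?:(?!V).)*?WL")

print(vw.findall("V1WVVV2WWW"))              # ['V1W', 'V2W']
print(vwl.findall("V1WLV2WV3WV4WLV5WV6WL"))  # ['V1WL', 'V4WL', 'V6WL']
```

When <1st-pat> is a single character, a negated character class such as [^V] does the same job; the lookahead form is only needed when <1st-pat> is longer than one character.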







Re: re module non-greedy matches broken

2005-04-05 Thread lothar
give an re to find every innermost "table" element:

innertabdoc = """

  

   n
  

  


  



   y  z
  

  

  
  

  
  

  

"""

give an re to find every "pre" element directly followed by an "a" element:

preadoc = """

a r n


l y


r


f g z


m b u c v


u

"""

"John Ridley" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>

> Could you post some real-world examples of the problems you are trying
> to deal with, please? Trying to come up with general solutions for
> arbitrarily complex patterns is a bit too hard for me :)






Re: re module non-greedy matches broken

2005-04-05 Thread lothar
a non-greedy match is implicitly defined in the documentation to be one such
that there is no proper substring in the return which could also match the
regex.

the documentation implies the module will return a non-greedy match.

the module does not return a non-greedy match.


"Fredrik Lundh" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> "lothar" wrote:
>
> > with respect to the documentation, the module is broken.
>
> nope.
>
> > the module does not necessarily deliver a "minimal length" match for a
> > non-greedy pattern.
>
> it isn't supposed to: a regular expression describes a *set* of matching
> strings, and the engine is free to return any string from that set.
> Python's engine returns the *first* string it finds that belongs to the
> set.  if you use a non-greedy operator, the engine will return the first
> non-greedy match it finds, not the overall shortest non-greedy match.
>
> if you don't want to understand how regular expressions work, don't use
> them.





Re: re module non-greedy matches broken

2005-04-06 Thread lothar
well done.
i had not noticed the lookahead operators.


"André Malo" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> * lothar wrote:
>





Can't compile with --enable-shared on MacOSX

2005-04-18 Thread Lothar Scholz
Help,

i tried to generate a dynamic library from the official
Python-2.4.0.tgz on MacOSX 10.3 but when i do the

./configure --enable-shared ; make ; sudo make install

or 

./configure --enable-shared=yes ; make ; sudo make install


It links statically. It's also strange that i can't find a libpython2.4.a
in my /usr/local/lib. It's not installed by the install command.

Also, /usr/local/bin/python24 works fine.

I get an error that TK/TCL was not found. Is this the reason? i thought i
could simply ignore this error message.


Re: Can't compile with --enable-shared on MacOSX

2005-04-18 Thread Lothar Scholz
Maarten Sneep <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...

> On Mac OS X the shared library functionality is obtained through 
> frameworks. It may detect this by default, but I'm not sure about 

Not good. I don't want frameworks. I must embed python into my
application.
Setting up a framework and installing/maintaining it is much more work,
especially when i'm already doing my own maintenance for the
Linux/Windows port.

Is there no other way than stealing the dylib from the framework directory?


Python as CGI on IIS and Windows 2003 Server

2005-06-09 Thread lothar . sch
Hi,

My python scripts are running as cgi scripts on an IIS on Windows XP.
I have to distribute it to IIS on Windows 2003 Server.
I tried to set up python for cgi scripts in IIS on this machine using
advice from http://python.markrowsoft.com/iiswse.asp

No test, with or without any ", let IIS execute python scripts as cgi.
Http Error code is 404 (but i'm sure that the file exists in the
requested path).

Is there any difference for python as CGI on IIS between Windows XP
prof. and Windows 2003 Server?

Thanks
Lothar



Re: Python as CGI on IIS and Windows 2003 Server

2005-06-10 Thread lothar . sch
jean-marc wrote:
> Some bits are coming back to me: the problems stemmed from addresses -
> getting the root of IIS was different so accessing files didn't work
> the same way.

thanks for that.
you are right, IIS versions are different.
Which kind of addresses do you mean, http addresses or file system
paths to the root of IIS or to python scripts below IIS' root?

Unfortunately I couldn't find a way to solve the problem.


regards
Lothar



Oddity with large dictionary (several million entries)

2010-04-27 Thread Lothar Werzinger
Hi,

I am trying to load files into a dictionary for analysis. The size of the
dictionary will grow quite large (several million entries) and as inserting
into a dictionary is roughly O(n) I figured if I loaded each file into its
own dictionary it would speed things up. However it did not.

So I decided to write a small test program (attached)

As you can see I am inserting one million entries at a time into a map. I
ran the tests where I put all three million entries into one map and one
where I put one million each into its own map.

What I would have expected is that if I insert one million into its own map
the time to do that would be roughly constant for each map. Interestingly it
is not. It's about the same as if I load everything into one map.

Oh and I have 4G of RAM and the test consumes about 40% at its max. I even
ran the test on one of our servers with 64G of RAM, so I can rule out
swapping as the issue.

Can anyone explain this oddity? Any insight is highly appreciated.

Here's the output of the test runs:

$ ./dicttest.py -t 0
Inserting into one map
Inserting 1000000 keys lasted 0:00:26 (38019 1/s)
len(map) 1000000
Inserting 1000000 keys lasted 0:01:17 (12831 1/s)
len(map) 2000000
Inserting 1000000 keys lasted 0:02:23 (6972 1/s)
len(map) 3000000
total 3000000

$ ./dicttest.py -t 1
Inserting into three maps
Inserting 1000000 keys lasted 0:00:32 (30726 1/s)
len(map) 1000000
Inserting 1000000 keys lasted 0:01:29 (11181 1/s)
len(map) 1000000
Inserting 1000000 keys lasted 0:02:23 (6957 1/s)
len(map) 1000000
total 3000000


Thanks,
Lothar

,[ /home/lothar/tmp/dicttest.py ]
| #!/usr/bin/python
| # -*- coding: utf-8 -*-
| 
| import datetime
| import optparse
| import sys
| import time
| 
| 
| def fillDict(map, nop, num, guid):
|   # Insert num keys of the form (i, guid) and report the insertion rate.
|   before = time.time()
| 
|   for i in xrange(0, num):
|     key = (i, guid)
|     if not nop:
|       map[key] = ([], {})
| 
|   after = time.time()
|   elapsed = (after - before)
|   # Guard against a zero elapsed time before dividing.
|   if elapsed <= 0:
|     divide = 1.0
|   else:
|     divide = elapsed
|   elapsedTime = datetime.timedelta(seconds=int(elapsed))
|   print("Inserting %d keys lasted %s (%d 1/s)" % (num, elapsedTime, (num /
|     divide)))
|   print("len(map) %d" % (len(map)))
| 
| 
| def test0(nop, num):
|   # All three batches go into the same dictionary.
|   print("Inserting into one map")
|   map = {}
|   fillDict(map, nop, num, "0561c83c-9675-4e6f-bedc-86bcb6acfd71")
|   fillDict(map, nop, num, "0561c83c-9675-4e6f-bedc-86bcb6acfd72")
|   fillDict(map, nop, num, "0561c83c-9675-4e6f-bedc-86bcb6acfd73")
|   print("total %d" % (len(map)))
| 
| 
| def test1(nop, num):
|   # Each batch goes into its own dictionary.
|   print("Inserting into three maps")
|   map1 = {}
|   map2 = {}
|   map3 = {}
|   fillDict(map1, nop, num, "0561c83c-9675-4e6f-bedc-86bcb6acfd71")
|   fillDict(map2, nop, num, "0561c83c-9675-4e6f-bedc-86bcb6acfd72")
|   fillDict(map3, nop, num, "0561c83c-9675-4e6f-bedc-86bcb6acfd73")
|   total = 0
|   for map in [map1, map2, map3]:
|     total += len(map)
|   print("total %d" % (total))
| 
| 
| if __name__ == "__main__":
|   usage = "USAGE: %prog [options]"
|   description="test"
|   version="%prog 1.0"
| 
|   parser = optparse.OptionParser(usage=usage, version=version,
|     description=description)
|   parser.add_option(
|     "-t",
|     "--testnum",
|     action="store",
|     dest="testnum",
|     help="the number of the test to execute",
|     type="int",
|     default=1
|   )
|   parser.add_option(
|     "-i",
|     "--iterations",
|     action="store",
|     dest="iterations",
|     help="the number of iterations",
|     type="int",
|     default=1000000
|   )
|   parser.add_option(
|     "-n",
|     "--nop",
|     action="store_true",
|     dest="nop",
|     help="don't store in the dictionary only load and parse",
|     default=False
|   )
| 
|   (options, args) = parser.parse_args()
| 
|   # Map the -t option to the test function to run.
|   testmap = {
|     0:test0,
|     1:test1,
|   }
| 
|   test = testmap[options.testnum]
| 
|   test(options.nop, options.iterations)
`


Re: Oddity with large dictionary (several million entries)

2010-04-27 Thread Lothar Werzinger
Peter Otten wrote:

> Lothar Werzinger wrote:
>> Can anyone explain this oddity? Any insight is highly appreciated.
> 
> When you are creating objects like there is no tomorrow Python's cyclic
> garbage collection often takes a significant amount of time. The first
> thing I'd try is therefore switching it off with
> 
> import gc
> gc.disable()
> 
> Peter
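
A minimal sketch of the suggestion above, written for Python 3: it switches the cyclic collector off only for the duration of the bulk insert and restores the previous state afterwards. The helper names fill_dict and timed_fill and the key "guid-1" are hypothetical. How large the difference is depends on the interpreter version, and newer interpreters may show a much smaller gap than the one reported in this thread.

```python
import gc
import time

def fill_dict(target, num, guid):
    """Insert num keys of the form (i, guid), each mapping to new containers."""
    for i in range(num):
        target[(i, guid)] = ([], {})

def timed_fill(num, disable_gc):
    """Fill a new dict and return (seconds taken, number of entries)."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        data = {}
        start = time.time()
        fill_dict(data, num, "guid-1")
        return time.time() - start, len(data)
    finally:
        # Restore the collector to the state it was in before the call.
        if was_enabled:
            gc.enable()

if __name__ == "__main__":
    for flag in (False, True):
        seconds, count = timed_fill(200000, disable_gc=flag)
        print("gc disabled=%s: %d keys in %.2f s" % (flag, count, seconds))
```

Restoring the collector in a finally block keeps the rest of the program unaffected, at the cost of any reference cycles created during the insert staying uncollected until the next collection runs.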

Wow, that really MAKES a difference! Thanks a lot!

$ ~/tmp/dicttest.py -t 0
Inserting into one map
Inserting 1000000 keys lasted 0:00:01 (960152 1/s)
len(map) 1000000
Inserting 1000000 keys lasted 0:00:01 (846416 1/s)
len(map) 2000000
Inserting 1000000 keys lasted 0:00:04 (235241 1/s)
len(map) 3000000
total 3000000

$ ~/tmp/dicttest.py -t 1
Inserting into three maps
Inserting 1000000 keys lasted 0:00:01 (973344 1/s)
len(map) 1000000
Inserting 1000000 keys lasted 0:00:00 (1011303 1/s)
len(map) 1000000
Inserting 1000000 keys lasted 0:00:00 (1021796 1/s)
len(map) 1000000
total 3000000


Thanks!
-- 
Lothar