Re: Trace KeyboardInterrupt exception?
Tony Nelson wrote:
> I'm trying to find out what is eating some KeyboardInterrupt exceptions
> in a fairly large program (yum). My KeyboardInterrupt handler is called
> for some Ctl-C presses, but for others nothing seems to happen.
> ... I'd like to use a debugger to trace
> KeyboardInterrupt exceptions, make sure that they're happening, and see
> what is handling them.
I don't know how to do that in Idle. You can replace the default Ctrl-C
interrupt handler with your own and use that to inspect the current stack.
For example,
>>> import signal
>>> signal.getsignal(signal.SIGINT)
>>> prev_handler = signal.getsignal(signal.SIGINT)
>>> def new_int_handler(*args):
... print "Keyboard Interrupt!"
... traceback.print_stack()
... prev_handler(*args)
...
>>> signal.signal(signal.SIGINT, new_int_handler)
>>> def spin():
... while 1: pass
...
>>> import traceback
>>> spin()
^CKeyboard Interrupt!
File "", line 1, in ?
File "", line 2, in spin
File "", line 3, in new_int_handler
Traceback (most recent call last):
File "", line 1, in ?
File "", line 2, in spin
File "", line 4, in new_int_handler
KeyboardInterrupt
>>>
There's no real need to call the old handler. You could "raise
KeyboardInterrupt" or SystemExit, or just ignore it, as in
>>> count = 0
>>> def new_int_handler(signum, frame):
...     global count
...     print messages[count]
...     if count >= len(messages)-1:
...         raise KeyboardInterrupt
...     count += 1
...
>>> messages = {0: "Sorry, did you want me to do something?",
... 1: "That's ticklish!",
... 2: "Now, where did that off button go to",
... 3: "Do that again and I'll leave.",
... 4: "Shutdown activated"}
>>>
>>> def spin():
... while 1: pass
...
>>> spin()
^CTraceback (most recent call last):
File "", line 1, in ?
File "", line 2, in spin
KeyboardInterrupt
>>>
>>> import signal
>>> signal.signal(signal.SIGINT, new_int_handler)
>>>
>>> spin()
^CSorry, did you want me to do something?
^CThat's ticklish!
^CNow, where did that off button go to
^CDo that again and I'll leave.
^CShutdown activated
Traceback (most recent call last):
File "", line 1, in ?
File "", line 2, in spin
File "", line 5, in new_int_handler
KeyboardInterrupt
>>>
Andrew
[EMAIL PROTECTED]
Re: __lt__ slowing the "in" operator even if not called
Emanuele Aina wrote:
> I have some code which does a lot of "in" on lists containing objects
> with no __eq__ defined.
>
> It all goes fast until I add the __lt__() method: then I have a
> slowdown comparable to the one I get using the overridden __eq__, while
> the __lt__ method is never called.
>
> Someone can explain me why?
The list's __contains__ method is very simple:

    for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                       Py_EQ);
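In rough Python terms, that loop is doing something like this (just a sketch
of the behaviour, not the actual implementation):

    def list_contains(a, el):
        # what "el in a" does: compare el against each item with ==
        # (PyObject_RichCompareBool with Py_EQ) and stop at the first hit
        for item in a:
            if item == el:
                return True
        return False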
The PyObject_RichCompareBool becomes more complex. The
relevant code occurs after the check for object identity. The next
step is
res = PyObject_RichCompare(v, w, op);
which goes into your case
    /* If the types are equal, and not old-style instances, try to
       get out cheap (don't bother with coercions etc.). */
    if (v->ob_type == w->ob_type && !PyInstance_Check(v)) {
        cmpfunc fcmp;
        richcmpfunc frich = RICHCOMPARE(v->ob_type);
        /* If the type has richcmp, try it first.  try_rich_compare
           tries it two-sided, which is not needed since we've a
           single type only. */
        if (frich != NULL) {
            res = (*frich)(v, w, op);
            if (res != Py_NotImplemented)
                goto Done;
            Py_DECREF(res);
        }
        /* No richcmp, or this particular richcmp not implemented.
           Try 3-way cmp. */
        fcmp = v->ob_type->tp_compare;
One new-style class defines '__lt__' while the other does not.
Here's where things get shaky for me. I think new-style classes
are instances of PyInstanceObject (and not PyClassObject). Instances
use 'instance_richcompare' to implement the rich comparison
between two instances. This function does the lookup for '__eq__'
in the two classes.
The tp_richcompare slot is set to slot_tp_richcompare when any
of __lt__, __gt__, __eq__, etc. are defined in a new type. The
macro-based code looks like
TPSLOT("__lt__", tp_richcompare, slot_tp_richcompare,
richcmp_lt,
"x.__lt__(y) <==> x x<=y"),
TPSLOT("__eq__", tp_richcompare, slot_tp_richcompare,
richcmp_eq,
"x.__eq__(y) <==> x==y"),
TPSLOT("__ne__", tp_richcompare, slot_tp_richcompare,
richcmp_ne,
"x.__ne__(y) <==> x!=y"),
TPSLOT("__gt__", tp_richcompare, slot_tp_richcompare,
richcmp_gt,
"x.__gt__(y) <==> x>y"),
TPSLOT("__ge__", tp_richcompare, slot_tp_richcompare,
richcmp_ge,
"x.__ge__(y) <==> x>=y"),
So if you define "__lt__" in your class, the type gets a richcmp
function, and the == test implicit in the 'in' search always incurs the
cost of figuring out that "__eq__" is not defined.
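A quick way to see the effect is to time 'in' over lists of instances of
two otherwise identical classes, one with __lt__ and one without. This is
only a minimal sketch; the class names and sizes are made up:

    import timeit

    class Plain(object):
        def __init__(self, x):
            self.x = x

    class WithLt(object):
        def __init__(self, x):
            self.x = x
        def __lt__(self, other):   # never called by 'in', but fills the richcmp slot
            return self.x < other.x

    def bench(name):
        setup = ("from __main__ import %s as cls\n"
                 "items = [cls(i) for i in range(1000)]\n"
                 "probe = cls(-1)") % name
        return timeit.Timer("probe in items", setup).timeit(number=1000)

    print "Plain :", bench("Plain")
    print "WithLt:", bench("WithLt")

The WithLt timing should come out noticeably larger even though __lt__
is never invoked.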
Andrew
[EMAIL PROTECTED]
Re: Seeking regex optimizer
Kay Schluehr wrote:
> I have a list of strings ls = [s_1,s_2,...,s_n] and want to create a
> regular expression sx from it, such that sx.match(s) yields a SRE_Match
> object when s starts with an s_i for one i in [0,...,n].
Why do you want to use a regex for this? When you have constant strings
there are other solutions. For example, this uses Nicolas Lehuen's pytst
module from http://nicolas.lehuen.com/index.php/Pytst
>>> import tst
>>> tree = tst.TST()
>>> tree["aa"] = (1, "String with 'aa'")
>>> tree["aahhh"] = (2, "At the doctor's")
>>> tree["a"] = (3, "indefinite article")
>>> tree.scan("This is a bunch of text. Have you seen an aardvark?",
... tst.TupleListAction())
[('This is ', -8, None), ('a', 1, (3, 'indefinite article')), (' bunch
of text. H', -18, None), ('a', 1, (3, 'indefinite article')), ('ve you
seen ', -12, None), ('a', 1, (3, 'indefinite article')), ('n ', -2,
None), ('aa', 2, (1, "String with 'aa'")), ('rdv', -3, None), ('a', 1,
(3, 'indefinite article')), ('rk?', -3, None)]
>>>
Each returned tuple has three elements:
For a substring which is not a match these are:
- the unmatched substring
- the negative length of the substring
- None
For a substring which matches:
- the matched substring
- the positive length of the substring
- the value of the match in the TST (the tree)
It uses Aho-Corasick for the implementation, which is fast and does what
you expect it to do. Nor does it have a problem matching more than
99 possible strings, as the regexp approach may.
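As a sketch of how you might consume that scan() output (assuming the same
pytst API as in the transcript above), keeping only the actual matches:

    import tst

    tree = tst.TST()
    tree["aa"] = (1, "String with 'aa'")
    tree["a"] = (3, "indefinite article")

    results = tree.scan("Have you seen an aardvark?", tst.TupleListAction())
    # match tuples have a positive length; the gaps between matches have a
    # negative length and a None value
    matches = [(text, value) for (text, length, value) in results if length > 0]
    print matches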
Andrew
[EMAIL PROTECTED]
Re: Seeking regex optimizer
Replying to me, Mirco Wahab wrote:
> If you pull the strings into (?>( ... )) (atomic groups),
> this would't happen.
Given that Python's re engine doesn't support this feature
it doesn't really help the original poster's problem.
Even if some future Python did support it, the limit
to 100 named groups is unaffected by backtracking.
>>> import re
>>> re.compile("(.)"*100)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 225, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_compile.py", line 506, in compile
    raise AssertionError(
AssertionError: sorry, but this version only supports 100 named groups
>>>
There was no backtracking in "(.)"*100.
Andrew
[EMAIL PROTECTED]
europython room share
Is anyone here going to Europython and would like a roommate to help split
the cost? I'll be there for all three days of the conference plus a few
extra days for sprints. I figure I can move elsewhere if need be for the
sprints.

It looks like the best choices are St. Genis (because it is about 20
minutes walk away) or Geneva (30 minutes by frequent bus, with more to do
and places to eat).

I did some research looking for places to stay. My biggest problem was my
inability to read French, though most of the hotels on the CERN page at
http://housing-service.web.cern.ch/housing-service/listhotel.html also have
alternate pages in English. My second biggest is not knowing the geography.

The Europython "Places to stay" page at
http://www.europython.org/sections/accommodation says:

    Nearby places (useful when consulting the hotels with special rates)
    include Meyrin, Satigny (in Switzerland) and St. Genis-Pouilly (in
    France).

Figuring a lot of students have worked at CERN and written up their
experiences, I found one with this comment in
http://www.sccs.swarthmore.edu/org/swap/reureviews/cern.html:

    CERN is about 30 minutes by bus from Geneva and a 20 minute walk or
    bike ride from entertainment in St. Genis, France. Summer students
    hung out most Tuesdays at a pub in St. Genis.

The Europython wiki says the bus to Geneva "leaves four times an hour
during most of a weekday."

I don't know if most people will meet up for dinner and drinks somewhere
close by CERN or in Geneva, which makes it harder to figure out where to
stay. I'm thinking nearby, at least for those staying in the CERN hostel.

Andrew
[EMAIL PROTECTED]
Re: Seeking regex optimizer
Kay Schluehr replied to my question:

> > Why do you want to use a regex for this?
>
> Because it is part of a tokenizer that already uses regexps and I do
> not intend to rewrite / replace it.

Switching to pytst is not a big change - there will be little impact on the
rest of your code. On the other hand, it does change your build and
deployment process because of the need for another package.

Because you are constraining yourself to regex there isn't much of an
improvement you can do.

> I found that certain groups of token might be represented in a more
> compact mannerr: for matching ['+', '++'] one might generate
> '\+|\+\+' or '\+\+?' and I wanted to know if there is some generic
> approach to solve the "inverse regexp" problem in a non-trivial
> fashion.

There are a few questions which come up. Why do you want to have a "more
compact" representation? Do you think it will make the result faster?
Likely it will, because of the reduced need for backtracking, but unless
you have a lot of strings with the same prefix the number of backtracks
should go as something like the log of the number of words, which scales
pretty well.

Are you speed limited by the existing implementation? Likely not, in which
case the performance isn't an issue. Consider this anecdote from
http://www.technicat.com/writing/programming.html

    I was asked in a phone interview with Google how I would search for
    a number in an array of ordered numbers. Obviously, the questioner
    was asking for a CS 101 answer - binary search. But in real life, I
    would probably do the "wrong" thing - search the array from
    beginning to end. There's no point in taking twice the time to
    write twice as much code that has to be maintained and debugged if
    the application performance is good enough, and in particular if
    that piece of code is not the bottleneck in the application. (And I
    seriously doubt you'd have that data laid out linearly in a fixed
    array like that if it was the bottleneck)

As it turns out, the regexp implementation could do the optimization you
want. That is, it could recognize that the input sequence is a set of
fixed words (no special characters) and implement something like
Aho-Corasick, in which case you wouldn't need to change your input. I
don't know if Python's regexp implementation does this already. Neither do
you. Test the functionality and performance first before starting other
work.

I am not the first to give you this advice.

John Machin:
> Optimised with respect to what? speed? ease of construction?

[email protected]:
> As others have suggested, you should first try the most naive
> implementation before making a hard optimization problem out of this.

As for the general question of the existence of a "generic approach to
solve the 'inverse regexp' problem in a non-trivial fashion": yes, there
is. Kind of.

All (formal/theoretical) regular expressions can be written as a regular
expression without need for backtracking. If you look up the theory of
regular expressions, backtracking occurs when there are ambiguities in
traversing the state machine. These are called nondeterministic finite
state machines, or NFAs. It turns out that every NFA can be rewritten as a
DFA, which has no need for backtracking.

Except. Actually, two excepts. The normal algorithm for converting an NFA
to a DFA doesn't support grouping, which you'll need to figure out which
group matched. Wikipedia says there is an alternate, slower algorithm
which does support groups.

The second except is that regexps in Python, Perl, etc. are not regular
expressions. Some patterns, like r"([A-Z][a-z]+)\1", which matches
"WikiWiki", are not "regular grammars" under the definition used in
language theory. It's possible to use regexps to match strings of prime
length, for example. So in the general case there is no solution. There
are many possible optimizations, but no general solution.

For your problem, which is a limited subset of the general one as it
involves only constant strings, the solution is a prefix tree, also called
a trie. See http://en.wikipedia.org/wiki/Prefix_tree

John Machin in an earlier post suggested you search along these lines when
he said

> I would have thought the way to approach this would be a simple
> character-by-character tree/trie-driven lookup. This would be worst
> case O(n) where n is the length of the longest string in your list.
> There may well be a Python-callable gadget for this on the net
> somewhere. Google "Danny Yoo ahocorasick" for a Python-callable
> solution to a similar but more complex problem.

For that matter I suggested it when I pointed you to pytst. The tst means
"ternary search tree" (some write it "ternary search trie"), which is a
different implementation of the same idea.

Your posts read like you ignored those pointers because they didn't fall
into the category "implemented with a regular expression". If that's the
cas
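A minimal sketch of that trie idea, applied to the original problem of
finding which constant string a text starts with (the names and word list
here are invented for illustration; it is not a replacement for pytst or re):

    class TrieNode(object):
        def __init__(self):
            self.children = {}
            self.is_word = False

    def build_trie(words):
        root = TrieNode()
        for word in words:
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True
        return root

    def longest_prefix_match(root, text, start=0):
        # return the longest word in the trie that text[start:] begins with
        node = root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = text[start:i+1]
        return best

    trie = build_trie(['+', '++', 'if', 'in', 'int'])
    print longest_prefix_match(trie, "++x")      # '++'
    print longest_prefix_match(trie, "integer")  # 'int'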
Re: Seeking regex optimizer
Mirco Wahab wrote:
> Hi, are you the A.Dalke from the Schulten group (VMD) as
> listed here: http://www.ks.uiuc.edu/Overview/People/former.cgi
Yes. But I left there nearly a decade ago.
> # naive regex '\d+9'
> # find some number only if it ends by 9
> my $str="10099000";
>
> print "non-atomic backtracks: $1\n" if $str =~ / ( \d+9 )/x;
> print "atomic: matches $1\n" if $str =~ / ( (?>\d+)9 )/x;
What makes me less than enthusiastic about various new regexp
features is that they can be written other ways. For example,
>>> import re
>>> pat = re.compile("([0-8]*9)+")
>>> pat.match("10099000")
<_sre.SRE_Match object at 0x590a0>
>>> _.group(0)
'10099'
>>> pat = re.compile("([0-8]*9)+(?!\d)")
>>> pat.match("10099000")
>>>
They are different ways to reach the same goal. Understanding the
alternatives requires non-trivial knowledge of how regexp engines work.
The diversity of solutions, with no clear preference of one over the
other, overly promotes idiosyncratic patterns and requires everyone
reading the regexps to understand all the special syntax constructions
even when a person's working vocabulary only uses a subset of those.
> Second line would not backtrack and not find the 9 behind \d+
> because atomic grouping prevents that, so line 1 matches
> 10099 and line 2 nothing (but very fast, which is the point).
My pattern is not as fast; it's linearly slower than yours since it does
the backtracking (unless the engine is very smart). That factor is not
something I'm concerned with.
I can also get performance without backtracking (though there
is setup for backtracking) using
>>> pat = re.compile("([0-8]*9)+([0-8]*)")
>>> m = pat.match("10099000")
>>> if m.group(2): print "does not end in a 9"
...
does not end in a 9
>>>
>>> m = pat.match("100990009a")
>>> if m.group(2): print "does not end in a 9"
...
>>>
> Why is there a named group count
> restriction in Python? Any reasons
> one should know?
While not definite, I think it introduces an ambiguity with octal
escape codes. (*That's* something to omit in Python 3000!)
Here's the quote from perlre
    There is no limit to the number of captured substrings that you may
    use. However Perl also uses \10, \11, etc. as aliases for \010, \011,
    etc. (Recall that 0 means octal, so \011 is the character at number 9
    in your coded character set; which would be the 10th character, a
    horizontal tab under ASCII.) Perl resolves this ambiguity by
    interpreting \10 as a backreference only if at least 10 left
    parentheses have opened before it. Likewise \11 is a backreference
    only if at least 11 left parentheses have opened before it. And so
    on. \1 through \9 are always interpreted as backreferences.
Python doesn't have "\10", "\11", etc. as aliases for "\010", "\011", etc.
>>> re.compile("\10")
<_sre.SRE_Pattern object at 0x575c0>
>>> pat = re.compile(r"\10")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\10'
>>> pat = re.compile(r"\010")
>>>
so there is no ambiguity until reaching \100.
>>> pat = re.compile(r"\100")
>>> pat.match("@")
<_sre.SRE_Match object at 0x232560>
>>>
Most likely /F (who implemented the regex engine) decided
to refuse the temptation to guess the meaning, e.g. by
counting parens as Perl does.
Try this variation of your program to see how Perl changes
the meaning of "\100" based on the input string
    for (my $i=98; $i<=103; $i++) {
        my $re = '(\d+)\D' x $i . '\100';
        my $text = join ' ', 0..$i;
        $text .= " @";
        print "Testing $i\n";
        if ($text =~ /$re/) {
            print "  Number of matches: $+\n";
            print "  Last group: ${$i}\n";
        } else {
            print "No matches\n";
        }
    }
I get
Testing 98
Number of matches: 98
Last group: 98
Testing 99
Number of matches: 99
Last group: 99
Testing 100
No matches
Testing 101
No matches
Testing 102
No matches
Testing 103
No matches
Andrew
[EMAIL PROTECTED]
