Re: Documentation suggestions

2005-12-06 Thread Rhamphoryncus
A.M. Kuchling wrote:
> Here are some thoughts on reorganizing Python's documentation, with
> one big suggestion.

Throwing in my own 2¢.. I think the language reference should be
disseminated into the rest of the documentation.  Some of the stuff
(operator precedence anybody?) could be done directly, while more
technical aspects could be put in boxes with green[0] backgrounds
saying "technical details".  Thinks specific to one implementation
could further be given a blue[0] background.

I don't expect everything to make the transition.  Are discussions of
"atoms" and fragments of BNF really better than calling them
expressions and linking to CPython's Grammar file?

--
Adam Olsen, aka Rhamphoryncus

[0] Colors pulled off the top of my head

-- 
http://mail.python.org/mailman/listinfo/python-list


Proposal for adding Shallow Threads and a Main Loop to Python

2005-03-17 Thread Rhamphoryncus
First a bit about myself.  I've been programming in python several
years now, and I've got several more years before that with C.  I've
got a lot of interest in the more theoretical stuff (language design,
component architectures, etc).  Of late my focus has been on concurrent
operations (and on how to design a GUI architecture, but that's not
what this post is about).  I've looked at threads, and the inability to
kill them easily was a problem.  To get them to quit at all you had to
explicitly check a var that'd trigger it.  Might as well be using
event-driven.. so I looked at event-driven.  That doesn't have the
issue with ending operations, but it does make writing code much
harder.  Anything more than the most trivial loop has to be turned into
a state machine instead.

But recently I came up with a solution to that too.  I call it Shallow
Threads.

A shallow thread is just a generator modified in the most obvious way
possible.  The yield statement is replaced with a waitfor expression.
You give it the object you wish to "wait for".  Then when it's ready
you get back a return value or an exception.  These waitfor expressions
are the only points where your shallow thread may get suspended, so
it's always explicit.  If you call another function it will be treated
exactly as a normal function call from a generator (this is why they're
'shallow'), so there's no issues with calling C functions.

On the other end of things, your shallow thread object would have
__resume__ and __resumeexc__ methods (or perhaps a single __resume__
method with a default raise_=False argument).  They return a tuple of
(mode,value) where mode is a string of 'waitfor', 'exception', or
'return' and value corresponds to that.  These methods will be used by
a main loop to resume the shallow thread, like next() is used with a
generator.

A main loop is the next part of my proposal.  Where shallow threads
could be added with a fairly small patch, a main loop would require a
much more extensive one.  And unfortunately you need a main loop to use
shallow threads.  (You could use twisted but that's got the problems of
being third-party and not designed for inclusion in the python core.)

First part is the concept of an "event notifier object".  This is
simply an object that is not "done" yet, and you can "watch" and be
given a single value (or exception) when it is done.  This could be
something internal to your program like a shallow thread above, or it
could be something external like a file read completing.  This would,
like most other protocols in python, involve setting an attribute or a
method to support it.  I haven't yet figured out the best design
though.

We need versions of many existing functions that produce event
notifiers instead.  I suggest adding an async keyword to the existing
functions, defaulting to False, to indicate an event notifier should be
produced.  For example: waitfor (waitfor open("hello.txt",
async=True)).read(async=True).  At some mythological point in the
future, perhaps the default for async would be switched to True and
that example would get much shorter.

Now as for the main loop itself, I suggest a mainloop module.  The
primary function used would be mainloop.runUntil(), which would take an
event notifier as its argument and would return as soon as that event
notifier was triggered.  An example of this and other features follows.

import mainloop, urllib

def get_and_save(path):
    infile = waitfor urllib.urlopen(path, async=True)
    outfile = waitfor open(path.split('/')[-1], async=True)
    waitfor outfile.write(waitfor infile.read(async=True), async=True)
    infile.close()
    outfile.close()

def main():
    a = get_and_save("http://python.org/pics/PyBanner021.gif")
    b = get_and_save("http://python.org/pics/pythonHi.gif")
    c = get_and_save("http://python.org/pics/PythonPoweredSmall.gif")

    waitfor allDone(a, b, c)

if __name__ == "__main__":
    mainloop.runUntil(main())
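
For contrast, here's a rough sketch of how the idea maps onto plain
generators today, with yield standing in for waitfor.  Everything in
it (the Sleep event, the run() loop) is invented for illustration; it
shows the flavour of the proposal, not the proposed implementation:

import time

class Sleep:
    # A trivial "event notifier": done once the deadline passes.
    def __init__(self, seconds):
        self.wake = time.time() + seconds
    def done(self):
        return time.time() >= self.wake

def worker(name):
    print name, "starting"
    yield Sleep(0.1)  # the proposal would spell this 'waitfor Sleep(0.1)'
    print name, "finished"

def run(threads):
    # A toy main loop: resume each shallow thread whenever the event
    # it yielded has completed.  Busy-waiting keeps the sketch short.
    waiting = [(t, None) for t in threads]
    while waiting:
        still_waiting = []
        for thread, event in waiting:
            if event is None or event.done():
                try:
                    event = thread.next()  # resume until the next waitfor
                except StopIteration:
                    continue  # this shallow thread has returned
            still_waiting.append((thread, event))
        waiting = still_waiting

run([worker("a"), worker("b")])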

Well there you have it.  I've glossed over many details but they can be
cleared up later.  What I need to know now is how everybody else thinks
about it.  Is this something you would use?  Does it seem like the
right way to do it?  And of course the all important one, can I get it
in to python core?  <0.5 wink>

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Proposal for adding Shallow Threads and a Main Loop to Python

2005-03-17 Thread Rhamphoryncus
Gary D. Duzan wrote:
>A while back I tossed something together to deal with the same
> issue in terms of "futures" (or "promises".)  Here is roughly what
> the above code would look like with futures as I implemented them:



>This was all done using plain Python 1.5.2 in 80 lines of code,
> including some blank lines and doc strings. Maybe I'll brush off
> the code a bit and post it one of these days.
>
>   Gary Duzan
>   BBN Technologies

Hrm.  Well to be honest the point is the shallow threads, *not* the
main loop and event notifiers.  Otherwise I'd just use blocking calls
in real threads, or maybe just twisted, and not bother creating
anything new, ya know?

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Proposal for adding Shallow Threads and a Main Loop to Python

2005-03-18 Thread Rhamphoryncus
> [...]ense intended - it's a little bit more comfortable than the
> generators approach sketched by others (e.g. David Mertz if I recall
> correctly) - but to my view, it _could_ be done in today's python
> because we have generators. Or not?

Go and compare the two get_and_save functions.  Would you call that a
"little bit more comfortable"?  Now imagine the function was 20 lines
to begin with, involving a couple loops and some list comprehensions.
Then just for kicks, imagine it without generators, instead turning it
into a state machine.

Yes, you can *technically* do it using generators (and they're what
make the implementation easy).  But you could also do it using a state
machine.  Doesn't mean it's practical though.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Using weakrefs instead of __del__

2005-04-06 Thread Rhamphoryncus
I know __del__ can't be relied on to do cleanup, so I'm thinking of
using a globally reachable weakref to do the job, like so:

import weakref

class Foo:
    def __init__(self, *args, **kwargs):
        self.__realfoo = RealFoo(self, *args, **kwargs)
    def blah(self):
        return self.__realfoo.blah()

class RealFoo:
    refs = set()
    def __init__(self, front, *args, **kwargs):
        self.refs.add(weakref.ref(front, self.cleanup))
    def blah(self):
        print "Blah!"
    def cleanup(self, ref):
        print "Doing cleanup"
        self.refs.remove(ref)
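
To be explicit about the behaviour I'm after (this relies on CPython's
refcounting to collect Foo immediately; other implementations would
fire the callback whenever their collector gets around to it):

>>> f = Foo()
>>> f.blah()
Blah!
>>> del f
Doing cleanup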

Is there any reason why this wouldn't work?  Can it be used as a
general replacement for anything that needs __del__?  Even in Jython or
other alternative implementations?

Some issues I know about:
 * The weakref must be globally reachable (as I do with the above code)
to make sure it doesn't get deleted at the same time Foo does, because
python would not call the callback if that were the case.
 * I can't use any getattr tricks to wrap .blah() because Foo().blah()
would only use a method bound to RealFoo, thus allowing Foo to be
deallocated and .cleanup() to be called before the method itself is.
Unfortunately my metaclass-fu is too weak to figure out something
without that problem.
 * I've no idea what will happen when the interpreter exits, but I
don't imagine it would matter much for most uses.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: do people really complain about significant whitespace?

2006-08-10 Thread Rhamphoryncus
Stephen Kellett wrote:
> function()
> loop1()
> blah
> blah
>
> loop2()
> blah
>
> loop3()
> blah
>
> blah3
>
> otherloop()
> blah
>
> Yes the above example is contrived - so that I could demonstrate what I
> wanted to demonstrate. But I've run into these problems with Python code
> I've written and when reading code written by others. Its a real
> problem, not just one I've cooked up for an argument.

[much snippage]

I actually agree with you here; when indentation goes on longer than a
screen it can become difficult to know what that indentation is
associated with.  Braces may mitigate it somewhat, but not completely.
Some points to consider:

* Python is inherently shorter than C/C++, so you're less likely to go
over one screen.
* Long functions should usually be refactored anyway and would gain
readability in any language.  The threshold here is smaller in python
because your code will do more significant things, vs C where your
screen gets covered with minutiae that you learn to mentally ignore.

Also, I wonder if a more intelligent editor would help here, one that
would visually indicate blocks as such:

function()
|   loop1()
|   |   blah
|   |   blah
|   |
|   |   loop2()
|   |   |   blah
|   |   |
|   |   |   loop3()
|   |   |   |   blah
|   |   |
|   |   |   blah3
|
|   otherloop()
|   |   blah

Surely this would eliminate the problem?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python share CPU time?

2006-08-10 Thread Rhamphoryncus
Yannick wrote:
> Hi,
>
> I would like to program a small game in Python, kind of like robocode
> (http://robocode.sourceforge.net/).
> Problem is that I would have to share the CPU between all the robots,
> and thus allocate a time period to each robot. However I couldn't find
> any way to start a thread (robot), and interrupt it after a given time
> period.
> Any suggestions on how to proceed?
> Is Python just not adapted to this kind of things?

If your intent is to allow the user to program a robot using python
then no, it's NOT suitable.  There's no builtin way to restrict what
code can do (but Zope might have a way), nor can you force it to pause
after a predetermined amount of time, never mind making that amount of
time deterministic enough to be fair in a game.

However, if your intent was to have only trusted code then Python would
work fine.  My preference would be for generators (especially once
Python 2.5 is released), but YMMV.
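
To illustrate what I mean by generators, a rough, untested sketch (the
robot logic and names are invented):

def robot(name):
    # Each robot is a generator; a yield marks the end of its turn.
    while True:
        print name, "moves"
        yield

def play(robots, rounds):
    # Round-robin scheduler: each robot gets exactly one step per
    # round, so CPU time is shared fairly and deterministically.
    for i in xrange(rounds):
        for r in robots:
            r.next()

play([robot("huey"), robot("dewey")], rounds=3)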

-- 
http://mail.python.org/mailman/listinfo/python-list


istep() addition to itertool? (Was: Re: Printing n elements per line in a list)

2006-08-19 Thread Rhamphoryncus
unexpected wrote:
> If have a list from 1 to 100, what's the easiest, most elegant way to
> print them out, so that there are only n elements per line.

I've run into this problem a few times, and although many solutions
have been presented specifically for printing I would like to present a
more general alternative.

from itertools import chain

def istepline(step, iterator):
    i = 0
    while i < step:
        yield iterator.next()
        i += 1

def istep(iterable, step):
    iterator = iter(iterable)  # Make sure we won't restart iteration
    while True:
        # We rely on istepline()'s side-effect of progressing the
        # iterator.
        start = iterator.next()
        rest = istepline(step - 1, iterator)
        yield chain((start,), rest)
        for i in rest:
            pass  # Exhaust rest to make sure the iterator has
                  # progressed properly.

>>> i = istep(range(12), 5)
>>> for x in i: print list(x)
...
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11]

>>> i = istep(range(12), 5)
>>> for x in i: print x
...
<itertools.chain object at 0x...>
<itertools.chain object at 0x...>
<itertools.chain object at 0x...>
>>> from itertools import islice, chain, repeat
>>> def pad(iterable, n, pad): return islice(chain(iterable, repeat(pad)), n)
>>> i = istep(range(12), 5)
>>> for x in i: print list(pad(x, 5, None))
...
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, None, None, None]

Would anybody else find this useful?  Maybe worth adding it to itertools?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sum and strings

2006-08-19 Thread Rhamphoryncus
Bill Pursell wrote:
> Georg Brandl wrote:
> > Paul Rubin wrote:
> > > Sybren Stuvel <[EMAIL PROTECTED]> writes:
> > >> Because of "there should only be one way to do it, and that way should
> > >> be obvious". There are already the str.join and unicode.join methods,
> > >
> > > Those are obvious???
> >
> > Why would you try to sum up strings? Besides, the ''.join idiom is quite
> > common in Python.
>
> One could extend this argument to dissallow the following:
> >>> "foo" + "bar"

It's worthwhile to note that the use of + as the concatenation operator
is arbitrary.  It could just as well have been | or &, and has no
relationship with mathematical addition.  Were history different
perhaps Guido would have gone with | or & instead, and we wouldn't be
having this conversation.

It's also worth stressing (not in response to your post, but others)
that sum([[1],[2],[3]], []) is just as bad as attempting to sum
strings, both conceptually (it's not mathematical addition), and
performance-wise.  Don't do it. :)

I believe the preferred method to flatten a list of lists is this:

shallow = []
for i in deep:
    shallow.extend(i)

Yes, it's three lines.  It's also very easy to read.  reduce() and
sum() are not.
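
If the performance claim needs motivating: sum() builds a brand new
list for every intermediate +, so flattening n items costs O(n**2).
A rough sketch of measuring the difference (sizes picked out of thin
air; timings will vary):

import timeit

deep = [[i] for i in xrange(10000)]

def flatten_sum():
    return sum(deep, [])  # every step copies the whole result so far

def flatten_extend():
    shallow = []
    for i in deep:
        shallow.extend(i)  # appends in place, amortized O(n) overall
    return shallow

print timeit.Timer('flatten_sum()',
                   'from __main__ import flatten_sum').timeit(10)
print timeit.Timer('flatten_extend()',
                   'from __main__ import flatten_extend').timeit(10)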

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sum and strings

2006-08-20 Thread Rhamphoryncus
Paddy wrote:
> Rhamphoryncus wrote:
>
> >
> > It's worthwhile to note that the use of + as the concatenation operator
> > is arbitrary.  It could just have well been | or &, and has no
> > relationship with mathematically addition.
>
> The effect of the string concatenation operator is only secondary.
> Secondary to the use of the word sum; and what could be 'reasonably'
> conceived as sum's effect on non-numeric types.
> >
> > It's also worth stressing (not in response to your post, but others)
> > that sum([[1],[2],[3]], []) is just as bad as attempting to sum
> > strings, both conceptually (it's not mathematical addition)
>
> Unfortunately, the words sum and summation are linked to other words
> that are commonly used to describe what is happening to the numbers,
> the lists, and the strings.
> Words such as accumulate, concatenate, aggregate, collect, assemble, as
> well as add.

String concatenation and numeric addition only group together under the
most vague of English terms, "putting things together".  String
concatenation violates many mathematical properties of
addition/summation, and simply isn't related.

It is unfortunate that many languages (including python) promote the
confusion by using + for a "put things together" operator, ie both
mathematical addition and string concatenation.


> > I believe the prefered method to flatten a list of lists is this:
> >
> > shallow = []
> > for i in deep:
> >     shallow.extend(i)
> >
> > Yes, it's three lines.  It's also very easy to read.  reduce() and
> > sum() are not.
>
> I'd like to squeeze in the listcomp version, not because it is one line
> shorter, but because I, and maybe others, prefer short listcomps such
> as the following:
>
> shallow = []
> [shallow.extend(i) for i in deep]

I'm sure this has been mentioned before, but listcomps are for when you
want to store the list and use it for further things, not for when you
want a side effect.  TOOWTDI.

And of course, if saving a line was the reason:

shallow = []
for i in deep: shallow.extend(i)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Client-side TCP socket receiving "Address already in use" upon connect

2006-09-04 Thread Rhamphoryncus
Felipe Almeida Lessa wrote:
> 2006/9/3, Alex Martelli <[EMAIL PROTECTED]>:
> > Reflecting on the OP's use case, since all connections are forever being
> > made to the same 16 servers, why not tweak things a bit to hold those
> > connections open for longer periods of time, using a connection for many
> > send/receive "transactions" instead of opening and closing such
> > connections all of the time?  That might well work better...
>
> Connecting to 16 different servers per second gives a very poor
> performance, right? There's some overhead in creating TCP connections,
> even on fast networks and computers. Am I right?

I can think of four costs you would invoke by not reusing connections:
1) extra sockets.  This is what the OP experienced.
2) startup/teardown bandwidth.  More packets need to be sent to start
or end a connection.
3) startup latency.  It takes some time to create a usable connection.
4) ramp-up time.  TCP/IP doesn't dump everything on the network as soon
as a connection is opened.  It "feels it out", giving it more, then a
little more, until it finds the limit.  If you're sending multiple
files (small ones especially!) you'll likely hit this.

So yeah, bottom line, it IS faster and more efficient to reuse
connections.  If you're doing a protocol for sending files you'll want
to do it.
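
A bare-bones sketch of reuse with the socket module (the host, port
and one-line "protocol" are invented for illustration):

import socket

conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn.connect(("fileserver.example.com", 4000))  # pay costs 2-4 once
try:
    for name in ("a.txt", "b.txt", "c.txt"):
        conn.sendall("GET %s\n" % name)
        reply = conn.recv(4096)  # real code would loop until the
                                 # server says the file is complete
finally:
    conn.close()  # one teardown for many transfers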

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Deleting from a list while iterating

2006-12-03 Thread Rhamphoryncus
The problems of this are well known, and a suggestion for making this
easier was recently posted on python-dev.  However, I believe this can
be done just as well without a change to the language.  What's more,
most of the suggested methods (in my search results as well as the
suggestion itself) do not scale well, which my approach would solve.

My approach is to make a set of indexes to be removed while iterating,
then use a list comprehension to filter them out after.  Timings of
this and two other common approaches follow:

setapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    remove = set()
    for index, x in enumerate(items):
        #...do something...
        if x < 0.5:
            remove.add(index)
    items = [x for index, x in enumerate(items) if index not in remove]
"""

copyapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for x in items[:]:
        if x < 0.5:
            items.remove(x)
"""

reverseapproach = """\
def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for index in range(len(items) - 1, -1, -1):
        if items[index] < 0.5:
            del items[index]
"""

>>> import timeit
>>> timeit.Timer(stmt='func(1000)', setup=setapproach).timeit(1)
0.0016040802001953125
>>> timeit.Timer(stmt='func(1000)', setup=copyapproach).timeit(1)
0.0085191726684570312
>>> timeit.Timer(stmt='func(1000)', setup=reverseapproach).timeit(1)
0.0011308193206787109

>>> timeit.Timer(stmt='func(10000)', setup=setapproach).timeit(1)
0.021183013916015625
>>> timeit.Timer(stmt='func(10000)', setup=copyapproach).timeit(1)
1.0268981456756592
>>> timeit.Timer(stmt='func(10000)', setup=reverseapproach).timeit(1)
0.038264989852905273

>>> timeit.Timer(stmt='func(100000)', setup=setapproach).timeit(1)
0.23896384239196777
>>> timeit.Timer(stmt='func(100000)', setup=copyapproach).timeit(1)
274.57498288154602
>>> timeit.Timer(stmt='func(100000)', setup=reverseapproach).timeit(1)
2.2382969856262207

As you can see, although reverse iteration is somewhat faster at
smaller sizes, a set is substantially faster at larger sizes, and I
believe is more readable anyway.  Copying shouldn't even be considered
unless you know the size will always be trivial (< 1000).

I'm sure there's a few other approaches that would do even better under
certain conditions.  One is a generator, if your input and output
should both be iterators.  Another is using slicing to move contiguous
sections of retained items over the removed items.  I leave both of
these as an exercise for the reader.

-- 
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Deleting from a list while iterating

2006-12-03 Thread Rhamphoryncus
Marc 'BlackJack' Rintsch wrote:
> Why do you make it that complicated?  If you are going to build a new list
> anyway, this can be done without the `set()` and just one listcomp:

Fredrik Lundh wrote:
> your set approach doesn't modify the list in place, though; it creates
> a new list, in a rather roundabout way.  if modification in place isn't
> important, the normal way is of course to create a *new* list:
>
>  items = [i for i in items if not i < 0.5]
>
> on my machine, that's about two orders of magnitude faster than your
> "fast" approach for n=10.

Sorry, I should have clarified that the original post assumed you
needed info from the "do something" phase to determine if an element is
removed or not.  As you say, a list comprehension is superior if that
is not necessary.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Deleting from a list while iterating

2006-12-03 Thread Rhamphoryncus
Martin v. Löwis wrote:
> Rhamphoryncus schrieb:
> > setapproach = """\
> > def func(count):
> >     from random import random
> >     items = [random() for i in xrange(count)]
> >     remove = set()
> >     for index, x in enumerate(items):
> >         #...do something...
> >         if x < 0.5:
> >             remove.add(index)
> >     items = [x for index, x in enumerate(items) if index not in remove]
> > """
>
> This is different from the other approaches in that it doesn't
> modify items. If you wanted a new list, you could incrementally
> build one already in the first pass, no need to collect the
> indices first (as BlackJack explains).

I didn't feel this distinction was worth mentioning, since "oldlist[:]
= newlist" is so trivial.  The only solution that really avoids making
a copy is the reverse-iteration one.


> If you wanted in-place modification, I'd do
>
>  newitems = []
>  for x in items:
>      if not (x < 0.5):
>          newitems.append(x)
>  items[:] = newitems

I agree, that does seem simpler.


> > copyapproach = """\
> > def func(count):
> >     from random import random
> >     items = [random() for i in xrange(count)]
> >     for x in items[:]:
> >         if x < 0.5:
> >             items.remove(x)
> > """
>
> This happens to work for your example, but is incorrect
> in the general case: you meant to write
>
>   del items[i+removed]
>
> here, as .remove(x) will search the list again, looking
> for the first value to remove. If your condition for
> removal happens to leave some items equal to x in the
> list, it would remove the wrong item. Because the numbering
> changes while the iteration is in progress, you have
> to count the number of removed items also.

I agree that the example I gave here sucks.  However, I copied it from
another posting as a recommended method to get removal to work *right*
(without noticing how slow it is).

There seems to have been many distinct approaches to this problem over
the years.  Hopefully we can converge on a single ideal solution (or a
few situational ones) as TOOWTDI.

-- 
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Thread-safety of dict

2007-06-01 Thread Rhamphoryncus
On Jun 1, 3:51 am, Duncan Booth <[EMAIL PROTECTED]> wrote:
> "Adam Olsen" <[EMAIL PROTECTED]> wrote:
> > So there you have it: if you're using a dict with custom classes (or
> > anything other than str) across multiple threads, and without locking
> > it, it's possible (though presumably extremely rare) for a lookup to
> > fail even though the key was there the entire time.
>
> Nice work.
>
> It would be an interesting exercise to demonstrate this in practice, and I
> think it should be possible without resorting to threads (by putting
> something to simulate what the other thread would do into the __cmp__
> method).

I had attempted to do so, but I lost interest when I realized I'd have
to manipulate the memory allocator at the same time. ;)


> I don't understand your reasoning which says it cannot stay in
> ma_smalltable: PyDict_SetItem only calls dictresize when at least 2/3 of
> the slots are filled. You can have 5 items in the small (8 slot) table
> and the dictionary will resize to 32 slots on adding the 6th; the next
> resize comes when you add the 22nd item.

What you're missing is that a slot can be in any of 3 states:
1) Active.  A key is here.  If the current slot doesn't match
lookdict() will try the next one.
2) Dummy.  Used to be a key here, but now it's gone.  lookdict() will
try the next one.
3) NULL.  Never was a key here.  lookdict() will stop.

lookdict() needs the NULL slots to stop searching.  As keys are added
and removed from the table it will get filled with dummy slots, so
when the total number of active+dummy slots exceeds 2/3rds it will
trigger a resize (to the same size or even a smaller size!) so as to
clear out all the dummy slots (letting lookups finish sooner).
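
To make the three states concrete, here's a toy open-addressing set
(pure illustration; CPython's lookdict uses a smarter probe sequence
and, as described above, resizes to flush out the dummies):

NULL, DUMMY = object(), object()

class ToySet:
    def __init__(self):
        self.slots = [NULL] * 8  # like ma_smalltable

    def _find(self, key):
        # Probe until we hit the key or a NULL (never-used) slot.
        # DUMMY slots don't stop the search; they only mark removals.
        i = hash(key) % 8
        while self.slots[i] is not NULL and self.slots[i] != key:
            i = (i + 1) % 8
        return i

    def add(self, key):
        self.slots[self._find(key)] = key

    def discard(self, key):
        i = self._find(key)
        if self.slots[i] is not NULL:
            # Can't reset to NULL: that would cut off later keys that
            # probed past this slot.  Without the 2/3rds resize rule,
            # dummies would pile up and every search would get slower.
            self.slots[i] = DUMMY

    def __contains__(self, key):
        return self.slots[self._find(key)] == key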

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's handling of unicode surrogates

2007-04-19 Thread Rhamphoryncus
On Apr 19, 11:02 pm, Neil Hodgson <[EMAIL PROTECTED]>
wrote:
> Adam Olsen:
>
> > To solve this I propose Python's unicode type using UTF-16 should have
> > gaps in its index, allowing it to only expose complete unicode scalar
> > values.  Iteration would produce surrogate pairs rather than
> > individual surrogates, indexing to the first half of a surrogate pair
> > would produce the entire pair (indexing to the second half would raise
> > IndexError), and slicing would be required to not separate a surrogate
> > pair (IndexError otherwise).
>
> I expect having sequences with inaccessible indices will prove
> overly surprising. They will behave quite similar to existing Python
> sequences except when code that works perfectly well against other
> sequences throws exceptions very rarely.

"Errors should never pass silently."

The only way I can think of to make surrogates unsurprising would be
to use UTF-8, thereby bombarding programmers with variable-length
characters.
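
For anyone without a UTF-16 build handy, this is the status quo being
discussed (output from a narrow build; the exact addresses and reprs
may vary):

>>> import sys
>>> sys.maxunicode  # narrow (UTF-16) build
65535
>>> s = u'\U00010000'
>>> len(s)  # one character, stored as a surrogate pair
2
>>> s[0]  # indexing exposes half of it
u'\ud800'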


> > Reasons to treat surrogates as indivisible:
> > * \U escapes and repr() already do this
> > * unichr(0x10000) would work on all unicode scalar values
>
> unichr could return a 2 code unit string without forcing surrogate
> indivisibility.

Indeed.  I was actually surprised that it didn't, originally I had it
listed with \U and repr().


> > * "There is no separate character type; a character is represented by
> > a string of one item."
>
>  Could amend this to "a string of one or two items".
>
> > * iteration would be identical on all platforms
>
> There could be a secondary iterator that iterates over characters
> rather than code units.

But since you should use that iterator 90%+ of the time, why not make
it the default?


> > * sorting would be identical on all platforms
>
> This should be fixable in the current scheme.

True.


> > * UTF-8 or UTF-32 containing surrogates, or UTF-16 containing isolated
> > surrogates, are ill-formed[2].
>
> It would be interesting to see how far specifying (and enforcing)
> UTF-16 over the current implementation would take us. That is for the 16
> bit Unicode implementation raising an exception if an operation would
> produce an unpaired surrogate or other error. Single element indexing is
> a problem although it could yield a non-string type.

Err, what would be the point in having a non-string type when you
could just as easily produce a string containing the surrogate pair?
That's half my proposal.


> > Reasons against such a change:
> > * Breaks code which does range(len(s)) or enumerate(s).  This can be
> > worked around by using s = list(s) first.
>
> The code will work happily for the implementor and then break when
> exposed to a surrogate.

The code may well break already.  I just make it explicit.


> > * "Nobody is forcing you to use characters above 0x".  This is a
> > strawman.  Unicode goes beyond 0x because real languages need it.
> > Software should not break just because the user speaks a different
> > language than the programmer.
>
> Characters over 0xFFFF are *very* rare. Most of the Supplementary
> Multilingual Plane is for historical languages and I don't think there
> are any surviving Phoenician speakers. Maybe the extra mathematical
> signs or musical symbols will prove useful once software and fonts are
> implemented for these ranges. The Supplementary Ideographic Plane is
> historic Chinese and may have more users.

Yet Unicode has deemed them worth including anyway.  I see no reason
to make them more painful than they have to be.

A program written to use them today would most likely a) avoid
iteration, and b) replace indexes with slices (s[i] ->
s[i:i+len(sub)]).  If they need iteration they'll have to reimplement
it, providing the exact behaviour I propose.  Or they can recompile
Python to use UTF-32, but why shouldn't such features be available by
default?


> I think that effort would be better spent on an implementation that
> appears to be UTF-32 but uses UTF-16 internally. The vast majority of
> the time, no surrogates will be present, so operations can be simple and
> fast. When a string contains a surrogate, a flag is flipped and all
> operations go through more complex and slower code paths. This way,
> consumers of the type see a simple, consistent interface which will not
> report strange errors when used.

Your solution would require code duplication and would be slower.  My
solution would have no duplication and would not be slower.  I like
mine. ;)


> BTW, I just implemented support for supplemental planes (surrogates,
> 4 byte UTF-8 sequences) for Scintilla, a text editing component.

I dream of a day when complete unicode support is universal.  With
enough effort we may get there some day. :)

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's handling of unicode surrogates

2007-04-19 Thread Rhamphoryncus
(Sorry for the dupe, Martin.  Gmail made it look like your reply was
in private.)

On 4/19/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Thoughts, from all you readers out there?  For/against?
>
> See PEP 261. This things have all been discussed at that time,
> and an explicit decision against what I think (*) your proposal is
> was taken. If you want to, you can try to revert that
> decision, but you would need to write a PEP.

I don't believe this specific variant has been discussed.  The change
I propose would make indexes non-contiguous, making unicode
technically not a sequence.  I say that's a case for "practicality
beats purity".

Of course I'd appreciate any clarification before I bring it to
python-3000.

> Regards,
> Martin
>
> (*) I don't fully understand your proposal. You say that you
> want "gaps in [the string's] index", but I'm not sure what
> that means. If you have a surrogate pair on index 4, would
> it mean that s[5] does not exist, or would it mean that
> s[5] is the character following the surrogate pair? Is
> there any impact on the length of the string? Could it be
> that len(s[k]) is 2 for some values of s and k?

s[5] does not exist.  You would get an IndexError indicating that it
refers to the second half of a surrogate.

The length of the string will not be changed.  s[s.find(sub):] will
not be changed, so long as sub is a well-formed unicode string.
Nothing that properly handles unicode surrogates will be changed.

len(s[k]) would be 2 if it involved a surrogate, yes.  One character,
two code units.

The only code that will be changed is that which doesn't handle
surrogates properly.  Some will start working properly.  Some (ie
random.choice(u'\U0010ffff\uffff')) will fail explicitly (rather than
silently).

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's handling of unicode surrogates

2007-04-20 Thread Rhamphoryncus
On Apr 20, 5:49 pm, Ross Ridge <[EMAIL PROTECTED]>
wrote:
> Rhamphoryncus  <[EMAIL PROTECTED]> wrote:
> >The only code that will be changed is that which doesn't handle
> >surrogates properly.  Some will start working properly.  Some (ie
> >random.choice(u'\U0010ffff\uffff')) will fail explicitly (rather than
> >silently).
>
> You're falsely assuming that any code that doesn't support surrogates
> is broken.  Supporting surrogates is no more required than supporting
> combining characters, right-to-left languages or lower case letters.

No, I'm only assuming code which would raise an error with my change
is broken.  My change would have minimal effect because it's building
on the existing correct way to do things, expanding them to handle
some additional cases.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's handling of unicode surrogates

2007-04-20 Thread Rhamphoryncus
On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I don't believe this specific variant has been discussed.
>
> Now that you clarify it: no, it hasn't been discussed. I find that
> not surprising - this proposal is so strange and unnatural that
> probably nobody dared to suggest it.

Difficult problems sometimes need unexpected solutions.

Although Guido seems to be relenting slightly on the O(1) indexing
requirement, so maybe we'll end up with an O(log n) solution (where n
is the number of surrogates, not the length of the string).


> > s[5] does not exist.  You would get an IndexError indicating that it
> > refers to the second half of a surrogate.
>
> [...]
>
> > len(s[k]) would be 2 if it involved a surrogate, yes.  One character,
> > two code units.
>
> Please consider trade-offs. Study advantages and disadvantages. Compare
> them. Can you then seriously suggest that indexing should have 'holes'?
> That it will be an IndexError if you access with an index between 0
> and len(s)???

If you pick an index at random you will get IndexError.  If you
calculate the index using some combination of len, find, index, rfind,
rindex you will be unaffected by my change.  You can even assume the
length of a character so long as you know it fits in 16 bits (ie any
'\u' escape).

I'm unaware of any practical use cases that would be harmed by my
change, so that leaves only philosophical issues.  Considering the
difficulty of the problem it seems like an okay trade-off to me.


> If you absolutely think support for non-BMP characters is necessary
> in every program, suggesting that Python use UCS-4 by default on
> all systems has a higher chance of finding acceptance (in comparison).

I wish to write software that supports Unicode.  Like it or not,
Unicode goes beyond the BMP, so I'd be lying if I said I supported
Unicode if I only handled the BMP.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decorating class member functions

2007-05-04 Thread Rhamphoryncus
On May 3, 10:34 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
> The problem is not that you are decorating a method but that you are trying
> to use a callable class instance as a method. For that to work the class
> has to implement the descriptor protocol, see
>
> http://users.rcn.com/python/download/Descriptor.htm
>
> class Bugger (object):
>     def __init__ (self, module, fn, instance=None):
>         self.module = module
>         self.fn = fn
>         self.instance = instance
>
>     def __call__ (self, *args, **kws):
>         print "calling %s.%s()" % (self.module, self.fn.__name__)
>         if self.instance is not None:
>             args = (self.instance,) + args
>         ret_val = self.fn(*args, **kws)
>         return ret_val
>
>     def __get__(self, instance, class_):
>         if instance is None:
>             return self
>         return Bugger(self.module, self.fn, instance)
>
> def instrument (module_name):
>     ret_val = lambda x: Bugger(module_name, x)
>     return ret_val
>
> class Stupid(object):
>     @instrument("xpd.spam")
>     def update(self):
>         print "update()"
>
> s = Stupid()
> s.update()
> Stupid.update(s)

I've been bitten by the "class instances as decorators won't get
bound" bug many times.  I had no idea there was a direct solution.
Thanks!

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Idea for joined() builtin

2007-08-16 Thread Rhamphoryncus
I know similar things have been argued before, but little things (like
the joined name implying a copy) can make a big difference.  That and
I'm providing a simple implementation that works right now, so you
don't have to wait for it to ever become a builtin. ;)

>>> joined([], [[1,2,3], [4,5,6]])
[1, 2, 3, 4, 5, 6]
>>> joined(' ', ['hello', 'world'])
'hello world'

def joined(sep, iterable):
    if hasattr(sep, 'join'):
        return sep.join(iterable)  # string-like interface
    else:
        output = type(sep)()  # Hopefully an empty container
        for i in iterable:
            output.extend(i)  # list-like interface
            output.extend(sep)
        else:
            if sep:
                del output[-len(sep):]
        return output
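
And with a non-empty list separator, for flavour (expected output
worked out by hand, not pasted from a session):

>>> joined([None], [[1, 2], [3, 4]])
[1, 2, None, 3, 4]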

A little commentary on the intended usage.  There are three common
ways to "combine" objects:

The first is adding numbers together.  This may be accomplished with
a+b or sum([a, b]).  All values affect the output, but their
individuality is lost; you cannot extract the original values[1].

The second is concatenation.  This is usually done with list.extend,
''.join(), or with my joined function.  It can also be done with a+b,
but the performance is substantially worse if done repeatedly.  All
values affect the output and most of their individuality is retained;
given the original lengths of the subsequences, you can extract them
from the combined form.

The third is the "or" operation of a set or integer.  This is done
with a|b.  The values usually have some effect on the output, but as
redundant entries are removed, they lose a significant part of their
individuality.

The important thing to realize is that all of these different ways of
"combining" objects have different conceptual behaviour and different
performance characteristics.  Get them confused or attempt to over-
generalize and you will be bitten.


[1] Math/crypto has some exceptions.  Stop mentally poking holes in my
argument. :)

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is LOAD_GLOBAL really that slow?

2007-08-30 Thread Rhamphoryncus
On Aug 29, 8:33 pm, Carsten Haese <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-08-29 at 19:23 -0600, Adam Olsen wrote:
> > It seems a common opinion that global access is much slower than local
> > variable access.  However, my benchmarks show a relatively small
> > difference:
>
> > ./python -m timeit -r 10 -v -s 'x = [None] * 10000
> > def foo():
> >   for i in x:
> >     list; list; list; list; list; list; list; list; list; list' 'foo()'
> > 10 loops -> 0.0989 secs
> > 100 loops -> 0.991 secs
> > raw times: 0.999 0.985 0.987 0.985 0.985 0.982 0.982 0.982 0.981 0.985
> > 100 loops, best of 10: 9.81 msec per loop
>
> > ./python -m timeit -r 10 -v -s 'x = [None] * 10000
> > def foo():
> >   mylist = list
> >   for i in x:
> >     mylist; mylist; mylist; mylist; mylist; mylist; mylist; mylist;
> > mylist; mylist' 'foo()'
> > 10 loops -> 0.0617 secs
> > 100 loops -> 0.61 secs
> > raw times: 0.603 0.582 0.582 0.583 0.581 0.583 0.58 0.583 0.584 0.582
> > 100 loops, best of 10: 5.8 msec per loop
>
> > So global access is about 70% slower than local variable access.  To
> > put that in perspective, two local variable accesses will take longer
> > than a single global variable access.
>
> > This is a very extreme benchmark though.  In practice, other overheads
> > will probably drop the difference to a few percent at most.  Not that
> > important in my book.
>
> Your comparison is flawed, because the function call and the inner for
> loop cause a measurement offset that makes the locals advantage seems
> smaller than it is. In the interest of comparing the times for just the
> local lookup versus just the global lookup, I think the following
> timings are more appropriate:

That's why I used far more name lookups, to minimize the overhead.


> $ python2.5 -mtimeit -r10 -s"y=42" -s"def f(x): pass" "f(42)"
> 1000000 loops, best of 10: 0.3 usec per loop
> $ python2.5 -mtimeit -r10 -s"y=42" -s"def f(x): x" "f(42)"
> 1000000 loops, best of 10: 0.331 usec per loop
> $ python2.5 -mtimeit -r 10 -s"y=42" -s"def f(x): y" "f(42)"
> 1000000 loops, best of 10: 0.363 usec per loop

On my box, the best results I got after several runs were 0.399,
0.447, 0.464.  Even less difference than my original results.


> There is no loop overhead here, and after subtracting the function call
> overhead, I get 31 nanoseconds per local lookup and 63 nanoseconds per
> global lookup, so local lookups are just about twice as fast as global
> lookups.
>
> True, whether this difference is significant does depend on how many
> name lookups your code makes and how much else it's doing, but if you're
> doing a lot of number crunching and not a lot of I/O, the difference
> might be significant. Also, even if using local names is only slightly
> faster than using globals, it's still not slower, and the resulting code
> is still more readable and more maintainable. Using locals is a win-win
> scenario.

You get very small speed gains (assuming your code is doing anything
significant), for a lot of effort (trying out different options,
seeing if they're actually faster on different boxes.)  The
readability cost is there, even if it is smaller than many of the
other obfuscations people attempt.  If the speed gains were really
that important you should rewrite in C, where you'd get far greater
speed gains.

So it only seems worthwhile when you really, *really* need to get a
slight speedup on your box, you don't need to get any more speedup
than that, and C is not an option.

Fwiw, I posted this after developing yet another patch to optimize
global lookups.  It does sometimes show an improvement on specific
benchmarks, but overall it harms performance.  Looking into why, it
doesn't make sense that a python dictionary lookup can have less cost
than two simple array indexes, but there you go.  Python dictionaries
are already damn fast.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is LOAD_GLOBAL really that slow?

2007-08-30 Thread Rhamphoryncus
On Aug 30, 12:04 pm, "Chris Mellon" <[EMAIL PROTECTED]> wrote:
> On 8/30/07, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>
> > On Aug 29, 8:33 pm, Carsten Haese <[EMAIL PROTECTED]> wrote:
> > > On Wed, 2007-08-29 at 19:23 -0600, Adam Olsen wrote:
> > > There is no loop overhead here, and after subtracting the function call
> > > overhead, I get 31 nanoseconds per local lookup and 63 nanoseconds per
> > > global lookup, so local lookups are just about twice as fast as global
> > > lookups.
>
> __builtins__ lookups are an extra dict lookup slower than just global
> variables, too. Don't forget those.

Heh right, I forgot that.  That's what my benchmark was actually
testing.


> > > True, whether this difference is significant does depend on how many
> > > name lookups your code makes and how much else it's doing, but if you're
> > > doing a lot of number crunching and not a lot of I/O, the difference
> > > might be significant. Also, even if using local names is only slightly
> > > faster than using globals, it's still not slower, and the resulting code
> > > is still more readable and more maintainable. Using locals is a win-win
> > > scenario.
>
> > You get very small speed gains (assuming your code is doing anything
> > significant), for a lot of effort (trying out different options,
> > seeing if they're actually faster on different boxes.)  The
> > readability cost is there, even if it is smaller than many of the
> > other obfuscations people attempt.  If the speed gains were really
> > that important you should rewrite in C, where you'd get far greater
> > speed gains.
>
> I've doubled the speed of a processing loop by moving globals lookups
> out of the loop. Rewriting in C would have taken at least a day, even
> with Pyrex, localizing the lookup took about 2 minutes.

I'm guessing that was due to deep voodoo involving your processor's
pipeline, branch prediction, caching, etc.  I'd be interested in
seeing small examples of it though.


> > So it only seems worthwhile when you really, *really* need to get a
> > slight speedup on your box, you don't need to get any more speedup
> > than that, and C is not an option.
>
> It's not a huge optimization, but it's really easy to write if you
> don't mind adding fake kwargs to your functions. Just for the heck of
> it I also wrote a decorator that will re-write the bytecode so that
> any global that can be looked up at function definition will be
> re-written as a local (actually with LOAD_CONST). You can see it at
> http://code.google.com/p/wxpsvg/wiki/GlobalsOptimization. Disclaimer:
> While I've tested it with a variety of functions and it's never broken
> anything, I've never actually used this for anything except an
> intellectual exercise. Use at your own risk.

Doubling the throughput while doing *real work* is definitely more
significant than maybe-or-maybe-not-quite-doubling without any real
work.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: concise code (beginner)

2007-09-07 Thread Rhamphoryncus
On Sep 6, 1:56 pm, Karthik Gurusamy <[EMAIL PROTECTED]> wrote:
> That said, it may be a good future language enhancement to define a
> reasonable consistent behavior for an iterator over a changing
> collection. This occurs quite commonly when we walk a collection and
> usually delete the current item.
>
> For a sequence, the expected behavior is quite obvious (just
> remove this element and go over to the next). For other collections
> like dictionary/set, again if the operation is delete, the expected
> behavior is obvious. If we are doing insertion, for a sequence a well-
> defined behavior can be formulated (based on insert before or after
> the current position -- if after we will see it in the walk, if before
> we won't see it). For dict/set I see this isn't simple (as based on
> hash key we may insert ahead or behind the current 'cursor'/position).

Removing from a list while you iterate will have quadratic performance
though.  O(n) to find the element you wish to remove and move over
everything after it, multiplied by your original O(n) of iterating,
gives O(n**2).  That, combined with the fact that adding enough
accounting to invalidate or update your iterator would be a cost on
all the correct users too, is why it's not done.

The best approach in almost all cases in python is to create a new
container as you iterate over the old one.  After you finish, you
replace the old one with the new one.  This lets you keep an overall
O(n) performance, as well as avoiding the tricky semantics.
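
In code, that pattern is just this (the keep() test is a made-up
stand-in for whatever your loop body decides):

def keep(item):
    return item is not None  # stand-in test

items = [0, None, 1, None, 2]
new_items = []
for item in items:
    if keep(item):
        new_items.append(item)
items[:] = new_items  # slice-assign so other references see the change
print items  # [0, 1, 2]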

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: threading and iterator crashing interpreter

2007-03-12 Thread Rhamphoryncus
On Mar 11, 8:35 am, "Janto Dreijer" <[EMAIL PROTECTED]> wrote:
> On Mar 11, 3:27 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
> wrote:
> > At least, the problem of using the same generator from different threads
> > still remains, if you don't use my modified code. In general, when using
> > multiple threads you always need some way to syncronize access to shared
> > objects. You are lucky with Sessions because append is an atomic operation
> > on lists; for more information see  
> > http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-threa...
>
> Again you're right. But even putting a try-except:return around the
> random.choice still manages to break the interpreter. I'm not looking
> for any meaningful output really. I just want it not to break.

I think you need to clarify what you mean by "break".  Python tries to
prevent C-level corruption (segfaults, etc), but does NOT attempt to
prevent python-level corruption.  The python-level corruption will
*probably* be localized (and the objects involved could be deleted, no
harm done), but there's no guarantee that they won't involve some
global too.

But the examples you give involve C-level corruption.  They're not
using some low-level interface either, so they're genuine bugs.

login1 did segfault for me, but the output is random and usually very
long.  login2 is somewhat random as well, but much shorter and to the
point:

Exception exceptions.SystemError: '../Python/traceback.c:96: bad
argument to internal function' in <generator object at 0x...> ignored
Exception exceptions.SystemError: '../Python/traceback.c:96: bad
argument to internal function' in <generator object at 0x...> ignored
Exception exceptions.SystemError: '../Python/traceback.c:96: bad
argument to internal function' in <generator object at 0x...> ignored
Exception exceptions.SystemError: '../Python/traceback.c:96: bad
argument to internal function' in <generator object at 0x...> ignored
Segmentation fault

Blah!  I was trying to trace it down, but then the phase of the moon
changed and login2 (as well as my variations therein) stopped
failing.  login1 still segfaults though.  And login1 does sometimes
give me that weird restricted mode error, so you're not alone.

I do remember reading something about generators having a pointer to
the thread they're created in, and having problems when migrated
between threads.  I don't have time to find the posts or bug reports
right now, but at least that gives you something to look for (before
filing a bug report).

Actually, I have a few more minutes.
http://mail.python.org/pipermail/python-dev/2006-October/069470.html
http://sourceforge.net/tracker/index.php?func=detail&aid=1579370&group_id=5470&atid=105470

That refers to a generator crash.  You are using generators, but also
getting a weird dict error.  Maybe related, maybe not.

I'll figure out if I've got a "fixed" version or not when I get back.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: threading and iterator crashing interpreter

2007-03-12 Thread Rhamphoryncus
On Mar 12, 4:47 pm, "Klaas" <[EMAIL PROTECTED]> wrote:
> On Mar 12, 1:10 pm, "Rhamphoryncus" <[EMAIL PROTECTED]> wrote:
>
> >http://sourceforge.net/tracker/index.php?func=detail&aid=1579370&grou...
>
> > That refers to a generator crash.  You are using generators, but also
> > getting a weird dict error.  Maybe related, maybe not.
>
> > I'll figure out if I've got a "fixed" version or not when I get back.

The bug report mentions 2.5.1, which hasn't been released yet, so
obviously I'm not using it. :)


> I was the one who filed the first bug.  login2 is definitely the same
> bug: you have a generator running in a thread, but the thread is being
> garbage collected before the generator's .close() method is run (which
> references the thread state).
>
> Try the patch I posted in the bug above; if it works, then python
> trunk/5.1maint should work.

And with a fresh checkout of trunk I still get a crash with login1.
login2 isn't crashing, although it's not as consistent for me anyway.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: tuples, index method, Python's design

2007-04-13 Thread Rhamphoryncus
On Apr 13, 1:32 am, Antoon Pardon <[EMAIL PROTECTED]> wrote:
> Suppose someone writes a function that acts on a sequence.
> The algorithm used depends on the following invariant.
>
>   i = s.index(e) => s[i] = e
>
> Then this algorithm is no longer guaranteed to work with strings.

It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0).  The base unit exposed by the
implementation is rarely what you want to operate upon.

The terminology is pretty confusing, but let's see if I can lay out
the relationships here:

byte ≤ code unit ≤ code point ≤ scalar value ≤ grapheme cluster ~
character ≤ syllable ≤ word ≤ sentence ≤ paragraph

"12" in "123" allows you to handle bytes through scalar values the
same way, glossing over the implementation details (such as UTF-32 on
linux and UTF-16 on windows).

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: tuples, index method, Python's design

2007-04-14 Thread Rhamphoryncus
On Apr 13, 11:05 pm, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "Rhamphoryncus" <[EMAIL PROTECTED]> writes:
> > >   i = s.index(e) => s[i] = e
> > > Then this algorithm is no longer guaranteed to work with strings.
> > It never worked correctly on unicode strings anyway (which becomes the
> > canonical string in python 3.0).
>
> What?!   Are you sure?  That sounds broken to me.

Nope, it's pretty fundamental to working with text, unicode only being
an extreme example: there's a wide variety of ways to break down a
chunk of text, making the odds of "e" being any particular one fairly
low.  Python's unicode type only makes this slightly worse, not
promising any particular one is available.

For example, if you had an algorithm designed for ascii that gathered
statistics on how common each "character" is, you'd want to redesign
it to use either grapheme clusters or scalar values, then improve it
to merge duplicate characters.  You'd need to roll your own iterator
though, Python doesn't provide a method that's specifically grapheme
clusters or scalar values (and if I'm wrong I'd love to hear it!).

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: tuples, index method, Python's design

2007-04-14 Thread Rhamphoryncus
On Apr 14, 11:59 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "Rhamphoryncus" <[EMAIL PROTECTED]> writes:
> > Nope, it's pretty fundamental to working with text, unicode only being
> > an extreme example: there's a wide number of ways to break down a
> > chunk of text, making the odds of "e" being any particular one fairly
> > low.  Python's unicode type only makes this slightly worse, not
> > promising any particular one is available.
>
> I don't understand this.  I thought that unicode was a character
> coding system like ascii, except with an enormous character set
> combined with a bunch of different algorithms for encoding unicode
> strings as byte sequences.  But I've thought of those algorithms
> (UTF-8 and so forth) as basically being kludgy data compression
> schemes, and unicode strings are still just sequences of code points.

Indexing cost, memory efficiency, and canonical representation: pick
two.  You can't use a canonical representation (scalar values) without
some sort of costly search when indexing (O(log n) probably) or by
expanding to the worst-case size (UTF-32).  Python has taken the
approach of always providing efficient indexing (O(1)), but you can
compile it with either UTF-16 (better memory efficiency) or UTF-32
(canonical representation).

As an aside, I feel the need to clarify the terms "code points" and
"scalar values".  The only difference is that "code points" includes
the surrogates, whereas "scalar values" does not.  As the surrogates
are just an encoding detail of UTF-16 I feel this makes "scalar
values" the more canonical term.  It's all quite confusing though x_x.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: tuples, index method, Python's design

2007-04-15 Thread Rhamphoryncus
On Apr 15, 1:55 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "Rhamphoryncus" <[EMAIL PROTECTED]> writes:
> > Indexing cost, memory efficiency, and canonical representation: pick
> > two.  You can't use a canonical representation (scalar values) without
> > some sort of costly search when indexing (O(log n) probably) or by
> > expanding to the worst-case size (UTF-32).  Python has taken the
> > approach of always providing efficient indexing (O(1)), but you can
> > compile it with either UTF-16 (better memory efficiency) or UTF-32
> > (canonical representation).
>
> I still don't get it.  UTF-16 is just a data compression scheme, right?
> I mean, s[17] isn't the 17th character of the (unicode) string regardless
> of which memory byte it happens to live at?  It could be that that accessing
> it takes more than constant time, but that's hidden by the implementation.
>
> So where does the invariant c==s[s.index(c)] fail, assuming s contains c?

On linux (UTF-32):
>>> c = u'\U0010ffff'
>>> c
u'\U0010ffff'
>>> list(c)
[u'\U0010ffff']

On windows (UTF-16):
>>> c = u'\U0010ffff'
>>> c
u'\U0010ffff'
>>> list(c)
[u'\udbff', u'\udfff']

The unicode type's repr hides the distinction but you can see it with
list.  Your "single character" is actually two surrogate code points.
s[s.index(c)] would only give you the first surrogate character.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: tuples, index method, Python's design

2007-04-15 Thread Rhamphoryncus
On Apr 15, 8:56 am, Roel Schroeven <[EMAIL PROTECTED]>
wrote:
> Paul Rubin schreef:
>
> > "Rhamphoryncus" <[EMAIL PROTECTED]> writes:
> >> Indexing cost, memory efficiency, and canonical representation: pick
> >> two.  You can't use a canonical representation (scalar values) without
> >> some sort of costly search when indexing (O(log n) probably) or by
> >> expanding to the worst-case size (UTF-32).  Python has taken the
> >> approach of always providing efficient indexing (O(1)), but you can
> >> compile it with either UTF-16 (better memory efficiency) or UTF-32
> >> (canonical representation).
>
> > I still don't get it.  UTF-16 is just a data compression scheme, right?
> > I mean, s[17] isn't the 17th character of the (unicode) string regardless
> > of which memory byte it happens to live at?  It could be that that accessing
> > it takes more than constant time, but that's hidden by the implementation.
>
> > So where does the invariant c==s[s.index(c)] fail, assuming s contains c?
>
> I didn't get it either, but now I understand. Like you, I thought Python
> Unicode strings contain a canonical representation (in interface, not
> necessarily in implementation) but apparently that is not true; see
> Neil's post and the reference manual
> (http://docs.python.org/ref/types.html#l2h-22).
>
> A simple example on my Python installation, apparently compiled to use
> UTF-16 (sys.maxunicode == 65535):
>
>  >>> s = u'\u1d400'

You're confusing \u, which is followed by 4 digits, and \U, which is
followed by eight:
>>> list(u'\u1d400')
[u'\u1d40', u'0']
>>> list(u'\U0001d400')
[u'\U0001d400']  # UTF-32 output, sys.maxunicode == 1114111
[u'\ud835', u'\udc00']  # UTF-16 output, sys.maxunicode == 65535

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using fractions instead of floats

2007-10-01 Thread Rhamphoryncus
On Sep 30, 7:35 pm, andresj <[EMAIL PROTECTED]> wrote:
> I was doing some programming in Python, and the idea came to my mind:
> using fractions instead of floats when doing 2/5.

The core problem with rationals (other than the inability to handle
irrationals) is their tendency to require more and more memory as
calculations progress.  This means they get mysteriously slower and
slower.

http://mail.python.org/pipermail/python-list/2002-October/166630.html
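To see the growth concretely, here's a minimal sketch using the
fractions module (which arrived later, in 2.6); watch the denominator's
digit count roughly double on every pass:

from fractions import Fraction

x = Fraction(1, 3)
for i in range(5):
    x = x * x + Fraction(1, 7)     # an arbitrary iterative calculation
    print(len(str(x.denominator)))  # 2, 4, 8, 15, 29: roughly doubling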

My own "pet numeric type" is fixed point.  You get as much precision
as you specify.  Alas, 2/5 would likely result in 0. ;)

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Interview Questions

2007-10-31 Thread Rhamphoryncus
On Oct 31, 2:58 am, konryd <[EMAIL PROTECTED]> wrote:
> > - string building...do they use "+=" or do they build a list
> >and use .join() to recombine them efficiently
>
> I'm not dead sure about that, but I heard recently that python's been
> optimized for that behaviour. That means: using += is almost as fast
> as joining list. Besides, "+=" is more obvious than the other choice
> and since performance is not a problem until it's a problem, I'd
> rather use it. Anyway: I wouldn't pick it for an interview question.

Concatenating strings using += will often perform quadratically with
the number of concatenations.  Your testing will likely use a small
number and not take a noticeable amount of time.  Then it'll get
deployed and someone will plug in a much larger number, wondering why
the program is so slow.  The recent optimizations just make it more
obscure.

+= shouldn't be an obvious choice for sequences.  If it's mutable,
use .append().  If it's immutable, build up in a mutable sequence,
then convert.
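To make the tradeoff concrete, here's a sketch of both idioms (the
function names are mine):

def build_with_concat(pieces):
    s = ''
    for piece in pieces:
        s += piece  # may copy everything built so far: O(n**2) worst case
    return s

def build_with_join(pieces):
    chunks = []
    for piece in pieces:
        chunks.append(piece)  # amortized O(1) per append
    return ''.join(chunks)    # one final O(n) pass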

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Maths error

2007-01-15 Thread Rhamphoryncus
Nick Maclaren wrote:
> The problem with it is that it is an unrealistically pessimal model,
> and there are huge classes of algorithm that it can't handle at all;
> anything involving iterative convergence for a start.  It has been
> around for yonks (I first dabbled with it 30+ years ago), and it has
> never reached viability for most real applications.  In 30 years, it
> has got almost nowhere.
>
> Don't confuse interval methods with interval arithmetic, because you
> don't need the latter for the former, despite the claims that you do.
>
> |> For people just getting into it, it can be shocking to realize just how
> |> wide the interval can become after some computations.
>
> Yes.  Even when you can prove (mathematically) that the bounds are
> actually quite tight :-)

I've been experimenting with a fixed-point interval type in python.  I
expect many algorithms would require you to explicitly
round/collapse/whatever-term the interval as they go along, essentially
making it behave like a float.  Do you think it'd be suitable for
general-use, assuming you didn't mind the explicit rounding?
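Something like this minimal sketch (toy names, a fixed six-digit
precision assumed, and only addition shown):

SCALE = 10 ** 6  # fixed point: six fractional digits

class Interval(object):
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi  # bounds as scaled integers, lo <= hi

    def __add__(self, other):
        # exact: the true result is bracketed by the summed bounds
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def collapse(self):
        # the explicit rounding step: shrink back to a single point,
        # discarding the error bound so it behaves like a float again
        mid = (self.lo + self.hi) // 2
        return Interval(mid, mid)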

Unfortunately I lack a math background, so it's unlikely to progress
past an experiment.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: threading and multicores, pros and cons

2007-02-15 Thread Rhamphoryncus
On Feb 14, 4:30 pm, "MRAB" <[EMAIL PROTECTED]> wrote:
> Hmm. I wonder whether it would be possible to have a pair of python
> cores, one for single-threaded code (no locks necessary) and the other
> for multi-threaded code. When the Python program went from single-
> threaded to multi-threaded or multi-threaded to single-threaded there
> would be a switch from one core to the other.

I have explored this option (and some simpler variants).  Essentially,
you end up rewriting a massive amount of CPython's codebase to change
the refcount API.  Even all the C extension types assume the refcount
can be statically initialized (which may not be true if you're trying
to make it efficient on multiple CPUs.)

Once you realize the barrier for entry is so high you start
considering alternative implementations.  Personally, I'm watching
PyPy to see if they get reasonable performance using JIT.  Then I can
start hacking on it.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Weird result returned from adding floats depending on order I add them

2007-02-21 Thread Rhamphoryncus
On Feb 20, 6:08 am, "John Machin" <[EMAIL PROTECTED]> wrote:
> >>> for x in b:
> ...     tot += x
> ...     print repr(x), repr(tot)
> ...
> 0.20000000000000001 0.20000000000000001
> 0.20000000000000001 0.40000000000000002
> 0.20000000000000001 0.60000000000000009
> 0.20000000000000001 0.80000000000000004
> 0.10000000000000001 0.90000000000000002
> 0.10000000000000001 1.0
>
>
>
> As you can see, 0.1 and 0.2 can't be represented exactly as floating
> point numbers. Consequently there is only a rough chance that they
> will add up to what you think they should add up to.

Although your point is correct, this is actually a myth about repr.
The goal of repr is to be able to round-trip all numbers, so
eval(repr(n)) == n.  From that perspective it would be perfectly legal
if it printed out a nice and short "0.2" or "0.1".

As for the actual value, although you can't express all non-repeating
base-10 values with non-repeating base-2, you can express base-2 with
base-10.  It just gets a little long:

>>> '%.60f' % 0.2
'0.200000000000000011102230246251565404236316680908203125000000'
>>> '%.60f' % 0.1
'0.100000000000000005551115123125782702118158340454101562500000'

Unfortunately this method of printing out floats won't work for
smaller values, since the %f formatting limits the number of decimal
places.

But if you want a more compact exact representation I have bodged
together a way of printing out floats in base-16:

>>> hexfloat(0.2)
hexfloat('0.33333333333334')
>>> hexfloat(0.1)
hexfloat('0.1999999999999A')

Interesting, if a bit confusing.
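For what it's worth, later Pythons (2.6 and up) grew float.hex(), which
standardizes the same exact-representation idea, though in exponent form
rather than a plain hex fraction:

>>> (0.2).hex()
'0x1.999999999999ap-3'
>>> (0.1).hex()
'0x1.999999999999ap-4'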

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Populating huge data structures from disk

2007-11-07 Thread Rhamphoryncus
On Nov 6, 2:42 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote:
> > Note that you're not doing the same thing at all. You're
> > pre-allocating the array in the C code, but not in Python (and I don't
> > think you can). Is there some reason you're growing a 8 gig array 8
> > bytes at a time?
>
> > They spend about the same amount of time in system, but Python spends 4.7x
> > as much
> > CPU in userland as C does.
>
> > Python has to grow the array. It's possible that this is tripping a
> > degenerate case in the gc behavior also (I don't know if array uses
> > PyObjects for its internal buffer), and if it is you'll see an
> > improvement by disabling GC.
>
> That does explain why it's consuming 4.7x as much CPU.
>
> > > x = lengthy_number_crunching()
> > > magic.save_mmap("/important-data")
>
> > > and in the application do...
>
> > > x = magic.mmap("/important-data")
> > > magic.mlock("/important-data")
>
> > > and once the mlock finishes bringing important-data into RAM, at
> > > the speed of your disk I/O subsystem, all accesses to x will be
> > > hits against RAM.
>
> > You've basically described what mmap does, as far as I can tell. Have
> > you tried just mmapping the file?
>
> Yes, that would be why my fantasy functions have 'mmap' in their names.
>
> However, in C you can mmap arbitrarily complex data structures whereas
> in Python all you can mmap without transformations is an array or a string.
> I didn't say this earlier, but I do need to pull more than arrays
> and strings into RAM.  Not being able to pre-allocate storage is a big
> loser for this approach.

I don't see how needing transformations is an issue, as it's just a
constant overhead (in big-O terms.)

The bigger concern is whether what you're storing is self-contained (numbers,
strings) or if it contains inter-object references.  The latter may
require you to translate between indexes and temporary handle
objects.  This in turn may require some sort of garbage collection
scheme (refcounting, tracing).  Note that the addresses of the mmap'd
region can change each time the program runs, so even in C you may
need to use indexes.  (You may also want to eliminate C's padding,
although that would make the objects not directly accessible (due to
hardware alignment requirements.))

Having a separate process (or thread) that occasionally pokes each
page (as suggested by Paul Rubin) seems like the cleanest way to
ensure they stay "hot".  One pass during startup is insufficient, as
unused portions of a long-running program may get swapped out.  Also
note that poking need only touch 1 byte per page, much cheaper than
copying the entire page (so long as the page is already loaded from
disk.)
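A minimal sketch of such a poker (untested; the path and interval are
made up):

import mmap
import os
import time

def keep_hot(path, interval=60.0):
    f = open(path, 'rb')
    size = os.fstat(f.fileno()).st_size
    buf = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
    while True:
        # touching 1 byte per page keeps the whole page resident
        for offset in range(0, size, mmap.PAGESIZE):
            buf[offset]
        time.sleep(interval)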


--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Confused about closures and scoping rules

2007-11-07 Thread Rhamphoryncus
On Nov 6, 5:37 pm, "Chris Mellon" <[EMAIL PROTECTED]> wrote:
> On Nov 6, 2007 6:23 PM, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
>
> > On Tue, 06 Nov 2007 17:07:47 -0700, Fernando Perez <[EMAIL PROTECTED]> 
> > wrote:
> > > [snip]
>
> > >This struck me as counterintuitive, but I couldn't find anything in the
> > >official docs indicating what the expected behavior should be. Any
> > >feedback/enlightenment would be welcome.  This problem appeared deep 
> > >inside a
> > >complicated code and it took me almost two days to track down what was 
> > >going
> > >on...
>
> > Lots of people ask about this.  The behavior you observed is the expected
> > (by the implementors, anyway) behavior.
>
> Are there languages where closures *don't* behave like this? A closure
> that used a copy of the state rather than the actual state itself
> doesn't seem as useful. For references sake, JavaScript (the only
> language that a) has closures and b) I have a handy way to test with)
> does the same thing.

I've never needed to repeatedly modify closure variables.  However, I
may not set them until after the function is defined.  A simple
example is that of a recursive function:

def foo():
def bar():
bar()  # useful work omitted
return bar

Obviously, bar can't be set until after the function object is
created.

Showing changes also matches the behaviour of globals, which is a good
thing IMO.
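The standard demonstration that a closure sees the variable's current
binding rather than a copy:

def make_funcs():
    funcs = []
    for i in range(3):
        funcs.append(lambda: i)  # all three close over the same i
    return funcs

print([f() for f in make_funcs()])  # [2, 2, 2], not [0, 1, 2]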


--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using python as primary language

2007-11-09 Thread Rhamphoryncus
On Nov 9, 9:14 am, "Chris Mellon" <[EMAIL PROTECTED]> wrote:
> The heavy use of dicts is one of the problems - they're all over the
> place and even if you removed the GIL, a "global dict lock" would give
> essentially the same effect. And a per-dict lock means that there will
> be a *lot* more locking and unlocking, which means a slower
> interpreter.

It's worse than that.  Not only are they slower for a single thread,
but when using multiple threads the cores will contend over the cache
lines containing the locks, leaving you yet slower still!

I've started putting together a site explaining my approach to
solving these problems:

http://code.google.com/p/python-safethread/


--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using python as primary language

2007-11-09 Thread Rhamphoryncus
On Nov 9, 1:45 pm, "Terry Reedy" <[EMAIL PROTECTED]> wrote:
> "Martin Vilcans" <[EMAIL PROTECTED]> wrote in message
>
> news:[EMAIL PROTECTED]
> | But that's not what my question was about. It was about whether it
> | would make sense to, on the same python installation, select between
> | the GIL and fine-grained locks at startup. Because even if the locks
> | slows down the interpreter, if they let you utilize a 32 core CPU, it
> | may not be so bad anymore. Or am I missing something?
>
> 1. The need for some group to develop and maintain what anounts to a forked
> version of CPython.

That's pretty much what I'm doing.


> 2. If micro-locked Python ran, say, half as fast, then you can have a lot
> of IPC (interprocess communition) overhead and still be faster with
> multiple processes rather than multiple threads.

Of course you'd be faster still if you rewrote key portions in C.
That's usually not necessary though, so long as Python gives a roughly
constant overhead compared to C, which in this case would be true so
long as Python scaled up near 100% with the number of cores/threads.

The bigger question is one of usability.  We could make a usability/
performance tradeoff if we had more options, and there's a lot that
can give good performance, but at this point they all offer poor to
moderate usability, none having good usability.  The crux of the
"multicore crisis" is that lack of good usability.


--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using python as primary language

2007-11-12 Thread Rhamphoryncus
On Nov 12, 2:28 am, "Martin Vilcans" <[EMAIL PROTECTED]> wrote:
> On Nov 10, 2007 12:48 AM, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>
> > On Nov 9, 1:45 pm, "Terry Reedy" <[EMAIL PROTECTED]> wrote:
> > > 2. If micro-locked Python ran, say, half as fast, then you can have a lot
> > > of IPC (interprocess communition) overhead and still be faster with
> > > multiple processes rather than multiple threads.
>
> > Of course you'd be faster still if you rewrote key portions in C.
> > That's usually not necessary though, so long as Python gives a roughly
> > constant overhead compared to C, which in this case would be true so
> > long as Python scaled up near 100% with the number of cores/threads.
>
> > The bigger question is one of usability.  We could make a usability/
> > performance tradeoff if we had more options, and there's a lot that
> > can give good performance, but at this point they all offer poor to
> > moderate usability, none having good usability.  The crux of the
> > "multicore crisis" is that lack of good usability.
>
> Certainly. I guess it would be possible to implement GIL-less
> threading in Python quite easily if we required the programmer to
> synchronize all data access (like the synchronized keyword in Java for
> example), but that gets harder to use. Am I right that this is the
> problem?
>
> Actually, I would prefer to do parallell programming at a higher
> level. If Python can't do efficient threading at low level (such as in
> Java or C), then so be it. Perhaps multiple processes with message
> passing is the way to go. It just that it seems so... primitive.

What you've got to ask is "what is a message?"  Can you define your
own class and use it as a message?  Doing so seems very useful.  If
you allow that you end up with something like Concurrent Pascal's
monitors (which enforce their boundary, so they lack the memory model
needed by Java or C.)

Once you make message passing easy to use you end up with something
that looks very much like threads anyway, just lacking the shared-
state and the resulting memory model.

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Troubleshooting garbage collection issues

2007-11-18 Thread Rhamphoryncus
On Nov 17, 10:34 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
wrote:
> Hi folks - wondering if anyone has any pointers on troubleshooting
> garbage collection.  My colleagues and I are running into an
> interesting problem:
>
> Intermittently, we get into a situation where the garbage collection
> code is running in an infinite loop.  The data structures within the
> garbage collector have been corrupted, but it is unclear how or why.
> The problem is extremely difficult to reproduce consistently as it is
> unpredictable.
>
> The infinite loop itself occurs in gcmodule.c, update_refs.  After
> hitting this in the debugger a couple of times, it appears that that
> one of the nodes in the second or third generation list contains a
> pointer to the first generation head node.  The first generation was
> cleared shortly before the call into this function, so it contains a
> prev and next which point to itself.  Once this loop hits that node,
> it spins infinitely.
>
> Chances are another module we're depending on has done something
> hinkey with GC.  The challenge is tracking that down.  If anyone has
> seen something like this before and has either pointers to specific GC
> usage issues that can create this behavior or some additional thoughts
> on tricks to track it down to the offending module, they would be most
> appreciated.
>
> You can assume we've done some of the "usual" things - hacking up
> gcmodule to spit information when the condition occurs, various
> headstands and gymnastics in an attempt to identify reliable steps to
> reproduce - the challenge is the layers of indirection that we think
> are likely present between the manifestation of the problem and the
> module that produced it.

Does "usual things" also include compiling with --with-pydebug?

You could also try the various memory debuggers.  A refcounting error
is the first thing that comes to mind, although I can't see off hand
how this specific problem would come about.

Are you using threading at all?

Do you see any pattern to the types that have the bogus pointers?

--
Adam Olsen, aka Rhamphoryncus
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: removeall() in list

2008-01-12 Thread Rhamphoryncus
On Jan 12, 1:37 am, [EMAIL PROTECTED] wrote:
> On Jan 11, 8:04 pm, Paul Rubin  wrote:
>
>
>
> > [EMAIL PROTECTED] writes:
> > > Could you:
>
> > > lockerA= Locker( listA, listB )
> > > lockerA.op( listB.reverse )
> > > lockerA.op( listA.pop )
>
> > > Where lockerA ops acquire the locks on all its threads?
>
> > I don't understand that question.  The main thing to understand is
> > that multi-threaded programming is complicated (especially if you're
> > after high performance), and very difficult to get right without
> > knowing exactly what you're doing.  The generally preferred approach
> > in Python is to keep things simple at some performance cost.
>
> > Your Locker approach above looks sort of reasonable if you can be
> > absolutely sure that nothing else can mess with listA or listB
> > concurrently with those locker operations.  Normally you would put
> > listA and listB into a single object along with a lock, then do
> > operations on that object.
>
> > You might check the Python Cookbook for some specific recipes and
> > sample code for this stuff.  If you've used Java, Python's general
> > threading mechanisms are similar, but they are in the library rather
> > than built into the language (i.e. there is no "synchronized"
> > keyword, you have to do that locking explicitly).
>
> > What is the actual application, if you don't mind saying?  Are you
> > sure that you really need concurrency?
>
> I'm writing an NxN observer pattern, mostly for my own personal
> exploration.  Two threads -might- be calling 'Disconnect' at the same
> time, and I can't even guarantee that the function runs properly.
>
> for emelem in [ e for e in emlist if e.func is func ]:
>     try:
>         emlist.remove( emelem )
>     except ValueError:
>         pass

Is there a reason you're using a list, rather than a dict?  Note that
each call to list.remove() is O(n), whereas deleting a key from a dict
is O(1).
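A minimal sketch of the dict version (the names are mine, and it assumes
the callbacks are hashable):

observers = {}  # func -> any per-listener state

def connect(func, state=None):
    observers[func] = state

def disconnect(func):
    # O(1), and a repeated disconnect is harmless
    observers.pop(func, None)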
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super, decorators and gettattribute

2008-01-15 Thread Rhamphoryncus
On Jan 13, 5:51 am, Richard Szopa <[EMAIL PROTECTED]> wrote:
> On Jan 13, 8:59 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > On Sat, 12 Jan 2008 14:23:52 -0800, Richard Szopa wrote:
> > > However, I am very surprised to learn that
>
> > > super_object.__getattr__(name)(*args, **kwargs)
>
> > > getattr(super_object, name)(*args, **kwargs)
>
> > > are not equivalent. This is quite odd, at least with len()
> > > and .__len__, str() and .__str__. Do you maybe know what's the
> > > rationale behind not following that convention by getattr?
>
> > I think you are confusing `__getattr__` and `__getattribute__` here!
> > `getattr()` maps to `__getattr__()`, it's `__getattribute__` that's
> > different.
>
> Well, in my code calling super_object.__getattr__(name)(*args,
> **kwargs) and getattr(super_object, name)(*args, **kwargs) gives
> *different* effects (namely, the latter works, while the former
> doesn't). That kinda suggests that they don't map to each other :-).
> And that makes me feel confused.

Don't think of them as mappings.  Think of them as a way for a class
to hook into getattr's protocol, conveniently named similar to getattr
(__getattr__ and __getattribute__).  getattr() may call several
methods, no methods at all, change the arguments, etc.  Although len()
may seem simple, many others are not so simple.
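A small illustration that __getattr__ is a fallback hook rather than a
mapping of getattr (a sketch with made-up names):

class Demo(object):
    def __getattr__(self, name):
        # only reached after normal attribute lookup fails
        return 'fallback: ' + name

d = Demo()
d.x = 1
print(getattr(d, 'x'))  # 1 (normal lookup, __getattr__ never called)
print(getattr(d, 'y'))  # 'fallback: y'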

--
Adam Olsen, aka Rhamphoryncus
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Multiple interpreters retaining huge amounts of memory

2008-02-07 Thread Rhamphoryncus
On Feb 2, 10:32 pm, Graham Dumpleton <[EMAIL PROTECTED]>
wrote:
> The multi interpreter feature has some limitations, but if you know
> what you are doing and your application can be run within those
> limitations then it works fine.

I've been wondering about this for a while.  Given the severe
limitations of it, what are the use cases where multiple interpreters
do work?  All I can think of is that it keeps separate copies of
loaded python modules, but since you shouldn't be monkey-patching them
anyway, why should you care?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Article of interest: Python pros/cons for the enterprise

2008-02-23 Thread Rhamphoryncus
On Feb 23, 11:39 am, Nicola Musatti <[EMAIL PROTECTED]> wrote:
> Paul Rubin wrote:
> > Nicola Musatti <[EMAIL PROTECTED]> writes:
> >>>a = [f(x) + g(y) for x,y in izip(m1, m2) if h(x,y).frob() == 7]
> [...]
> > There you replace one line of code with 40+ lines to get around the
> > absence of GC.  Sounds bug-prone among other things.
>
> Come on, you didn't define f, g, izip, h or frob either. It's more like
> 5 to one. Furthermore your code is more compact due to the existence of
> list comprehensions, not because of GC. Had you written a loop the
> difference would be smaller.

So here are three versions, for comparison:


a = [f(x) + g(y) for x,y in izip(m1, m2) if h(x,y).frob() == 7]


a = []
for x, y in izip(m1, m2):
    if h(x, y).frob() == 7:
        a.append(f(x) + g(y))


std::vector<int> a;
for (std::someiterable::iterator i = izip(m1, m2); !i.finished(); ++i) {
    if (h(i->first, i->second).frob() == 7)
        a.push_back(f(i->first) + g(i->second));
}

Although the benefits of static typing can be argued, surely you
can see the clarity tradeoffs that you make for it?

Some notes:
* izip() takes two iterables as input and returns an iterator, using
only O(1) memory at any given time.  Your original version got this
wrong.
* I fudged the "!i.finished()" as I couldn't find an appropriate
official version in time for this post
* The iterator can't sanely be copied, so it probably needs to use
refcounting internally.  Oops, that's GC...
* Whoever decided to keep C's i++ syntax for iterators should be shot
* x and y aren't named.  They could be extracted, but it's just enough
effort that it's probably not worth it in C++.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Implementing file reading in C/Python

2009-01-09 Thread Rhamphoryncus
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch  wrote:
> On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
> > Marc 'BlackJack' Rintsch wrote:
>
> >> def iter_max_values(blocks, block_count):
> >>     for i, block in enumerate(blocks):
> >>         histogram = defaultdict(int)
> >>         for byte in block:
> >>             histogram[byte] += 1
>
> >>         yield max((count, byte)
> >>                   for value, count in histogram.iteritems())[1]
>
> > [snip]
> > Would it be faster if histogram was a list initialised to [0] * 256?
>
> Don't know.  Then for every byte in the 2 GiB we have to call `ord()`.  
> Maybe the speedup from the list compensates this, maybe not.
>
> I think that we have to to something with *every* byte of that really
> large file *at Python level* is the main problem here.  In C that's just
> some primitive numbers.  Python has all the object overhead.

struct's B format might help here.  Also, struct.unpack_from could
probably be combined with mmap to avoid copying the input.  Not to
mention that the 0..256 ints are all saved and won't be allocated/
deallocated.
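Roughly what I have in mind (a sketch; the block size and the filename
handling are made up, and the trailing partial block is ignored):

import mmap
import struct

BLOCK = 1 << 20  # 1 MiB

def iter_max_values(path):
    f = open(path, 'rb')
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    fmt = '%dB' % BLOCK
    for start in range(0, len(buf) - BLOCK + 1, BLOCK):
        histogram = [0] * 256
        # unpack_from reads ints straight out of the mmap, no slice copy
        for byte in struct.unpack_from(fmt, buf, start):
            histogram[byte] += 1
        yield max(range(256), key=histogram.__getitem__)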
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 isinstance

2009-01-15 Thread Rhamphoryncus
On Jan 14, 3:14 pm, "Lambert, David W (S&T)" 
wrote:
> Please, why isn't a set permitted as the second argument to isinstance?

The real problem is it would be misleading.  isinstance(a, myset)
suggests an implementation like type(a) in myset, but it's really
any(isinstance(a, t) for t in myset).

If myset is arbitrarily large (as the examples suggest), and you're
checking on every call, the code would be so slow as to be considered
broken.  This invalidates the use case.

The abc module in 2.6/3.0 is a possible alternative.  Create an ABC
type, register each type in myset as part of it, then do isinstance(a,
MyABC).  Internally the ABC will cache lookups, effectively getting
you back to O(1) cost.
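A minimal sketch of that approach (2.6 metaclass syntax; the registered
types are stand-ins for whatever was in myset):

from abc import ABCMeta

class MyABC(object):
    __metaclass__ = ABCMeta

for t in (int, float, complex):
    MyABC.register(t)

print(isinstance(1.5, MyABC))  # True, and cached after the first check
print(isinstance('x', MyABC))  # False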
--
http://mail.python.org/mailman/listinfo/python-list


Re: Does Python really follow its philosophy of "Readability counts"?

2009-01-19 Thread Rhamphoryncus
On Jan 18, 12:51 pm, "Russ P."  wrote:
> And even if all your developers were excellent, data hiding would
> still be a convenient mechanism to simplify their jobs so they can
> focus on higher level problems -- and not have to rely on an ugly
> naming convention.

That's just it: the cost of maintaining friend lists and similar
mechanisms exceeds the benefit of enforced privates.

Doing it for performance or for a secure sandbox of course changes
this, but they're separate issues.

Java hides low-level details because it thinks the programmer is
incompetent.  Python hides them because it thinks the programmer has
better things to do.  An underscore is all a competent programmer
needs.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Does Python really follow its philosophy of "Readability counts"?

2009-01-20 Thread Rhamphoryncus
On Jan 20, 12:04 pm, "Russ P."  wrote:
> Hey, if pylint can reliably detect private data access violations,
> that's good news to me. I haven't used it, so I don't know. (I used
> pychecker a while back, but I haven't used that for a while either.)
>
> If pylint can check access violations, then it seems to me that
> someone (who is familiar with the internals of the Python interpreter)
> should be able to integrate that feature into Python itself relatively
> easily.
>
> Actually, in addition to the enforcement of "private," you also need
> the enforcement of "protected." If you only enforce "private," derived
> classes will not have access to data they need. And if you don't
> enforce "protected," then anyone can trivially gain access to private
> data by simply deriving a new class. It would be like having a lock on
> the door with the key hanging right there on a string.
>
> I realize that this complicates matters. As I said before, I am not
> claiming that Python should necessarily get enforced data hiding. All
> I am saying is that, if it doesn't get it, it will never be
> appropriate for certain domains. But maybe nobody cares about that
> except me.

If pylint had "private" it should behave like "protected".  Basic
encapsulation is about good style (separation of concerns), not
security, and subclassing is a clear statement that this concern is
closely related.

Of course if you accept that the extremism of security is a separate
issue then you have the question of how much is necessary to encourage
separation of concerns, and what should be available to work
around it...


> The other problem with the pylint approach is aesthetic: the
> requirement for a leading underscore to indicate private data. I
> realize that some actually like that convention, but I don't. I spend
> most of my development time working with "private" data, and why
> should I have to look at that litter everywhere? I use Python in part
> because I want clean looking code, and leading underscores bother me
> -- just as the leading dollar signs in Perl bother many Python
> programmers.

There needs to be *some* indicator that separates a public property
from a private one.  What would you suggest?
--
http://mail.python.org/mailman/listinfo/python-list


Re: what's the point of rpython?

2009-01-20 Thread Rhamphoryncus
On Jan 16, 5:37 pm, "Brendan Miller"  wrote:
> So I kind of wanted to ask this question on the pypy mailing list..
> but there's only a pypy-dev list, and I don't want to put noise on the
> dev list.
>
> What's the point of RPython? By this, I don't mean "What is RPython"?
> I get that. I mean, why?

There are some distinct benefits of RPython:

* starts up as real python code, letting you define globals that later
get "snapshotted" into static code
* can be executed using a real python interpreter, getting much more
reliable debugging (but broken rpython code might run fine under
python)
* provides a clear avenue for extension, by adding more python
features

You might argue just having a python syntax is also a benefit, but the
semantics are so different as to more than counteract it.  Add in the
cost of implementing your own compiler... yeah.
--
http://mail.python.org/mailman/listinfo/python-list


Re: what's the point of rpython?

2009-01-20 Thread Rhamphoryncus
On Jan 19, 9:00 pm, "Brendan Miller"  wrote:
> Maybe I'm missing something here but a lock free algorithm for
> reference counting seems pretty trivial. As long as you can atomically
> increment and decrement an integer without locking you are pretty much
> done.

"lock free" is largely meaningless.  What it really means is "we use
small hardware locks rather than big software locks, thereby reducing
(but not eliminating!) the contention".

Atomic refcounting is easy.  If done sparingly to minimize contention
it works great.  Python uses refcounting massively with heavily
contended objects (think about your builtin types, constants, globals,
classes, etc.)  It does not perform acceptably for python.

The second issue is the objects themselves, like a list which is
mutable.  If you're using it in a single thread or writing from
multiple threads this is a non-trivial constant cost.  If your object
is not modified after creation and is read from many threads a lock
would be a point of contention, preventing you from scaling freely.
The dicts used by classes and globals are an important example of this,
and a successful implementation needs something non-contending.  I
assume Jython and IronPython do this.
--
http://mail.python.org/mailman/listinfo/python-list


Re: what's the point of rpython?

2009-01-20 Thread Rhamphoryncus
On Jan 20, 2:39 pm, Paul Rubin <http://[email protected]> wrote:
> Rhamphoryncus  writes:
> > "lock free" is largely meaningless.  What it really means is "we use
> > small hardware locks rather than big software locks, thereby reducing
> > (but not eliminating!) the contention".
>
> At least in the case of Haskell's "software transactional memory",
> reads are genuinely lock free in the normal uncontended case.  You use
> LOCK XCHG to increment a counter in the object when you want to
> update, and increment it again when you are done updating.  Since the
> counter starts at 0, any time it has an odd value, an update is in
> process and any other thread can notice that the object is locked and
> try again later without asserting any locks itself.  If the counter
> has an even value, it is unlocked, but there is a chance that a writer
> thread can lock and udpate the object while a reader thread has a
> lockless access in progress.  The reader detects this has occurred
> when it finishes its transaction, read the counter a second time, and
> notices that the counter has been incremented (maybe more than once)
> since the transaction started; it then rolls back its operation and
> tries again.  I'm not explaining that well so you can read about it in
> SPJ's paper or Keir Fraser's.

a) The contended case is the issue, not the uncontended case.  An
uncontended lock is just constant overhead, not a barrier to
scalability
b) Locks on linux (since the switch to futexes) are pretty close to
free when uncontended.

Transactions are really for different properties:
a) read patterns can be uncontended (fully scalable)
b) a preempted writer does not block other writers, guaranteeing
forward progress
c) ad-hoc combination of several objects into a single atomic update


> > The second issue is the objects themselves, like a list which is
> > mutable.  If you're using it in a single thread or writing from
> > multiple threads this is a non-trivial constant cost.  If your object
> > is not modified after creation and is read from many threads a lock
> > would be a point of contention, preventing you from scaling freely.
> > The dicts used by classes and globals are an import example of this,
> > and a successful implementation needs something non-contending.  I
> > assume Jython and IronPython do this.
>
> I'm pretty sure Jython makes no attempt at all to mess with ref
> counts--it just relies on the underlying Java gc.  I have no idea
> about IronPython.

The second issue has *nothing* to do with refcounts.  It is the
objects themselves.  A classic example is trying to do "i += 1", which
breaks down into "i = i + 1", which isn't an atomic operation.

Java's ConcurrentHashMap gets this right.
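A sketch of the lost-update race, for the curious (how many updates get
lost varies with timing and interpreter version):

import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1  # LOAD, ADD, STORE: another thread can interleave

threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # frequently less than 200000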
--
http://mail.python.org/mailman/listinfo/python-list


Re: what's the point of rpython?

2009-01-21 Thread Rhamphoryncus
On Jan 21, 12:57 pm, MRAB  wrote:
> I'm not sure whether multicore processors share a cache or, if not, have
> some other on-chip mechanism. Multiprocessor machines, however, are a
> different matter...

They share some, but also have some exclusive.  How much of which
depends entirely on which CPU you have.
--
http://mail.python.org/mailman/listinfo/python-list


Re: what's the point of rpython?

2009-01-21 Thread Rhamphoryncus
On Jan 21, 9:46 pm, Paul Rubin <http://[email protected]> wrote:
> Rhamphoryncus  writes:
> > a) The contended case is the issue, not the uncontended case.  An
> > uncontended lock is just constant overhead, not a barrier to
> > scalability
>
> a1) Really what matters is the actual mix between contended and
> uncontended accesses, and the synchronization strategy affects the
> amount of contention.  For example, the traditional locking strategy
> involves acquiring a lock before reading the object, so two
> simultaneous read-only accesses would create lock contention.  With
> STM, only updates acquire a lock, so multiple read-only threads can
> access the object simultaneously with no contention.  

Aye, but my point is really about the myth of lock-free algorithms
being uncontending — it's simply not true, and CAN'T be true.  A write
is inherently a mutually exclusive operation.  There are all sorts of
ways to avoid contending for reads, spread out the writes and have a
single thread coalesce them, etc., but fundamentally the write will
involve some mutual exclusion.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Does Python really follow its philosophy of "Readability counts"?

2009-01-22 Thread Rhamphoryncus
On Jan 22, 7:48 pm, Terry Reedy  wrote:
> Steven D'Aprano wrote:
>
> > Here's a thought experiment for you. You've suggested that the values
> > returned by cmp() are allowed to change "at whim". Okay, let's do it:
> > make a patch that changes cmp() to return -17, 0 or 53, and promise to
> > support it for at least three years. Try to get it accepted on python-
> > dev. What do you expect they will say?
>
> > My money is on them saying "No, this will pointlessly break code for no
> > good reason. Rejected."
>
> I have occasionally thought that things documented to possibly change
> *should* be changed just to expose the *bug* of depending on them not
> changing.  Or maybe, the doc should be changed.

Aye.  There are places where we're deliberately open, but this isn't
one of them.  I think this is just a C-ism and the docs should be more
specific.

Or finish ripping it out in 3.x ;)
--
http://mail.python.org/mailman/listinfo/python-list


Re: Why GIL? (was Re: what's the point of rpython?)

2009-01-22 Thread Rhamphoryncus
On Jan 22, 9:38 pm, Carl Banks  wrote:
> On Jan 22, 6:00 am, [email protected] (Aahz) wrote:
>
>
>
> > In article <[email protected]>,
> > Paul Rubin   wrote:
>
> > >alex23  writes:
>
> > >> Here's an article by Guido talking about the last attempt to remove
> > >> the GIL and the performance issues that arose:
>
> > >> "I'd welcome a set of patches into Py3k *only if* the performance for
> > >> a single-threaded program (and for a multi-threaded but I/O-bound
> > >> program) *does not decrease*."
>
> > >The performance decrease is an artifact of CPython's rather primitive
> > >storage management (reference counts in every object).  This is
> > >pervasive and can't really be removed.  But a new implementation
> > >(e.g. PyPy) can and should have a real garbage collector that doesn't
> > >suffer from such effects.
>
> > CPython's "primitive" storage management has a lot to do with the
> > simplicity of interfacing CPython with external libraries.  Any solution
> > that proposes to get rid of the GIL needs to address that.
>
> I recently was on a long road trip, and was not driver, and with
> nothing better to do thought quite a bit about how this.
>
> I concluded that, aside from one major trap, it wouldn't really be
> more difficult to inteface Python to external libraries, just
> differently difficult.  Here is briefly what I came up with:
>
> 1. Change the singular Python type into three metatypes:
> immutable_type, mutable_type, and mutable_dict_type.  (In the latter
> case, the object itself is immutable but the dict can be modified.
> This, of course, would be the default metaclass in Python.)  Only
> mutable_types would require a mutex when accessing.
>
> 2. API wouldn't have to change much.  All regular API would assume
> that objects are unlocked (if mutable) and in a consistent state.
> It'll lock any mutable objects it needs to access.  There would also
> be a low-level API that assumes the objects are locked (if mutable)
> and does not require objects to be consistent.  I imagine most
> extensions would call the standard API most of the time.
>
> 3. If you are going to use the low-level API on a mutable object, or
> are going to access the object structure directly, you need to acquire
> the object's mutex. Macros such as Py_LOCK(), Py_LOCK2(), Py_UNLOCK()
> would be provided.
>
> 4. Objects would have to define a method, to be called by the GC, that
> marks every object it references.  This would be a lot like the
> current tp_visit, except it has to be defined for any object that
> references another object, not just objects that can participate in
> cycles.  (A conservative garbage collector wouldn't suffice for Python
> because Python quite often allocates blocks but sets the pointer to an
> offset within the block.  In fact, that's true of almost any Python-
> defined type.)  Unfortunately, references on the stack would need to
> be registered as well, so "PyObject* p;" might have to be replaced
> with something like "Py_DECLARE_REF(PyObject,p);" which magically
> registers it.  Ugly.
>
> 5. Py_INCREF and Py_DECREF are gone.
>
> 6. GIL is gone.
>
> So, you gain the complexity of a two-level API, having to lock mutable
> objects sometimes, and defining more visitor methods than before, but
> you don't have to keep INCREFs and DECREFs straight, which is no small
> thing.
>
> The major trap is the possibility of deadlock.  To help minimize the
> risk there would be macros to lock multiple objects at once.  Py_LOCK2
> (a,b) guarantees that if another thread is calling Py_LOCK2(b,a) at
> the same time, it won't result in a deadlock.  What's
> disappointing is that the deadlocking possibility is always with you,
> much like the reference counts are.

IMO, locking of the object is a secondary problem.  Python-safethread
provides one solution, but it's not the only conceivable one.  For the
sake of discussion it's easier to assume somebody else is solving it
for you.

Instead, focus on just the garbage collection.  What are the practical
issues of modifying CPython to use a tracing GC throughout?  It
certainly is possible to write an exact GC in C, but the stack
manipulation would be hideous.  It'd also require significant rewrites
of the entire code base.  On top of that, the performance impact is
unclear (it could be far worse for a single-threaded program), with no
straightforward way to make it a compile-time option.

Got any ideas for that?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Why GIL? (was Re: what's the point of rpython?)

2009-01-23 Thread Rhamphoryncus
On Jan 22, 11:09 pm, Carl Banks  wrote:
> On Jan 22, 9:38 pm, Rhamphoryncus  wrote:
>
>
>
> > On Jan 22, 9:38 pm, Carl Banks  wrote:
> > > [snip: the locking/GC proposal, quoted in full upthread]
>
> > IMO, locking of the object



Re: Does Python really follow its philosophy of "Readability counts"?

2009-01-27 Thread Rhamphoryncus
On Jan 27, 12:13 pm, "Russ P."  wrote:
> On Jan 26, 6:09 am, Steve Holden  wrote:
>
> > Quite. Python is a language "for consenting adults". It has perceived
> > deficiencies for certain software engineering environments. Can we drop
> > the subject now? This horse was flogged to death long ago, and it's
> > pointless and cruel to keep on beating the remains.
>
> Judging from this thread, not everyone got the memo yet. At least
> three or four people on this thread alone have argued that enforced
> data hiding is of no value whatsoever for any application or domain.
> And more than one of them has argued that Python is perfectly
> appropriate for even the largest and most safety-critical projects.
>
> We are moving into an era of increasing dependence on computers and
> software for safety-critical, mission-critical, and  financial
> systems. If people who do not understand the principles  necessary for
> ultra-reliable software get in charge of developing these systems, we
> will have serious problems that could have been avoided.
>
> I suggested that maybe -- maybe! -- the versatility of Python could be
> enhanced with enforced data hiding. I was careful to say several times
> that I don't know if that can even be done in Python (with all its
> introspection and so forth). And it would always be optional, of
> course (as far as I know, no language forces anyone to declare
> anything private).
>
> Several people here seem to take that suggestion as an assault on
> Python and, by projection, an assault on their worldview. We all know
> that Python is a fantastic language for many purposes, but it is only
> a language, and failing to recognize and address its limitations
> serves no useful purpose.

What you need is a middle ground.  Something that can be *easily*
circumvented for debugging, unit tests, and "friend" functions/modules/
classes.

Without suggesting a middle ground people are left assuming C++-style
privates/protected, which would be a significant burden on everybody.
The only way it wouldn't be is if nobody actually used it, except in
specialized high-assurance software, but at that point you might as
well fork python (or use metaclass trickery).
--
http://mail.python.org/mailman/listinfo/python-list


Re: Why GIL? (was Re: what's the point of rpython?)

2009-01-27 Thread Rhamphoryncus
On Jan 27, 12:47 pm, Steve Holden  wrote:
> Paul Rubin wrote:
> > GIL-less Python (i.e. Jython) already exists and beats CPython in
> > performance a lot of the time, including on single processors.
> > Whether the GIL can be eliminated from CPython without massive rework
> > to every extension module ever written is a separate question, of
> > course.  Jython can be viewed a proof of concept.
>
> [...] I think probably the GIL will never be extracted successfully.
>
> Also IronPython and PyPy (though the latter only in concept for now, I
> believe). Even Guido admits that CPython doesn't necessarily represent
> the dominant future strain ...

IMO it's possible to rewrite only the core while keeping the refcount
API for external compatibility, but a tracing GC API in portable C is
hideous.  Enough to make me want to find or make a better
implementation language.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Flattening lists

2009-02-06 Thread Rhamphoryncus
On Feb 5, 1:16 pm, Michele Simionato 
wrote:
> On Feb 5, 7:24 pm, [email protected] (Aahz) wrote:
>
> > In article 
> > ,
> > Michele Simionato   wrote:
>
> > >Looks fine to me. In some situations you may also use hasattr(el,
> > >'__iter__') instead of isinstance(el, list) (it depends if you want to
> > >flatten generic iterables or only lists).
>
> > Of course, once you do that, you need to special-case strings...
>
> Strings are iterable but have no __iter__ method, which is fine in
> this context, since I would say 99.9% of times one wants to treat them
> as atomic objects, so no need to special case.

Don't worry, that little oddity was fixed for you:

Python 3.0+ (unknown, Dec  8 2008, 14:26:15)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str.__iter__
<slot wrapper '__iter__' of 'str' objects>
>>> bytes.__iter__
<slot wrapper '__iter__' of 'bytes' objects>
>>> bytearray.__iter__
<slot wrapper '__iter__' of 'bytearray' objects>


I'm in the "why do you need more than 1 depth?" camp.  Dispatching
based on your own type should be given an extra look.  Dispatching
based passed in types should be given three extra looks.

I didn't realize itertools.chain(*iterable) worked.  I guess that
needs to be pushed as the canonical form.
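For reference, the two spellings side by side (from_iterable consumes
the outer iterable lazily, while chain(*x) expands it up front):

from itertools import chain

nested = [[1, 2], [3, 4], [5]]
print(list(chain(*nested)))               # [1, 2, 3, 4, 5]
print(list(chain.from_iterable(nested)))  # [1, 2, 3, 4, 5]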
--
http://mail.python.org/mailman/listinfo/python-list


Re: Flattening lists

2009-02-07 Thread Rhamphoryncus
On Feb 6, 10:21 pm, [email protected] wrote:
> Quoth Mensanator :
> > def flatten(listOfLists):
> >     return list(chain.from_iterable(listOfLists))
>
>     Python 2.6.1 (r261:67515, Jan  7 2009, 17:09:13)
>     [GCC 4.3.2] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> from itertools import chain
>     >>> list(chain.from_iterable([1, 2, [3, 4]]))
>     Traceback (most recent call last):
>       File "", line 1, in 
>     TypeError: 'int' object is not iterable
>     >>> list(chain(*[1, 2, [3, 4]]))
>     Traceback (most recent call last):
>       File "", line 1, in 
>     TypeError: 'int' object is not iterable
>     >>> list(chain.from_iterable(['abcd', 'efg', [3, 4]]))
>     ['a', 'b', 'c', 'd', 'e', 'f', 'g', 3, 4]

What use case do you have for such inconsistently structured data?

If I'm building a tree I use my own type for the nodes, keeping them
purely internal, so I can always use isinstance without worrying about
getting something inconvenient passed in.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Flattening lists

2009-02-07 Thread Rhamphoryncus
On Feb 7, 1:39 pm,  wrote:
> On Sat, 7 Feb 2009 01:06:06 -0800 (PST)
> Rhamphoryncus  wrote:
>
> > What use case do you have for such inconsistently structured data?
>
> I have a similar use case in pyspread, which is a Python spreadsheet
> that employs numpy object arrays. Since the Python objects in the numpy
> arrays are derived from user input, they can be anything, including
> nested lists as well as strings, etc.
>
> Since I consider my work-around that treats strings as a special case a
> rather ugly hack, I would welcome a robust, generic approach to the
> OP's problem.

Can you explain this in a little more detail?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Flattening lists

2009-02-08 Thread Rhamphoryncus
On Feb 7, 3:07 pm,  wrote:
> On Sat, 7 Feb 2009 12:50:22 -0800 (PST)
> Rhamphoryncus  wrote:
> > Can you explain this in a little more detail?
>
> In the application, there is one main numpy array of type "O".
> Each element of the array corresponds to one cell in a grid. The user
> may enter a Python expression into the grid cell. The input is
> evaled and the result is stored in the numpy array (the actual process
> is a bit more complicated). Therefore, the object inside a numpy array
> element may be an inconsistent, nested, iterable type.
>
> The user now may access the result grid via __getitem__. When doing
> this, a numpy array that is as flat as possible while comprising the
> maximum possible data depth is returned, i.e.:
>
> 1. Non-string and non-unicode iterables of similar length for each of
> the cells form extra dimensions.
>
> 2. In order to remove different container types, the result is
> flattened, cast into a numpy.array and re-shaped.
>
> 3. Dimensions of length 1 are eliminated.
>
> Therefore, the user can conveniently use numpy ufuncs on the results.
>
> I am referring to the flatten operation in step 2

I'm afraid I still don't understand it, but I'm only minimally
familiar with numpy, never mind your spreadsheet app.
--
http://mail.python.org/mailman/listinfo/python-list


Re: can the sequence of entries in a dictionary be depended on?

2008-11-24 Thread Rhamphoryncus
On Nov 23, 6:43 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Nov 24, 11:59 am, Carsten Haese <[EMAIL PROTECTED]> wrote:
>
> > Diez B. Roggisch wrote:
> > > AFAIK the order is deterministic as long as you don't alter the dict 
> > > between
> > > iterations. However, this is an implementation detail.
>
> > It's not an implementation detail. It's documented behavior. Thus 
> > quothhttp://docs.python.org/library/stdtypes.html#mapping-types-dict:
>
> > """
> > If items(), keys(), values(), iteritems(), iterkeys(), and itervalues()
> > are called with no intervening modifications to the dictionary, the
> > lists will directly correspond.
> > """
>
> Changing the value attached to an existing key is a "modification to
> the dictionary", but it's hard to see how that would change the
> iteration order.

Although the referenced docs don't clarify it, changing the value
without adding or removing a key (even temporarily) does NOT reorder
the dict.  This has been declared officially on python-dev.


> "the lists will directly correspond"? What does that mean? Why not
> "the lists will be equal i.e. list1 == list2"?
>
> How about "Provided that keys are neither added nor deleted, the order
> of iteration will not change"?

Python 3.0 never returns lists directly (it provides views), so the
wording has already been tweaked.
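The guarantee in miniature (runs against 2.x lists or 3.0 views alike):

d = {'a': 1, 'b': 2, 'c': 3}
for k in d:
    d[k] += 1  # value updates only: no keys added or removed

# keys() and values() still correspond position for position
assert list(d.values()) == [d[k] for k in d.keys()]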
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 7, 4:20 pm, Steven D'Aprano <[EMAIL PROTECTED]
cybersource.com.au> wrote:
> On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
> > Rasmus Fogh wrote:
>
> >> Current behaviour is both inconsistent and counterintuitive, as these
> >> examples show.
>
> >> >>> x = float('NaN')
> >> >>> x == x
> >> False
>
> > Blame IEEE for that one. Rich comparisons have nothing to do with that
> > one.
>
> There is nothing to blame them for. This is the correct behaviour. NaNs
> should *not* compare equal to themselves, that's mathematically
> incoherent.

Mathematically, NaNs shouldn't be comparable at all.  They should
raise an exception when compared.  In fact, they should raise an
exception when *created*.  But that's not what we want.  What we want
is a dummy value that silently plods through our calculations.  For a
dummy value it makes a lot more sense to pick an arbitrary yet
consistent sort order (I suggest just above -Inf), rather than quietly
screwing up the sort.

Regarding the mythical IEEE 754, although it's extremely rare to find
quotations, I have one on just this subject.  And it does NOT say "x
== NaN gives false".  It says it gives *unordered*.  It is C and
probably most other languages that turn that into false (as they want
a dummy value, not an error.)

http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=st&q=why+does+NaN+not+equal+itself%3F&rnum=3&hl=en&pli=1
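
To make the sorting complaint concrete, a tiny demo (the exact garbage
you get back depends on the input order):

nan = float('nan')
data = [3.0, nan, 1.0, 2.0]
print(nan == nan)    # False
print(sorted(data))  # e.g. [3.0, nan, 1.0, 2.0] -- not actually sorted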
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 11:54 am, Robert Kern <[EMAIL PROTECTED]> wrote:
> Rhamphoryncus wrote:
> > On Dec 7, 4:20 pm, Steven D'Aprano <[EMAIL PROTECTED]
> > cybersource.com.au> wrote:
> >> On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
> >>> Rasmus Fogh wrote:
> >>>> Current behaviour is both inconsistent and counterintuitive, as these
> >>>> examples show.
> >>>>>>> x = float('NaN')
> >>>>>>> x == x
> >>>> False
> >>> Blame IEEE for that one. Rich comparisons have nothing to do with that
> >>> one.
> >> There is nothing to blame them for. This is the correct behaviour. NaNs
> >> should *not* compare equal to themselves, that's mathematically
> >> incoherent.
>
> > Mathematically, NaNs shouldn't be comparable at all.  They should
> > raise an exception when compared.  In fact, they should raise an
> > exception when *created*.  But that's not what we want.  What we want
> > is a dummy value that silently plods through our calculations.  For a
> > dummy value it makes a lot more sense to pick an arbitrary yet
> > consistent sort order (I suggest just above -Inf), rather than quietly
> > screwing up the sort.
>
> Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, 
> to
> accommodate both requirements. Additionally, there is significant flexibility 
> in
> trapping the signals.

Right, but most of that's lower level.  By the time it reaches Python
we only care about quiet NaNs.


> > Regarding the mythical IEEE 754, although it's extremely rare to find
> > quotations, I have one on just this subject.  And it does NOT say "x
> > == NaN gives false".  It says it gives *unordered*.  It is C and
> > probably most other languages that turn that into false (as they want
> > a dummy value, not an error.)
>
> >http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
>
> Table 4 on page 9 of the standard is pretty clear on the subject. When the two
> operands are unordered, the operator == returns False. The standard defines 
> how
> to do comparisons notionally; two operands can be "greater than", "less than",
> "equal" or "unordered". It then goes on to map these notional concepts to
> programming language boolean predicates.

Ahh, interesting.  Still though, does it give an explanation for such
behaviour, or use cases?  There must be some situation where blindly
returning false is enough benefit to trump screwing up sorting.
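
Meanwhile, the placement I'm suggesting -- NaN as a single value just
above -Inf -- is only a few lines as a sort key (a sketch):

def nan_low_key(x):
    # Order: -Inf first, then NaN, then everything else by value.
    if x != x:                # only NaN compares unequal to itself
        return (1, 0.0)
    if x == float('-inf'):
        return (0, 0.0)
    return (2, x)

data = [2.0, float('nan'), float('-inf'), 1.0]
print(sorted(data, key=nan_low_key))  # [-inf, nan, 1.0, 2.0]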
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 1:04 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> Rhamphoryncus wrote:
> > On Dec 8, 11:54 am, Robert Kern <[EMAIL PROTECTED]> wrote:
> >> Rhamphoryncus wrote:
> >>> On Dec 7, 4:20 pm, Steven D'Aprano <[EMAIL PROTECTED]
> >>> cybersource.com.au> wrote:
> >>>> On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
> >>>>> Rasmus Fogh wrote:
> >>>>>> Current behaviour is both inconsistent and counterintuitive, as these
> >>>>>> examples show.
> >>>>>>>>> x = float('NaN')
> >>>>>>>>> x == x
> >>>>>> False
> >>>>> Blame IEEE for that one. Rich comparisons have nothing to do with that
> >>>>> one.
> >>>> There is nothing to blame them for. This is the correct behaviour. NaNs
> >>>> should *not* compare equal to themselves, that's mathematically
> >>>> incoherent.
> >>> Mathematically, NaNs shouldn't be comparable at all.  They should
> >>> raise an exception when compared.  In fact, they should raise an
> >>> exception when *created*.  But that's not what we want.  What we want
> >>> is a dummy value that silently plods through our calculations.  For a
> >>> dummy value it makes a lot more sense to pick an arbitrary yet
> >>> consistent sort order (I suggest just above -Inf), rather than quietly
> >>> screwing up the sort.
> >> Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet 
> >> NaNs, to
> >> accommodate both requirements. Additionally, there is significant 
> >> flexibility in
> >> trapping the signals.
>
> > Right, but most of that's lower level.  By the time it reaches Python
> > we only care about quiet NaNs.
>
> No, signaling NaNs raise the exception that you are asking for. You're right
> that if you get a Python float object that is a NaN, it is probably going to 
> be
> quiet, but signaling NaNs can affect Python in the way that you want.
>
> >>> Regarding the mythical IEEE 754, although it's extremely rare to find
> >>> quotations, I have one on just this subject.  And it does NOT say "x
> >>> == NaN gives false".  It says it gives *unordered*.  It is C and
> >>> probably most other languages that turn that into false (as they want
> >>> a dummy value, not an error.)
> >>>http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
> >> Table 4 on page 9 of the standard is pretty clear on the subject. When the 
> >> two
> >> operands are unordered, the operator == returns False. The standard 
> >> defines how
> >> to do comparisons notionally; two operands can be "greater than", "less 
> >> than",
> >> "equal" or "unordered". It then goes on to map these notional concepts to
> >> programming language boolean predicates.
>
> > Ahh, interesting.  Still though, does it give an explanation for such
> > behaviour, or use cases?  There must be some situation where blindly
> > returning false is enough benefit to trump screwing up sorting.
>
> Well, the standard was written in the days of Fortran. You didn't really have
> generic sorting routines. You *could* implement whatever ordering you wanted
> because you *had* to implement the ordering yourself. You didn't have to use a
> limited boolean predicate.
>
> Basically, the boolean predicates have to return either True or False. Neither
> one is really satisfactory, but that's the constraint you're under.

"We've always done it that way" is NOT a use case!  Certainly, it's a
factor, but it seems quite weak compared to the sort use case.

I suppose what I'm hoping for is a small example program (one or a
few functions) that needs the "always false" behaviour of NaN.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 2:51 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> Rhamphoryncus wrote:
> > "We've always done it that way" is NOT a use case!  Certainly, it's a
> > factor, but it seems quite weak compared to the sort use case.
>
> I didn't say it was. I was explaining that sorting was probably *not* a use 
> case
> for the boolean predicates at the time of writing of the standard. In fact, it
> suggests implementing a Compare() function that returns "greater than", "less
> than", "equal" or "unordered" in addition to the boolean predicates. That 
> Python
> eventually chose to use a generic boolean predicate as the basis of its 
> sorting
> routine many years after the IEEE-754 standard is another matter entirely.

I interpret that to mean IEEE 754's semantics are for different
circumstances and are inapplicable to Python.


> In any case, the standard itself is quite short, and does not spend much time
> justifying itself in any detail.

A pity, as it is often invoked to explain language design.


> > I suppose what I'm hoping for is an small example program (one or a
> > few functions) that needs the "always false" behaviour of NaN.
>
> Steven D'Aprano gave one earlier in the thread.

I see examples of behaviour, but no use cases.


> Additionally, (x!=x) is a simple
> test for NaNs if an IsNaN(x) function is not available.

That's a trick to work around the lack of IsNaN(x).  Again, not a use
case.


> Really, though, the
> result falls out from the way that IEEE-754 constructed the logic of the
> system. It is not defined that (NaN==NaN) should return False, per se. Rather,
> all of the boolean predicates are defined in terms of that Compare(x,y)
> function. If that function returns "unordered", then (x==y) is False. It 
> doesn't
> matter if one or both are NaNs; in either case, the result is "unordered".

And if I arbitrarily dictate that NaN is a single value which is
orderable, sorting just above -Infinity, then all the behaviour makes
a lot more sense AND I fix sort.

So you see the predicament I'm in.  On the one hand we have a problem
and an obvious solution.  On the other hand we've got historical
behaviour which everybody insists *must* remain, reasons unknown.  It
reeks of the Parable of the Monkeys.

I think I should head over to one of the math groups and see if they
can find a reason for it.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 7:44 pm, Steven D'Aprano
<[EMAIL PROTECTED]> wrote:
> On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote:
> > On Dec 7, 4:20 pm, Steven D'Aprano <[EMAIL PROTECTED]
> > cybersource.com.au> wrote:
> >> On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
> >> > Rasmus Fogh wrote:
>
> >> >> Current behaviour is both inconsistent and counterintuitive, as
> >> >> these examples show.
>
> >> >>>>> x = float('NaN')
> >> >>>>> x == x
> >> >> False
>
> >> > Blame IEEE for that one. Rich comparisons have nothing to do with
> >> > that one.
>
> >> There is nothing to blame them for. This is the correct behaviour. NaNs
> >> should *not* compare equal to themselves, that's mathematically
> >> incoherent.
>
> > Mathematically, NaNs shouldn't be comparable at all.  They should raise
> > an exception when compared.  In fact, they should raise an exception
> > when *created*.  But that's not what we want.  What we want is a dummy
> > value that silently plods through our calculations.  For a dummy value
> > it makes a lot more sense to pick an arbitrary yet consistent sort order
> > (I suggest just above -Inf), rather than quietly screwing up the sort.
>
> > Regarding the mythical IEEE 754,
>
> It's hardly mythical.
>
> http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933

I consider it to be mythical because most knowledge of it is
indirect.  Few who use floating point have the documents available to
them.  Requiring purchase/membership is the cause of this.


> > although it's extremely rare to find
> > quotations, I have one on just this subject.  And it does NOT say "x ==
> > NaN gives false".  It says it gives *unordered*.
>
> Unordered means that none of the following is true:
>
> x > NaN
> x < NaN
> x == NaN
>
> It doesn't mean that comparing a NaN with something else is an error.

Robert Kern already clarified that.  My confusion was due to relying
on second-hand knowledge.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-09 Thread Rhamphoryncus
You grossly overvalue using the "in" operator on lists.  It's far more
common to use a dict or set for containment tests, due to O(1)
performance rather than O(n).  I doubt the numpy array supports
hashing, so an error for misuse is all you should expect.

In the rare case that you want to test for identity in a list, you can
easily write your own function to do it upfront:

def idcontains(seq, obj):
    for i in seq:
        if i is obj:
            return True
    return False
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-10 Thread Rhamphoryncus
On Dec 10, 7:49 am, Rasmus Fogh <[EMAIL PROTECTED]> wrote:
> Rhamphoryncus wrote:
> > You grossly overvalue using the "in" operator on lists.
>
> Maybe. But there is more to it than just 'in'. If you do:
> >>> c = numpy.zeros((2,))
> >>> ll = [1, c, 3.]
>
> then the following all throw errors:
> 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
> c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)
>
> Note how the presence of c in the list makes it behave wrong for 3 as
> well.

All of these are O(n).  Use a set or dict.  What is your use case
anyway?


> > It's far more
> > common to use a dict or set for containment tests, due to O(1)
> > performance rather than O(n).  I doubt the numpy array supports
> > hashing, so an error for misuse is all you should expect.
>
> Indeed it does not. So there is not much to be gained from modifying
> equality comparison with sets/dicts.
>
> > In the rare case that you want to test for identity in a list, you can
> > easily write your own function to do it upfront:
> > def idcontains(seq, obj):
> >     for i in seq:
> >         if i is obj:
> >             return True
> >     return False
>
> Again, you can code around any particular case (though wrappers look like
> a more robust solution). Still, why not get rid of this wart, if we can
> find a way?

The wart is a feature.  I agree that it's confusing, but the cost of
adding a special case to work around it is far in excess of the
original problem.

Now if you phrased it as a hypothetical discussion for the purpose of
learning about language design, that'd be another matter.
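
And if you really do need identity-based containment with O(1)
lookups, a small wrapper (a hypothetical helper, nothing in the
stdlib) gives it to you without touching equality:

class Id(object):
    # Wrap an object so it hashes and compares by identity.
    __slots__ = ('obj',)
    def __init__(self, obj):
        self.obj = obj
    def __hash__(self):
        return id(self.obj)
    def __eq__(self, other):
        return isinstance(other, Id) and self.obj is other.obj

import numpy
c = numpy.zeros((2,))
seen = set([Id(1), Id(c), Id(3.0)])
print(Id(c) in seen)  # True -- O(1), and never calls c.__eq__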
--
http://mail.python.org/mailman/listinfo/python-list


Re: __future__ and compile: unrecognised flags

2008-12-14 Thread Rhamphoryncus
On Dec 13, 6:03 am, Steve Holden  wrote:
> Poor Yorick wrote:
> > I have a future statement in a script which is intended to work in 2.6 and 
> > 3.
> > Shouldn't compile flags in __future__ objects essentially be noops for 
> > versions
> > that already support the feature? doctest is complaining about unrecognised
> > flags.  This illustrates the problem:
>
> >     Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
> >     Type "help", "copyright", "credits" or "license" for more information.
> >     >>> from __future__ import unicode_literals
> >     >>> src = 'a = "hello"'
> >     >>> c1 = compile(src,'','exec',unicode_literals.compiler_flag)
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >     ValueError: compile(): unrecognised flags
>
> This could arguably be classed as a bug given that the 2.6 documentation
> for __future__ says "No feature description will ever be deleted from
> __future__." However I suspect that the feature has been removed because
> all string literals are Unicode in 3.0 and up, and 3.0 is allowed to
> break compatibility.

I don't think the rule about __future__ applies to compile().
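
A version-agnostic workaround is to pass the flag only while the
feature isn't yet mandatory.  A sketch using __future__'s documented
getMandatoryRelease()/compiler_flag attributes:

import sys
import __future__

def future_flag(name):
    # Return 0 once the feature is mandatory, so compile() never
    # sees a flag it no longer recognises.
    feature = getattr(__future__, name)
    if sys.version_info >= feature.getMandatoryRelease():
        return 0
    return feature.compiler_flag

src = 'a = "hello"'
code = compile(src, '<string>', 'exec', future_flag('unicode_literals'))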
--
http://mail.python.org/mailman/listinfo/python-list


Re: Are python objects thread-safe?

2008-12-22 Thread Rhamphoryncus
On Dec 21, 11:51 am, RajNewbie  wrote:
> Say, I have two threads, updating the same dictionary object - but for
> different parameters:
> Please find an example below:
> a = {'file1Data' : '',
>        'file2Data' : ''}
>
> Now, I send it to two different threads, both of which are looping
> infinitely:
> In thread1:
> a['file1Data'] = open(filename1).read()
>           and
> in thread2:
> a['file2Data'] = open(filename2).read()
>
> My question is  - is this object threadsafe? - since we are working on
> two different parameters in the object. Or should I have to block the
> whole object?

In general, python makes few promises.  It has a *strong* preference
towards failing gracefully (ie an exception rather than a segfault),
which implies atomic operations underneath, but makes no promise as to
the granularity of those atomic operations.

In practice though, it is safe to update two distinct keys in a dict.
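
A minimal sketch of that pattern (the filenames are made up here):

import threading

a = {'file1Data': '', 'file2Data': ''}

def load(key, filename):
    # The store below is a single STORE_SUBSCR bytecode, which is
    # atomic under the GIL; the two threads touch distinct keys.
    a[key] = open(filename).read()

t1 = threading.Thread(target=load, args=('file1Data', 'file1.txt'))
t2 = threading.Thread(target=load, args=('file2Data', 'file2.txt'))
t1.start(); t2.start()
t1.join(); t2.join()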
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-22 Thread Rhamphoryncus
On Oct 22, 10:32 am, Andy <[EMAIL PROTECTED]> wrote:
> Dear Python dev community,
>
> I'm CTO at a small software company that makes music visualization
> software (you can check us out at www.soundspectrum.com).  About two
> years ago we went with decision to use embedded python in a couple of
> our new products, given all the great things about python.  We were
> close to using lua but for various reasons we decided to go with
> python.  However, over the last two years, there's been one area of
> grief that sometimes makes me think twice about our decision to go
> with python...
>
> Some background first...   Our software is used for entertainment and
> centers around real time, high-performance graphics, so python's
> performance, embedded flexibility, and stability are the most
> important issues for us.  Our software targets a large cross section
> of hardware and we currently ship products for Win32, OS X, and the
> iPhone and since our customers are end users, our products have to be
> robust, have a tidy install footprint, and be foolproof.  Basically,
> we use embedded python and use it to wrap our high performance C++
> class set which wraps OpenGL, DirectX and our own software renderer.
> In addition to wrapping our C++ frameworks, we use python to perform
> various "worker" tasks on worker thread (e.g. image loading and processing).
> However, we require *true* thread/interpreter
> independence so python 2 has been frustrating at time, to say the
> least.  Please don't start with "but really, python supports multiple
> interpreters" because I've been there many many times with people.
> And, yes, I'm aware of the multiprocessing module added in 2.6, but
> that stuff isn't lightweight and isn't suitable at all for many
> environments (including ours).  The bottom line is that if you want to
> perform independent processing (in python) on different threads, using
> the machine's multiple cores to the fullest, then you're out of luck
> under python 2.
>
> Sadly, the only way we could get truly independent interpreters was to
> put python in a dynamic library, have our installer make a *duplicate*
> copy of it during the installation process (e.g. python.dll/.bundle ->
> python2.dll/.bundle) and load each one explicitly in our app, so we
> can get truly independent interpreters.  In other words, we load a
> fresh dynamic lib for each thread-independent interpreter (you can't
> reuse the same dynamic library because the OS will just reference the
> already-loaded one).
>
> From what I gather from the python community, the basis for not
> offering "real" muti-threaded support is that it'd add to much
> internal overhead--and I couldn't agree more.  As a high performance C
> and C++ guy, I fully agree that thread safety should be at the high
> level, not at the low level.  BUT, the lack of truly independent
> interpreters is what ultimately prevents using python in cool,
> powerful ways.  This shortcoming alone has caused game developers--
> both large and small--to choose other embedded interpreters over
> python (e.g. Blizzard chose lua over python).  For example, Apple's
> QuickTime API is powerful in that high-level instance objects can
> leverage performance gains associated with multi-threaded processing.
> Meanwhile, the QuickTime API simply lists the responsibilities of the
> caller regarding thread safety and that's all its needs to do.  In
> other words, CPython doesn't need to step in and provide a threadsafe
> environment; it just needs to establish the rules and make sure that
> its own implementation supports those rules.
>
> More than once, I had actually considered expending company resources
> to develop a high performance, truly independent interpreter
> implementation of the python core language and modules but in the end
> estimated that the size of that project would just be too much, given
> our company's current resources.  Should such an implementation ever
> be developed, it would be very attractive for companies to support,
> fund, and/or license.  The truth is, we just love python as a
> language, but its lack of true interpreter independence (in a
> interpreter as well as in a thread sense) remains a *huge* liability.
>
> So, my question becomes: is python 3 ready for true multithreaded
> support??  Can we finally abandon our Frankenstein approach of loading
> multiple identical dynamic libs to achieve truly independent
> interpreters?? I've reviewed all the new python 3 C API module stuff,
> and all I have to say is: whew--better late than never!!  So, although
> that solves modules offering truly independent interpreter support,
> the following questions remain:
>
> - In python 3, the C module API now supports true interpreter
> independence, but have all the modules in the python codebase been
> converted over?  Are they all now truly compliant?  It will only take
> a single static/global state variable in a module to potentially cause
> no end of pain in a multiple interpr

Re: 2.6, 3.0, and truly independent intepreters

2008-10-22 Thread Rhamphoryncus
On Oct 22, 7:04 pm, Andy <[EMAIL PROTECTED]> wrote:
> > What you describe, truly independent interpreters, is not threading at
> > all: it is processes, emulated at the application level, with all the
> > memory cost and none of the OS protections.  True threading would
> > involve sharing most objects.
>
> > Your solution depends on what you need:
> > * Killable "threads" -> OS processes
> > * multicore usage (GIL removal) -> OS processes or alternative Python
> > implementations (PyPy/Jython/IronPython)
> > * Sane shared objects -> safethread
>
> I realize what you're saying, but it's better said there's two issues
> at hand:
>
> 1) Independent interpreters (this is the easier one--and solved, in
> principle anyway, by PEP 3121, by Martin v. Löwis, but is FAR from
> being carried through in modules as he pointed out).  As you point
> out, this doesn't directly relate to multi-threading BUT it is
> intimately tied to the issue because if, in principle, every module
> used instance data (rather than static data), then python would be
> WELL on its way to "free threading" (as Jesse Noller calls it), or as
> I was calling it "true multi-threading".

If you want processes, use *real* processes.  Your arguments fail to
gain traction because you don't provide a good, justified reason why
they don't and can't work.

Although isolated interpreters would be convenient to you, it's a
specialized use case, and bad language design.  There are far more use
cases that aren't isolated (actual threading), so why exclude them?


> 2) Barriers to "free threading".  As Jesse describes, this is simply
> just the GIL being in place, but of course it's there for a reason.
> It's there because (1) doesn't hold and there was never any specs/
> guidance put forward about what should and shouldn't be done in multi-
> threaded apps (see my QuickTime API example).  Perhaps if we could go
> back in time, we would not put the GIL in place, strict guidelines
> regarding multithreaded use would have been established, and PEP 3121
> would have been mandatory for C modules.  Then again--screw that, if I
> could go back in time, I'd just go for the lottery tickets!! :^)

You seem confused.  PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.

Getting threading right would have been a massive investment even back
then, and we probably wouldn't have as mature of a python we do
today.  Make no mistake, the GIL has substantial benefits.  It may be
old and tired, surrounded by young bucks, but it's still winning most
of the races.


> Anyway, I've been at this issue for quite a while now (we're
> approaching our 3rd release cycle), so I'm pretty comfortable with the
> principles at hand.  I'd say the theme of your comments share the
> theme of others here, so perhaps consider where end-user software
> houses (like us) are coming from.  Specifically, developing commercial
> software for end users imposes some restrictions that open source
> development communities aren't often as sensitive to, namely:
>
> - Performance -- emulation is a no-go (e.g. Jython)

Got some real benchmarks to back that up?  How about testing it on a
16 core (or more) box and seeing how it scales?


> - Maturity and Licensing -- experimental/academic projects are no-go
> (PyPy)
> - Cross platform support -- love it or hate it, Win32 and OS X are all
> that matter when you're talking about selling (and supporting)
> software to the masses.  I'm just the messenger here (ie. this is NOT
> flamebait).  We publish for OS X, so IronPython is therefore out.

You might be able to use Java on one, IronPython on another, and PyPy
in between.  Regardless, my point is that CPython will *never* remove
the GIL.  It cannot be done in an effective, highly scalable fashion
without a total rewrite.


> Basically, our company is at a crossroads where we really need light,
> clean "free threading" as Jesse calls it (e.g. on the iPhone, using
> our python drawing wrapper to do primary drawing while running python
> jobs on another thread doing image decoding and processing).  In our
> current iPhone app, we achieve this by using two python bundles
> (dynamic libs) in the way I described in my initial post.  Sure, thus
> solves our problem, but it's pretty messy, sucks up resources, and has
> been a pain to maintain.

Is the iPhone multicore, or is it an issue of fairness (ie a soft
realtime app)?


> Moving forward, please understand my posts here are also intended to
> give the CPython dev community a glimpse of the issues that may not be
> as visible to you guys (as they are for dev houses like us).  For
> example, it'd be pretty cool if Blizzard went with python instead of
> lua, wouldn't you think?  But some of the issues I've raised here no
> doubt factor in to why end-user dev houses ultimately may have to pass
> up python in favor of another interpreted language.
>
> Bottom line: why give prospective devs any reason to turn down python--
> there's just so man

Re: 2.6, 3.0, and truly independent intepreters

2008-10-22 Thread Rhamphoryncus
On Oct 22, 10:31 pm, Andy <[EMAIL PROTECTED]> wrote:
> > You seem confused.  PEP 3121 is for isolated interpreters (ie emulated
> > processes), not threading.
>
> Please reread my points--inherently isolated interpreters (ie. the top
> level object) are indirectly linked to thread independence.  I don't
> want to argue, but you seem hell-bent on not hearing what I'm trying
> to say here.

I think the confusion is a matter of context.  Your app, written in C
or some other non-python language, shares data between the threads and
thus treats them as real threads.  However, from python's perspective
nothing is shared, and thus it is processes.

Although this contradiction is fine for embedding purposes, python is
a general purpose language, and needs to be capable of directly
sharing objects.  Imagine you wanted to rewrite the bulk of your app
in python, with only a relatively small portion left in a C extension
module.


> > Got some real benchmarks to back that up?  How about testing it on a
> > 16 core (or more) box and seeing how it scales?
>
> I don't care to argue with you, and you'll have to take it on faith
> that I'm not spouting hot air.  But just to put this to rest, I'll
> make it clear in this Jython case:
>
> You can't sell software to end users and expect them have a recent,
> working java distro.  Look around you: no real commercial software
> title that sells to soccer moms and gamers use java.  There's method
> to commercial software production, so please don't presume that you
> know my job, product line, and customers better than me, ok?
>
> Just to put things in perspective, I already have exposed my company
> to more support and design liability than I knew I was getting into by
> going with python (as a result of all this thread safety and
> interpreter independence business).  I love to go into that one, but
> it's frankly just not a good use of my time right now.  Please just
> accept that when someone says an option is a deal breaker, then it's a
> deal breaker.  This isn't some dude's masters thesis project here--we
> pay our RENT and put our KIDS through school because we sell and ship
> software that works and is meant to make people happy.

Consider it accepted.  I understand that PyPy/Jython/IronPython don't
fit your needs.  Likewise though, CPython cannot fit my needs.  What
we both need simply does not exist today.


> > I'd like to see python used more, but fixing these things properly is
> > not as easy as believed.  Those in the user community see only their
> > immediate problem (threads don't use multicore).  People like me see
> > much bigger problems.  We need consensus on the problems, and how to
> > solve it, and a commitment to invest what's required.
>
> Well, you seem to come down pretty hard on people that at your
> doorstep saying their WILLING and INTERESTED in supporting python
> development.  And, you're exactly right:  users see only their
> immediate problem--but that's the definition of being a user.  If
> users saw the whole picture from the dev side, then they be
> developers, not users.
>
> Please consider that you're representing the python dev community
> here; I'm you're friend here, not your enemy.

I'm sorry if I came across harshly.  My intent was merely to push you
towards supporting long-term solutions, rather than short-term ones.
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-23 Thread Rhamphoryncus
On Oct 23, 11:30 am, Glenn Linderman <[EMAIL PROTECTED]> wrote:
> On approximately 10/23/2008 12:24 AM, came the following characters from
> the keyboard of Christian Heimes:
>
> > Andy wrote:
> >> 2) Barriers to "free threading".  As Jesse describes, this is simply
> >> just the GIL being in place, but of course it's there for a reason.
> >> It's there because (1) doesn't hold and there was never any specs/
> >> guidance put forward about what should and shouldn't be done in multi-
> >> threaded apps (see my QuickTime API example).  Perhaps if we could go
> >> back in time, we would not put the GIL in place, strict guidelines
> >> regarding multithreaded use would have been established, and PEP 3121
> >> would have been mandatory for C modules.  Then again--screw that, if I
> >> could go back in time, I'd just go for the lottery tickets!! :^)
>
> I've been following this discussion with interest, as it certainly seems
> that multi-core/multi-CPU machines are the coming thing, and many
> applications will need to figure out how to use them effectively.
>
> > I'm very - not absolute, but very - sure that Guido and the initial
> > designers of Python would have added the GIL anyway. The GIL makes
> > Python faster on single core machines and more stable on multi core
> > machines. Other language designers think the same way. Ruby recently
> > got a GIL. The article
> > http://www.infoq.com/news/2007/05/ruby-threading-futures explains the
> > rationales for a GIL in Ruby. The article also holds a quote from
> > Guido about threading in general.
>
> > Several people inside and outside the Python community think that
> > threads are dangerous and don't scale. The paper
> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf sums
> > it up nicely. It explains why modern processors are going to cause
> > more and more trouble with the Java approach to threads, too.
>
> Reading this PDF paper is extremely interesting (albeit somewhat
> dependent on understanding abstract theories of computation; I have
> enough math background to follow it, sort of, and most of the text can
> be read even without fully understanding the theoretical abstractions).
>
> I have already heard people talking about "Java applications are
> buggy".  I don't believe that general sequential programs written in
> Java are any buggier than programs written in other languages... so I
> had interpreted that to mean (based on some inquiry) that complex,
> multi-threaded Java applications are buggy.  And while I also don't
> believe that complex, multi-threaded programs written in Java are any
> buggier than complex, multi-threaded programs written in other
> languages, it does seem to be true that Java is one of the currently
> popular languages in which to write complex, multi-threaded programs,
> because of its language support for threads and concurrency primitives.  
> These reports were from people that are not programmers, but are field
> IT people, that have bought and/or support software and/or hardware with
> drivers, that are written in Java, and seem to have non-ideal behavior,
> (apparently only) curable by stopping/restarting the application or
> driver, or sometimes requiring a reboot.
>
> The paper explains many traps that lead to complex, multi-threaded
> programs being buggy, and being hard to test.  I have worked with
> parallel machines, applications, and databases for 25 years, and can
> appreciate the succinct expression of the problems explained within the
> paper, and can, from experience, agree with its premises and
> conclusions.  Parallel applications only have been commercial successes
> when the parallelism is tightly constrained to well-controlled patterns
> that could be easily understood.  Threads, especially in "cooperation"
> with languages that use memory pointers, have the potential to get out
> of control, in inexplicable ways.

Although the paper is correct in many ways, I find it fails to
distinguish the core of the problem from the chaff surrounding it, and
thus is used to justify poor language designs.

For example, the amount of interaction may be seen as a spectrum: at
one end is C or Java threads, with complicated memory models, and a
tendency to just barely control things using locks.  At the other end
would be completely isolated processes with no form of IPC.  The former
is considered the worst possible, while the latter is the best
possible (purely sequential).

However, the latter is too weak for many uses.  At a minimum we'd like
some pipes to communicate.  Helps, but it's still too weak.  What if
you have a large amount of data to share, created at startup but
otherwise not modified?  So we add some read only types and ways to
define your own read only types.  A couple of those types need a
process associated with them, so we make sure process handles are
proper objects too.

What have we got now?  It's more on the thread end of the spectrum
than the process end, but it's definitely not a 

Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Rhamphoryncus
On Oct 24, 1:02 pm, Glenn Linderman <[EMAIL PROTECTED]> wrote:
> On approximately 10/24/2008 8:42 AM, came the following characters from
> the keyboard of Andy O'Meara:
>
> > Glenn, great post and points!
>
> Thanks. I need to admit here that while I've got a fair bit of
> professional programming experience, I'm quite new to Python -- I've not
> learned its internals, nor even the full extent of its rich library. So
> I have some questions that are partly about the goals of the
> applications being discussed, partly about how Python is constructed,
> and partly about how the library is constructed. I'm hoping to get a
> better understanding of all of these; perhaps once a better
> understanding is achieved, limitations will be understood, and maybe
> solutions be achievable.
>
> Let me define some speculative Python interpreters; I think the first is
> today's Python:
>
> PyA: Has a GIL. PyA threads can run within a process; but are
> effectively serialized to the places where the GIL is obtained/released.
> Needs the GIL because that solves lots of problems with non-reentrant
> code (an example of non-reentrant code, is code that uses global (C
> global, or C static) variables – note that I'm not talking about Python
> vars declared global... they are only module global). In this model,
> non-reentrant code could include pieces of the interpreter, and/or
> extension modules.
>
> PyB: No GIL. PyB threads acquire/release a lock around each reference to
> a global variable (like "with" feature). Requires massive recoding of
> all code that contains global variables. Reduces performance
> significantly by the increased cost of obtaining and releasing locks.
>
> PyC: No locks. Instead, recoding is done to eliminate global variables
> (interpreter requires a state structure to be passed in). Extension
> modules that use globals are prohibited... this eliminates large
> portions of the library, or requires massive recoding. PyC threads do
> not share data between threads except by explicit interfaces.
>
> PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
> global variables, and each interpreter instance is provided a state
> structure. There is still a GIL, however, because globals are
> potentially still used by some modules. Code is added to detect use of
> global variables by a module, or some contract is written whereby a
> module can be declared to be reentrant and global-free. PyA threads will
> obtain the GIL as they would today. PyC threads would be available to be
> created. PyC instances refuse to call non-reentrant modules, but also
> need not obtain the GIL... PyC threads would have limited module support
> initially, but over time, most modules can be migrated to be reentrant
> and global-free, so they can be used by PyC instances. Most 3rd-party
> libraries today are starting to care about reentrancy anyway, because of
> the popularity of threads.

PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable.  A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability.  Most other
shareable objects are immutable.  Each thread is run in its own
private monitor, and thus protected from the normal threading memory
model nasties.  Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.


> > Our software runs in real time (so performance is paramount),
> > interacts with other static libraries, depends on worker threads to
> > perform real-time image manipulation, and leverages Windows and Mac OS
> > API concepts and features.  Python's performance hits have generally
> > been a huge challenge with our animators because they often have to go
> > back and massage their python code to improve execution performance.
> > So, in short, there are many reasons why we use python as a part
> > rather than a whole.
[...]
> > As a python language fan an enthusiast, don't let lua win!  (I say
> > this endearingly of course--I have the utmost respect for both
> > communities and I only want to see CPython be an attractive pick when
> > a company is looking to embed a language that won't intrude upon their
> > app's design).

I agree with the problem, and desire to make python fill all niches,
but let's just say I'm more ambitious with my solution. ;)
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Rhamphoryncus
On Oct 24, 2:59 pm, Glenn Linderman <[EMAIL PROTECTED]> wrote:
> On approximately 10/24/2008 1:09 PM, came the following characters from
> the keyboard of Rhamphoryncus:
> > PyE: objects are reclassified as shareable or non-shareable, many
> > types are now only allowed to be shareable.  A module and its classes
> > become shareable with the use of a __future__ import, and their
> > shareddict uses a read-write lock for scalability.  Most other
> > shareable objects are immutable.  Each thread is run in its own
> > private monitor, and thus protected from the normal threading memory
> > model nasties.  Alas, this gives you all the semantics, but you still
> > need scalable garbage collection.. and CPython's refcounting needs the
> > GIL.
>
> Hmm.  So I think your PyE is an instance is an attempt to be more
> explicit about what I said above in PyC: PyC threads do not share data
> between threads except by explicit interfaces.  I consider your
> definitions of shared data types somewhat orthogonal to the types of
> threads, in that both PyA and PyC threads could use these new shared
> data items.

Unlike PyC, there's a *lot* shared by default (classes, modules,
functions), but it requires only minimal recoding.  It's as close to
"have your cake and eat it too" as you're gonna get.


> I think/hope that you meant that "many types are now only allowed to be
> non-shareable"?  At least, I think that should be the default; they
> should be within the context of a single, independent interpreter
> instance, so other interpreters don't even know they exist, much less
> how to share them.  If so, then I understand most of the rest of your
> paragraph, and it could be a way of providing shared objects, perhaps.

There aren't multiple interpreters under my model.  You only need
one.  Instead, you create a monitor, and run a thread on it.  A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.

I've no interest in *requiring* a C/C++ extension to communicate
between isolated interpreters.  Without that they're really no better
than processes.
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Rhamphoryncus
On Oct 24, 3:02 pm, Glenn Linderman <[EMAIL PROTECTED]> wrote:
> On approximately 10/23/2008 2:24 PM, came the following characters from the
> keyboard of Rhamphoryncus:
>>
>> On Oct 23, 11:30 am, Glenn Linderman <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> On approximately 10/23/2008 12:24 AM, came the following characters from
>>> the keyboard of Christian Heimes
>>>>
>>>> Andy wrote:
>>>> I'm very - not absolute, but very - sure that Guido and the initial
>>>> designers of Python would have added the GIL anyway. The GIL makes
>>>> Python faster on single core machines and more stable on multi core
>>>> machines.
>
> Actually, the GIL doesn't make Python faster; it is a design decision that
> reduces the overhead of lock acquisition, while still allowing use of global
> variables.
>
> Using finer-grained locks has higher run-time cost; eliminating the use of
> global variables has a higher programmer-time cost, but would actually run
> faster and more concurrently than using a GIL. Especially on a
> multi-core/multi-CPU machine.

Those "globals" include classes, modules, and functions.  You can't
have *any* objects shared.  Your interpreters are entirely isolated,
much like processes (and we all start wondering why you don't use
processes in the first place.)

Or use safethread.  It imposes safe semantics on shared objects, so
you can keep your global classes, modules, and functions.  Still need
garbage collection though, and on CPython that means refcounting and
the GIL.


>> Another peeve I have is his characterization of the observer pattern.
>> The generalized form of the problem exists in both single-threaded
>> sequential programs, in the form of unexpected reentrancy, and message
>> passing, with infinite CPU usage or infinite number of pending
>> messages.
>>
>
> So how do you get reentrancy in a single-threaded sequential program? I
> think only via recursion? Which isn't a serious issue for the observer
> pattern. If you add interrupts, then your program is no longer sequential.

Sorry, I meant recursion.  Why isn't it a serious issue for
single-threaded programs?  Just the fact that it's much easier to
handle when it does happen?


>> Try looking at it on another level: when your CPU wants to read from a
>> bit of memory controlled by another CPU it sends them a message
>> requesting they get it for us.  They send back a message containing
>> that memory.  They also note we have it, in case they want to modify
>> it later.  We also note where we got it, in case we want to modify it
>> (and not wait for them to do modifications for us).
>>
>
> I understand that level... one of my degrees is in EE, and I started college
> wanting to design computers (at about the time the first microprocessor chip
> came along, and they, of course, have now taken over). But I was side-lined
> by the malleability of software, and have mostly practiced software during
> my career.
>
> Anyway, that is the level that Herb Sutter was describing in the Dr Dobbs
> articles I mentioned. And the overhead of doing that at the level of a cache
> line is high, if there is lots of contention for particular memory locations
> between threads running on different cores/CPUs. So to achieve concurrency,
> you must not only limit explicit software locks, but must also avoid memory
> layouts where data needed by different cores/CPUs are in the same cache
> line.

I suspect they'll end up redesigning the caching to use a size and
alignment of 64 bits (or smaller).  Same cache line size, but with
masking.

You still need to minimize contention of course, but that should at
least be more predictable.  Having two unrelated mallocs contend could
suck.


>> Message passing vs shared memory isn't really a yes/no question.  It's
>> about ratios, usage patterns, and tradeoffs.  *All* programs will
>> share data, but in what way?  If it's just the code itself you can
>> move the cache validation into software and simplify the CPU, making
>> it faster.  If the shared data is a lot more than that, and you use it
>> to coordinate accesses, then it'll be faster to have it in hardware.
>>
>
> I agree there are tradeoffs... unfortunately, the hardware architectures
> vary, and the languages don't generally understand the hardware. So then it
> becomes an OS API, which adds the overhead of an OS API call to the cost of
> the synchronization... It could instead be (and in clever applications is) a
> non-portable assembly level function that wraps on OS locking or waiting
> API.

In practice I highly doubt we'll see anythin

Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Rhamphoryncus
On Oct 25, 12:29 am, greg <[EMAIL PROTECTED]> wrote:
> Rhamphoryncus wrote:
> > A list
> > is not shareable, so it can only be used within the monitor it's
> > created within, but the list type object is shareable.
>
> Type objects contain dicts, which allow arbitrary values
> to be stored in them. What happens if one thread puts
> a private object in there? It becomes visible to other
> threads using the same type object. If it's not safe
> for sharing, bad things happen.
>
> Python's data model is not conducive to making a clear
> distinction between "private" and "shared" objects,
> except at the level of an entire interpreter.

shareable type objects (enabled by a __future__ import) use a
shareddict, which requires all keys and values to themselves be
shareable objects.

Although it's a significant semantic change, in many cases it's easy
to deal with: replace mutable (unshareable) global constants with
immutable ones (ie list -> tuple, set -> frozenset).  If you've got
some global state you move it into a monitor (which doesn't scale, but
that's your design).  The only time this really fails is when you're
deliberately storing arbitrary mutable objects from any thread, and
later inspecting them from any other thread (such as our new ABC
system's cache).  If you want to store an object, but only to give it
back to the original thread, I've got a way to do that.
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Rhamphoryncus
On Oct 26, 6:57 pm, "Andy O'Meara" <[EMAIL PROTECTED]> wrote:
> Grrr... I posted a ton of lengthy replies to you and other recent
> posts here using Google and none of them made it, argh. Poof. There's
> nothing that fires me up more than lost work, so I'll have to
> revert short and simple answers for the time being.  Argh, damn.
>
> On Oct 25, 1:26 am, greg <[EMAIL PROTECTED]> wrote:
>
>
>
> > Andy O'Meara wrote:
> > > I would definitely agree if there was a context (i.e. environment)
> > > object passed around then perhaps we'd have the best of all worlds.
>
> > Moreover, I think this is probably the *only* way that
> > totally independent interpreters could be realized.
>
> > Converting the whole C API to use this strategy would be
> > a very big project. Also, on the face of it, it seems like
> > it would render all existing C extension code obsolete,
> > although it might be possible to do something clever with
> > macros to create a compatibility layer.
>
> > Another thing to consider is that passing all these extra
> > pointers around everywhere is bound to have some effect
> > on performance.
>
> I'm with you on all counts, so no disagreement there.  On the "passing
> a ptr everywhere" issue, perhaps one idea is that all objects could
> have an additional field that would point back to their parent context
> (ie. their interpreter).  So the only prototypes that would have to be
> modified to contain the context ptr would be the ones that don't
> inherently operate on objects (e.g. importing a module).

Trying to directly share objects like this is going to create
contention.  The refcounting becomes the sequential portion of
Amdahl's Law.  This is why safethread doesn't scale very well: I share
a massive amount of objects.

An alternative, actually simpler, is to create proxies to your real
object.  The proxy object has a pointer to the real object and the
context containing it.  When you call a method it serializes the
arguments, acquires the target context's GIL (while releasing yours),
and deserializes in the target context.  Once the method returns it
reverses the process.
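
A toy sketch of that proxy shape, with pickle standing in for the
serialization and a plain lock standing in for a per-context GIL (the
names here are all made up):

import pickle
import threading

class Context(object):
    # Stand-in for an isolated context: an object space with its own lock.
    def __init__(self):
        self.lock = threading.Lock()

class Proxy(object):
    def __init__(self, target, context):
        self._target = target
        self._context = context

    def call(self, name, *args):
        # Serialize the arguments, enter the target context, invoke,
        # then reverse the process for the return value.
        raw_args = pickle.dumps(args)
        with self._context.lock:  # a real version releases the caller's lock too
            result = getattr(self._target, name)(*pickle.loads(raw_args))
            raw_result = pickle.dumps(result)
        return pickle.loads(raw_result)

ctx = Context()
shared_list = [3, 1, 2]            # lives inside ctx
p = Proxy(shared_list, ctx)
p.call('sort')                     # mutates the list inside its context
print(p.call('__getitem__', 0))    # 1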

There's two reasons why this may perform well for you: First,
operations done purely in C may cheat (if so designed).  A copy from
one memory buffer to another memory buffer may be given two proxies as
arguments, but then operate directly on the target objects (ie without
serialization).

Second, if a target context is idle you can enter it (acquiring its
GIL) without any context switch.

Of course that scenario is full of "maybes", which is why I have
little interest in it..

An even better scenario is if your memory buffer's methods are in pure
C and it's a simple object (no pointers).  You can stick the memory
buffer in shared memory and have multiple processes manipulate it from
C.  More "maybes".

An evil trick if you need pointers, but control the allocation, is to
take advantage of the fork model.  Have a master process create a
bunch of blank files (temp files if linux doesn't allow /dev/zero),
mmap them all using MAP_SHARED, then fork and utilize.  The addresses
will be inherited from the master process, so any pointers within them
will be usable across all processes.  If you ever want to return
memory to the system you can close that file, then have all processes
use MAP_SHARED|MAP_FIXED to overwrite it.  Evil, but should be
disturbingly effective, and still doesn't require modifying CPython.
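
For the brave, the core of the trick looks like this (a Unix-only
sketch; sizes and error handling are hand-waved):

import mmap
import os
import tempfile

SIZE = 1 << 20                      # arbitrary region size

# Master: create a blank backing file and map it MAP_SHARED.
backing = tempfile.TemporaryFile()
backing.truncate(SIZE)
region = mmap.mmap(backing.fileno(), SIZE, mmap.MAP_SHARED)

if os.fork() == 0:
    # The child inherits the mapping at the same address, so raw
    # pointers stored inside the region stay valid in every process.
    region[:5] = b'hello'
    os._exit(0)

os.wait()
print(region[:5])                   # b'hello', written by the child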
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Rhamphoryncus
On Oct 28, 9:30 am, "Andy O'Meara" <[EMAIL PROTECTED]> wrote:
> On Oct 25, 9:46 am, "M.-A. Lemburg" <[EMAIL PROTECTED]> wrote:
>
> > These discussion pop up every year or so and I think that most of them
> > are not really all that necessary, since the GIL isn't all that bad.
>
> Thing is, if the topic keeps coming up, then that may be an indicator
> that change is truly needed.  Someone much wiser than me once shared
> that a measure of the usefulness and quality of a package (or API) is
> how easily it can be added to an application--of any flavors--without
> the application needing to change.
>
> So in the rising world of idle cores and worker threads, I do see an
> increasing concern over the GIL.  Although I recognize that the debate
> is lengthy, heated, and has strong arguments on both sides, my reading
> on the issue makes me feel like there's a bias for the pro-GIL side
> because of the volume of design and coding work associated with
> considering various alternatives (such as Glenn's "Py*" concepts).
> And I DO respect and appreciate where the pro-GIL people come from:
> who the heck wants to do all that work and recoding so that a tiny
> percent of developers can benefit?  And my best response is that as
> unfortunate as it is, python needs to be more multi-threaded app-
> friendly if we hope to attract the next generation of app developers
> that want to just drop python into their app (and not have to change
> their app around python).  For example, Lua has that property, as
> evidenced by its rapidly growing presence in commercial software
> (Blizzard uses it heavily, for example).
>
>
>
> > Furthermore, there are lots of ways to tune the CPython VM to make
> > it more or less responsive to thread switches via the various sys.set*()
> > functions in the sys module.
>
> > Most computing or I/O intense C extensions, built-in modules and object
> > implementations already release the GIL for you, so it usually doesn't
> > get in the way all that often.
>
> The main issue I take there is that it's often highly useful for C
> modules to make subsequent calls back into the interpreter. I suppose
> the response to that is to call the GIL before reentry, but it just
> seems to be more code and responsibility in scenarios where it's not
> necessary.  Although that code and protocol may come easy to veteran
> CPython developers, let's not forget that an important goal is to
> attract new developers and companies to the scene, where they get
> their thread-independent code up and running using python without any
> unexpected reengineering.  Again, why are companies choosing Lua over
> Python when it comes to an easy and flexible drop-in interpreter?  And
> please take my points here to be exploratory, and not hostile or
> accusatory, in nature.
>
> Andy

Okay, here's the bottom line:
* This is not about the GIL.  This is about *completely* isolated
interpreters; most of the time when we want to remove the GIL we want
a single interpreter with lots of shared data.
* Your use case, although not common, is not extraordinarily rare
either.  It'd be nice to support.
* If CPython had supported it all along we would continue to maintain
it.
* However, since it's not supported today, it's not worth the time
invested, API incompatibility, and general breakage it would imply.
* Although it's far more work than just solving your problem, if I
were to remove the GIL I'd go all the way and allow shared objects.

So there's really only two options here:
* get a short-term bodge that works, like hacking the 3rd party
library to use your shared-memory allocator.  Should be far less work
than hacking all of CPython.
* invest yourself in solving the *entire* problem (GIL removal with
shared python objects).
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-29 Thread Rhamphoryncus
On Oct 29, 7:20 am, Paul Boddie <[EMAIL PROTECTED]> wrote:
> On 28 Okt, 21:03, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>
> > * get a short-term bodge that works, like hacking the 3rd party
> > library to use your shared-memory allocator.  Should be far less work
> > than hacking all of CPython.
>
> Did anyone come up with a reason why shared memory couldn't be used
> for the purpose described by the inquirer? With the disadvantages of
> serialisation circumvented, that would leave issues of contention, and
> on such matters I have to say that I'm skeptical about solutions which
> try and make concurrent access to CPython objects totally transparent,
> mostly because it appears to be quite a lot of work to get right (as
> POSH illustrates, and as your own safethread work shows), and also
> because systems where contention is spread over a large "surface" (any
> object can potentially be accessed by any process at any time) are
> likely to incur a lot of trouble for the dubious benefit of being
> vague about which objects are actually being shared.

I believe large existing libraries were the reason.  Thus my
suggestion of the evil fork+mmap abuse.
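
For illustration, a minimal POSIX-only sketch of that fork+mmap idea
(an anonymous shared mapping created before fork() is visible to both
processes; the names and sizes here are mine, not from the thread):

import mmap
import os
import struct

SIZE = 4096
buf = mmap.mmap(-1, SIZE)    # anonymous mapping, shared across fork()

pid = os.fork()
if pid == 0:
    # Child: write into the shared buffer, no serialisation involved.
    buf[:8] = struct.pack('q', 42)
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # Parent: sees the child's write directly.
    print struct.unpack('q', buf[:8])[0]    # -> 42

A large existing library could allocate its data out of a region like
this, which is the "shared-memory allocator" bodge mentioned earlier.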
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent interpreters

2008-10-30 Thread Rhamphoryncus
On Oct 30, 8:23 pm, "Patrick Stinson" <[EMAIL PROTECTED]>
wrote:
> Speaking of the big picture, is this how it normally works when
> someone says "Here's some code and a problem and I'm willing to pay
> for a solution?" I've never really walked that path with a project of
> this complexity (I guess it's the backwards-compatibility that makes
> it confusing), but is this problem just too complex so we have to keep
> talking and talking on forum after forum? Afraid to fork? I know I am.
> How many people are qualified to tackle Andy's problem? Are all of
> them busy or uninterested? Is the current code in a tight spot where
> it just can't be fixed without really jabbing that FORK in so deep
> that the patch will die when your project does?
>
> Personally I think this problem is super-awesome on the hobbyist's fun
> scale. I'd totally take the time to let my patch do the talking but I
> haven't read enough of the (2.5) code. So, I resort to simply reading
> the newsgroups and python code to better understand the mechanics of
> the problem :(

The scale of this issue is why so little progress gets made, yes.  I
intend to solve it regardless of getting paid (and have been working
on various aspects for quite a while now), but as you can see from
this thread it's very difficult to convince anybody that my approach
is the *right* approach.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 - is this true?

2008-11-09 Thread Rhamphoryncus
On Nov 8, 10:14 pm, Kay Schluehr <[EMAIL PROTECTED]> wrote:
> I guess building a multiset is a little more expensive than just O(n).
> It is rather like building a dict from a list which is O(k*n) with a
> constant but small factor k. The comparison is of the same order. To
> enable the same behavior as the applied sorted() a multiset must
> permit unhashable elements. dicts don't do the job.

Although it has a runtime of k*n, it is still O(n).  Big-O notation
ignores constant factors, dealing only with scalability.
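
A concrete sketch (for hashable elements only, matching the caveat
above) of comparing two lists' contents with a dict-based multiset in
a single O(n) pass over each:

def same_contents(a, b):
    # k is the constant per-element dict cost, which big-O ignores.
    counts = {}
    for x in a:
        counts[x] = counts.get(x, 0) + 1
    for x in b:
        counts[x] = counts.get(x, 0) - 1
    return all(v == 0 for v in counts.itervalues())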
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 - is this true?

2008-11-10 Thread Rhamphoryncus
On Nov 10, 8:16 am, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Duncan Grisby wrote:
> > In article <[EMAIL PROTECTED]>,
> >  Terry Reedy  <[EMAIL PROTECTED]> wrote:
>
> >> Have you written any Python code where you really wanted the old,
> >> unpredictable behavior?
>
> > I have an object database written in Python. It, like Python, is
> > dynamically typed. It heavily relies on being able to sort lists where
> > some of the members are None. To some extent, it also sorts lists of
> > other mixed types. It will be very hard to migrate this aspect of it
> > to Python 3.
>
> Very Hard? Several key functions have been suggested on this thread.
> Given that 2.x only sorts most but not all types and that the sort is
> only guaranteed to be consistent within a session, as I remember, I
> suspect you can choose or write something at least as good for your
> purposes.

It's actually worse than that: A different input within a single
session will produce different results.  Sorting relies on
transitivity, and comparing such types doesn't provide it.

You might as well comment out the sort and call it good.  That's what
you really had in 2.x.  It was close enough most of the time to *look*
right, yet in truth it silently failed.  3.0 makes it an explicit
failure.

(There were some situations that worked, but they're exceptional.  You
can still do them now, but you need to be explicit (via a key function
or a special singleton).)
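
For example, a key function along these lines (one common spelling,
not something from the original posts) keeps unlike types from ever
being compared:

>>> sorted([3, None, 1, None, 2], key=lambda x: (x is not None, x))
[None, None, 1, 2, 3]

The first tuple element sorts the Nones to the front, so the second
element is only ever compared between values of the same type.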
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 - is this true?

2008-11-10 Thread Rhamphoryncus
On Nov 10, 6:25 pm, Steven D'Aprano
<[EMAIL PROTECTED]> wrote:
> On Mon, 10 Nov 2008 11:43:59 -0800, Rhamphoryncus wrote:
> > You might as well comment out the sort and call it good.  That's what
> > you really had in 2.x.  It was close enough most of the time to *look*
> > right, yet in truth it silently failed.  3.0 makes it an explicit
> > failure.
>
> I don't doubt that this is correct, but I think the argument that sorting
> in Python 2.x has silent bugs would be much stronger if somebody could
> demonstrate arrays that sort wrongly.
>
> A shiny wooden nickel for the first person to show such an example!
>
> --
> Steven

>>> from decimal import Decimal
>>> sorted([2, 1.5, Decimal('1.6'), 2.7, 2])
[1.5, 2.7000000000000002, Decimal("1.6"), 2, 2]

Where's my nickel? :P
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 - is this true?

2008-11-10 Thread Rhamphoryncus
On Nov 10, 9:31 pm, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On Nov 10, 6:25 pm, Steven D'Aprano
>
>
>
> <[EMAIL PROTECTED]> wrote:
> > On Mon, 10 Nov 2008 11:43:59 -0800, Rhamphoryncus wrote:
> > > You might as well comment out the sort and call it good.  That's what
> > > you really had in 2.x.  It was close enough most of the time to *look*
> > > right, yet in truth it silently failed.  3.0 makes it an explicit
> > > failure.
>
> > I don't doubt that this is correct, but I think the argument that sorting
> > in Python 2.x has silent bugs would be much stronger if somebody could
> > demonstrate arrays that sort wrongly.
>
> > A shiny wooden nickel for the first person to show such an example!
>
> > --
> > Steven
> >>> sorted([2, 1.5, Decimal('1.6'), 2.7, 2])
>
> [1.5, 2.7000000000000002, Decimal("1.6"), 2, 2]
>
> Where's my nickel? :P

Ahh, I knew I had a copy of the time machine keys buried in that
drawer..

http://mail.python.org/pipermail/python-dev/2005-December/059166.html
--
http://mail.python.org/mailman/listinfo/python-list


Re: threading - race condition?

2008-05-12 Thread Rhamphoryncus
On May 11, 10:16 am, skunkwerk <[EMAIL PROTECTED]> wrote:
> On May 10, 1:31 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
>
>
>
> > On Fri, 9 May 2008 08:40:38 -0700 (PDT), skunkwerk <[EMAIL PROTECTED]>
> > declaimed the following in comp.lang.python:
>
> > Coming in late...
>
> > > On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> > > > skunkwerk wrote:
> > > > > i've declared a bunch of worker threads (100) and a queue into which
> > > > > new requests are inserted, like so:
>
> > 
>
> > > > > queue = Queue.Queue(0)
> > > > > WORKERS = 100
> > > > > for i in range(WORKERS):
> > > > >     thread = SDBThread(queue)
> > > > >     thread.setDaemon(True)
> > > > >     thread.start()
>
> > > > > the thread:
>
> > > > > class SimpleDBThread ( threading.Thread ):
> > > > >     def __init__ ( self, queue ):
> > > > >         self.__queue = queue
>
> > Note: double-leading __ means "name mangling" -- typically only
> > needed when doing multiple layers of inheritance where different parents
> > have similar named items that need to be kept independent; a single _ is
> > the convention for "don't touch me unless you know what you are doing"
>
> > > > >         threading.Thread.__init__ ( self )
> > > > >     def run ( self ):
> > > > >         while 1:
> > > > >             item = self.__queue.get()
> > > > >             if item != None:
> > > > >                 model = domain.get_item(item[0])
> > > > >                 logger.debug('sdbthread item:' + item[0])
> > > > >                 title = model['title']
> > > > >                 scraped = model['scraped']
> > > > >                 logger.debug("sdbthread title:" + title)
>
> > > > > any suggestions?
> > > > > thanks
>
> > 
>
> > > thanks John, Gabriel,
> > >   here's the 'put' side of the requests:
>
> > > def prepSDBSearch(results):
> > >     modelList = [0]
> > >     counter = 1
> > >     for result in results:
> > >         data = [result.item, counter, modelList]
> > >         queue.put(data)
> > >         counter += 1
> > >     while modelList[0] < len(results):
> > >         print 'waiting...'    # wait for them to come home
> > >     modelList.pop(0)    # now remove '0'
> > >     return modelList
>
> > My suggestion, if you really want diagnostic help -- follow the
> > common recommendation of posting the minimal /runnable (if erroneous)/
> > code... If "domain.get_item()" is some sort of RDBM access, you might
> > fake it using a pre-loaded dictionary -- anything that allows it to
> > return something when given the key value.
>
> > > responses to your follow ups:
> > > 1)  'item' in the threads is a list that corresponds to the 'data'
> > > list in the above function.  it's not global, and the initial values
> > > seem ok, but i'm not sure if every time i pass in data to the queue it
> > > passes in the same memory address or declares a new 'data' list (which
> > > I guess is what I want)
>
> > Rather confusing usage... In your "put" you have a list whose first
> > element is "result.item", but then in the work thread, you refer to the
> > entire list as "item"
>
> > > 3)  the first item in the modelList is a counter that keeps track of
> > > the number of threads for this call that have completed - is there any
> > > better way of doing this?
>
> > Where? None of your posted code shows either "counter" or modelList
> > being used by the threads.
>
> > And yes, if you have threads trying to update a shared mutable, you
> > have a race condition.
>
> > You also have a problem if you are using "counter" to define where
> > in modelList a thread is supposed to store its results -- as you can not
> > access an element that doesn't already exist...
>
> > a = [0]
> > a[3] = 1    # failure, need to create elements 1, 2, 3 first
>
> > Now, if position is irrelevant, and a thread just appends its
> > results to modelList, then you don't need some counter, all you need is
> > to check the length of modelList against the count expected.
>
> > Overall -- even though you are passing things via the queue, the
> > contents being passed via the queue are being treated as if they were
> > global entities (you could make modelList a global, remove it from the
> > queue entries, and have the same net access)...
>
> > IOWs, you have too much coupling between the threads and the feed
> > routine...
>
> > As for me... I'd be using a second queue for return values...
>
> > WORKERTHREADS = 100
> > feed = Queue.Queue()
> > result = Queue.Queue()
>
> > def worker():
> >     while True:
> >         (ID, item) = feed.get()    # I leave the queues globals
> >                                    # since they perform locking
> >                                    # internally
> > mod
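
(The quoted example is cut off by the archive here.  A minimal,
self-contained sketch of the same two-queue idea -- with illustrative
names, and fake_get_item standing in for domain.get_item -- might look
like this:)

import threading
import Queue

WORKERS = 10
feed = Queue.Queue()
results = Queue.Queue()

def fake_get_item(key):
    # Placeholder for the real domain.get_item() lookup.
    return {'title': key.upper(), 'scraped': True}

def worker():
    while True:
        ID, key = feed.get()    # the queues do their own locking
        results.put((ID, fake_get_item(key)))

for _ in range(WORKERS):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

keys = ['spam', 'eggs', 'ham']
for ID, key in enumerate(keys):
    feed.put((ID, key))

models = [None] * len(keys)
for _ in keys:
    ID, model = results.get()    # blocks until a worker answers
    models[ID] = model
print models

No shared mutable is touched by more than one thread; everything
travels through the queues, which is exactly the decoupling the post
above recommends.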


Re: waiting on an event blocks all signals

2008-05-18 Thread Rhamphoryncus
On May 18, 9:05 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> alan wrote:
>
> > This ignores CTRL-C on every platform I've tested:
>
> > python -c "import threading; threading.Event().wait()"
> > ^C^C^C^C
>
> > It looks to me like all signals are masked before entering wait(). Can
> > someone familiar with the internals explain and/or justify this
> > behavior? Thanks,
>
> They aren't masked. Read the docs:
>
> # Although Python signal handlers are called asynchronously as far as
> the Python user is concerned, they can only occur between the ``atomic''
> instructions of the Python interpreter. This means that signals arriving
> during long calculations implemented purely in C (such as regular
> expression matches on large bodies of text) may be delayed for an
> arbitrary amount of time.
>
> (http://docs.python.org/lib/module-signal.html)
>
> Which is what is happening here - wait is implemented in C. So the
> arriving signal is flagged in the interpreter so that the signal
> handler runs at the next bytecode boundary - but the interpreter is
> still stuck in the blocking call.

If a signal arrives while the thread is in a syscall it will usually
interrupt the syscall.  Most code in the interpreter will see the
EINTR error code and check if we have a signal to process, which leads
to raising a KeyboardInterrupt exception.  However, most thread
synchronization functions specified in POSIX are not interrupted by
signals, so the interpreter never has a chance do anything.  Why would
we care anyway, as Python doesn't let you intentionally interrupt a
non-main thread?
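
The usual workaround in the main thread (a sketch, not from the
original posts) is to wait with a timeout, so control returns to the
interpreter often enough for the Python-level signal handler to run:

import threading

ev = threading.Event()
try:
    while not ev.isSet():
        ev.wait(1.0)    # wakes every second; Ctrl-C now gets through
except KeyboardInterrupt:
    print 'interrupted'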
--
http://mail.python.org/mailman/listinfo/python-list


Re: Thread killing - I know I know!

2008-05-19 Thread Rhamphoryncus
On May 19, 11:31 am, "Chris Mellon" <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 11:36 AM, Roger Heathcote
> > Fair point, but for sub processes that need to be in close contact with the
> > original app, or very small functions that you'd like 100s or 1000s of it
> > seems like a kludge having to spawn whole new processes build in socket
> > communications and kill via explicit OS calls. I can't see that approach
> > scaling particularly well but I guess there's no choice.
>
> It's not a kludge - the whole reason why killing a thread is undefined
> and a "cardinal sin" is because it's in your own address space and you
> can't guarantee anything about how it left things when you killed it.
> You simply can't have it both ways. If you want to be able to safely
> and sanely kill an uncooperative thread (*especially* third party code
> you don't control) you have to isolate it from your address space, and
> this means a process. This is a fundamental problem with the entire
> idea of shared-state concurrency. For the record, you can't use 1000s
> of threads either - if you really need 1000s of concurrent things
> running you need to re-think your concurrency model and possibly your
> choice of languages and environment.

Strictly speaking, you CAN use 1000s of threads, but it requires some
special effort.

It's not a good idea though, at least for the near future.  If you
want to kill a thread you have to do it cooperatively, and for the
effort required to get that right you might as well make the whole
thing event-driven.


> The messiness of the real world is *why* you should use processes, not
> a reason to avoid them.
>
> > Does anyone think it likely that the threading module (and threading in
> > general) will be improved and augmented with features like timeouts and
> > arbitrary threadicide in the next year or two?  Seems there's little scope
> > for tapping the power of the current generation of multiple cores with the
> > current pythons, tho I appreciate breakneck speed has never been a design
> > objective it looks like multicore is set to be the next generation PC
> > architecture.
>
> They absolutely should not be. Adding an API which can't work in the
> general case and is dangerous even when it can work does not do anyone
> any favors.

I agree, if it's not safe there's no point adding it.

However, if you change the parameters a little bit.. it's quite
possible to redefine many blocking functions as cooperative
cancellation points.  This is the approach I take with python-
safethread, and it should satisfy most purposes.
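
To give the flavour of a cooperative cancellation point (purely
illustrative names; this is not the actual python-safethread API):

import threading
import time

class Cancelled(Exception):
    pass

class CancelToken(object):
    # A flag that blocking operations agree to poll at safe points.
    def __init__(self):
        self._flag = threading.Event()

    def cancel(self):
        self._flag.set()

    def sleep(self, seconds, poll=0.1):
        # time.sleep() recast as a cancellation point.
        deadline = time.time() + seconds
        while True:
            if self._flag.isSet():
                raise Cancelled()
            remaining = deadline - time.time()
            if remaining <= 0:
                return
            time.sleep(min(poll, remaining))

A worker blocks only inside such calls, so cancel() from another
thread takes effect at a well-defined point instead of mid-update.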
--
http://mail.python.org/mailman/listinfo/python-list


Re: conventions/requirements for 'is' vs '==', 'not vs '!=', etc

2008-05-20 Thread Rhamphoryncus
On May 20, 12:09 pm, "Terry Reedy" <[EMAIL PROTECTED]> wrote:
> Intern() is gone in 3.0

But not gone far:

>>> import sys
>>> sys.intern
<built-in function intern>

--
http://mail.python.org/mailman/listinfo/python-list


Re: Threads and import

2008-05-29 Thread Rhamphoryncus
On May 28, 1:14 pm, [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm trying to work out some strange (to me) behaviour that I see when
> running a python script in two different ways (I've inherited some
> code that needs to be maintained and integrated with another lump of
> code). The sample script is:
>
> # Sample script, simply create a new thread and run a
> # regular expression match in it.
> import re
>
> import threading
> class TestThread(threading.Thread):
>
> def run(self):
> print('start')
> try:
> re.search('mmm', '')
> except Exception, e:
> print e
> print('finish')
>
> tmpThread = TestThread()
> tmpThread.start()
> tmpThread.join()
> import time
> for i in range(10):
> time.sleep(0.5)
> print i
>
> # end of sample script
>
> Now if I run this using:
>
> $ python ThreadTest.py
>
> then it behaves as expected, ie an output like:
>
> start
> finish
> 0
> 1
> 2
> ...
>
> But if I run it as follows (how the inherited code was started):
>
> $ python -c "import TestThread"
>
> then I just get:
>
> start
>
> I know how to get around the problem but could someone with more
> knowledge of how python works explain why this is the case?

For reasons I haven't figured out, child threads always acquire the
import lock.  Since the main thread is already in an import, and is
waiting for the child thread to finish, this deadlocks.

"python ThreadTest.py" doesn't deadlock because ThreadTest isn't
loaded as a module - it's loaded as a script.  A script doesn't hold
the import lock while it executes.

The solution is to move all the thread spawning and whatnot into a
main() function, use the "if __name__ == '__main__': main()" trick for
when you are a script, and when loaded as a module require the caller
to do ThreadTest.main() after importing.
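
A minimal sketch of that structure (same TestThread as in the sample
script above):

import re
import threading

class TestThread(threading.Thread):
    def run(self):
        print 'start'
        re.search('mmm', '')
        print 'finish'

def main():
    tmpThread = TestThread()
    tmpThread.start()
    tmpThread.join()    # safe: main() runs after the import finishes,
                        # so the import lock is no longer held

if __name__ == '__main__':
    main()

# As a module: import ThreadTest; ThreadTest.main()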
--
http://mail.python.org/mailman/listinfo/python-list


Re: ThreadPoolingMixIn

2008-05-31 Thread Rhamphoryncus
On May 30, 2:40 pm, [EMAIL PROTECTED] wrote:
> Hi, everybody!
>
> I wrote a useful class ThreadPoolingMixIn which can be used to create
> fast thread-based servers. This mix-in works much faster than
> ThreadingMixIn because it doesn't create a new thread on each request.

Do you have any benchmarks demonstrating the performance difference?


> t.setDaemon(True)

Unfortunately, shutdown with daemon threads is fairly buggy. :/
--
http://mail.python.org/mailman/listinfo/python-list


Re: ThreadPoolingMixIn

2008-05-31 Thread Rhamphoryncus
On May 31, 1:40 pm, "Giampaolo Rodola'" <[EMAIL PROTECTED]> wrote:
> On 30 May, 22:40, [EMAIL PROTECTED] wrote:
>
>
>
> > Hi, everybody!
>
> > I wrote a useful class ThreadPoolingMixIn which can be used to create
> > fast thread-based servers. This mix-in works much faster than
> > ThreadingMixIn because it doesn't create a new thread on each request.
>
> > Is it worth including in SocketServer.py?
>
> > from __future__ import with_statement
> > from SocketServer import ThreadingMixIn
> > import threading
> > import Queue
> > class ThreadPoolingMixIn(ThreadingMixIn):
> >     """Mix-in class to handle requests in a thread pool.
>
> >     The pool grows and shrinks depending on load.
>
> >     For instance, a threading UDP server class is created as follows:
>
> >     class ThreadPoolingUDPServer(ThreadPoolingMixIn, UDPServer):
> >         pass
>
> >     """
> >     __author__ = 'Pavel Uvarov <[EMAIL PROTECTED]>'
>
> >     def init_thread_pool(self, min_workers = 5,
> >                          max_workers = 100, min_spare_workers = 5):
> >         """Initialize thread pool."""
> >         self.q = Queue.Queue()
> >         self.min_workers = min_workers
> >         self.max_workers = max_workers
> >         self.min_spare_workers = min_spare_workers
> >         self.num_workers = 0
> >         self.num_busy_workers = 0
> >         self.workers_mutex = threading.Lock()
> >         self.start_workers(self.min_workers)
>
> >     def start_workers(self, n):
> >         """Start n workers."""
> >         for i in xrange(n):
> >             t = threading.Thread(target = self.worker)
> >             t.setDaemon(True)
> >             t.start()
>
> >     def worker(self):
> >         """A function of a working thread.
>
> >         It gets a request from the queue (blocking if there are
> >         none) and processes it.
>
> >         After processing it checks how many spare workers there are
> >         now, and if this value is greater than
> >         self.min_spare_workers then the worker exits.  Otherwise it
> >         loops infinitely.
>
> >         """
> >         with self.workers_mutex:
> >             self.num_workers += 1
> >         while True:
> >             (request, client_address) = self.q.get()
> >             with self.workers_mutex:
> >                 self.num_busy_workers += 1
> >             self.process_request_thread(request, client_address)
> >             self.q.task_done()
> >             with self.workers_mutex:
> >                 self.num_busy_workers -= 1
> >                 if self.num_workers - self.num_busy_workers > \
> >                         self.min_spare_workers:
> >                     self.num_workers -= 1
> >                     return
>
> >     def process_request(self, request, client_address):
> >         """Puts a request into the queue.
>
> >         If the queue size is too large, it adds an extra worker.
>
> >         """
> >         self.q.put((request, client_address))
> >         with self.workers_mutex:
> >             if self.q.qsize() > 3 and self.num_workers < self.max_workers:
> >                 self.start_workers(1)
>
> >     def join(self):
> >         """Wait for all busy threads"""
> >         self.q.join()
>
> This is not the right place to discuss such a thing.
> Post this same message on python-dev ml or, even better, open a new
> ticket on the bug tracker attaching the patch and, most important, a
> benchmark demonstrating the speed improvement.

It's perfectly acceptable to discuss ideas here before bringing them
up on python-ideas, and then python-dev.
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to kill a thread?

2008-06-06 Thread Rhamphoryncus
On Jun 6, 12:44 pm, The Pythonista <[EMAIL PROTECTED]> wrote:
> It's always been my understanding that you can't forcibly kill a thread
> in Python (at least not in a portable way).  The best you can do is
> politely ask it to die, IIRC.

Inherently, the best you can do in most languages is ask them politely
to die.  Otherwise you'll leave locks and various other data structures
in an inconvenient state, which is too complex to handle correctly.
The exception is certain functional languages, which aren't capable of
having threads and complex state in the same sense.

However, you could go quite far with a standardized mechanism of
politely asking threads to die.  For instance, all blocking I/O
functions could be made to support it.  This is how cancellation works
in python-safethread.
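
At its simplest, polite cancellation is just a flag the thread checks
between units of work (a sketch; the names are illustrative):

import threading

class Worker(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self._please_die = threading.Event()

    def ask_to_die(self):
        self._please_die.set()

    def run(self):
        while not self._please_die.isSet():
            self.do_one_unit()    # keep units of work short

    def do_one_unit(self):
        pass    # the real work goes here

The standardized part would be making every blocking call in the
standard library honour such a flag, which is what the
python-safethread cancellation mentioned above amounts to.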
--
http://mail.python.org/mailman/listinfo/python-list

