Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-26 Thread Antoine Pitrou
On Mon, 25 Sep 2017 17:42:02 -0700
Nathaniel Smith  wrote:
> On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou  wrote:
> >> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> >> what the benefit would be.  You can compose either list manually with
> >> a simple comprehension:
> >>
> >> [interp for interp in interpreters.list_all() if interp.is_running()]
> >> [interp for interp in interpreters.list_all() if not interp.is_running()]
> >
> > There is an inherent race condition in doing that, at least if
> > interpreters are running in multiple threads (which I assume is going
> > to be the dominant usage model).  That is why I'm proposing all
> > three variants.
> 
> There's a race condition no matter what the API looks like -- having a
> dedicated running_interpreters() lets you guarantee that the returned
> list describes the set of interpreters that were running at some
> moment in time, but you don't know when that moment was and by the
> time you get the list, it's already out-of-date.

Hmm, you're right of course.

> >> Likewise,
> >> queue.Queue.put() supports blocking, in addition to providing a
> >> put_nowait() method.
> >
> > queue.Queue.put() never blocks in the usual case (*), which is that of
> > an unbounded queue.  Only bounded queues (created with an explicit
> > non-zero maxsize parameter) can block in Queue.put().
> >
> > (*) and therefore also never deadlocks :-)
> 
> Unbounded queues also introduce unbounded latency and memory usage in
> realistic situations.

This doesn't seem to pose much of a problem in common use cases, though.
How many Python programs have you seen switch from an unbounded to a
bounded Queue to solve this problem?

Conversely, choosing a buffer size is tricky.  How do you know up front
which amount you need?  Is a fixed buffer size even ok or do you want
it to fluctuate based on the current conditions?

And regardless, my point was that a buffer is desirable.  The fact that
send() may block when the buffer is full doesn't change the fact that it
won't block in the common case.

> There's a reason why sockets
> always have bounded buffers -- it's sometimes painful, but the pain is
> intrinsic to building distributed systems, and unbounded buffers just
> paper over it.

Papering over a problem is sometimes the right answer actually :-)  For
example, most Python programs assume memory is unbounded...

If I'm using a queue or channel to push events to a logging system,
should I really block at every send() call?  Most probably I'd rather
run ahead instead.
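For what it's worth, the stdlib already embodies that choice:
logging.handlers.QueueHandler enqueues records with put_nowait(), so
with the usual unbounded queue the producing thread runs ahead rather
than blocking.  A rough sketch:

    import logging
    import logging.handlers
    import queue

    log_queue = queue.Queue()  # unbounded: the producer never blocks
    handler = logging.handlers.QueueHandler(log_queue)
    listener = logging.handlers.QueueListener(
        log_queue, logging.StreamHandler())
    listener.start()

    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    root.info("buffered, not blocking the sender")
    listener.stop()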

> > Also, suddenly an interpreter's ability to exploit CPU time is
> > dependent on another interpreter's ability to consume data in a timely
> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> > IMHO it would be better not to have such coupling.  
> 
> A small buffer probably is useful in some cases, yeah -- basically
> enough to smooth out scheduler jitter.

That's not about scheduler jitter, but about catering for activities
which occur at inherently different speeds or rhythms.  Requiring things
to run in lockstep removes a lot of flexibility and makes it harder to
exploit CPU resources fully.

> > I expect it to happen more often than one might think, in complex
> > systems :-)  For example, you could have a recv() loop that also from
> > time to time send()s some data on another queue, depending on what is
> > received.  But if that send()'s recipient also has the same structure
> > (a recv() loop which send()s from time to time), then it's easy to
> > imagine the two getting into a deadlock.
> 
> You kind of want to be able to create deadlocks, since the alternative
> is processes that can't coordinate and end up stuck in livelocks or
> with unbounded memory use etc.

I am not advocating we make it *impossible* to create deadlocks; just
saying we should not make them more *likely* than they need to be.
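A rough sketch of the scenario quoted above, with queue.Queue(maxsize=1)
standing in for bounded channels and the predicate/transformation
functions as placeholders:

    import queue

    a_to_b = queue.Queue(maxsize=1)   # bounded "channels" between two peers
    b_to_a = queue.Queue(maxsize=1)

    def wants_reply(msg):
        return True                   # placeholder predicate

    def reply_for(msg):
        return msg                    # placeholder transformation

    def peer(inbox, outbox):
        while True:
            msg = inbox.get()               # the recv() loop
            if wants_reply(msg):
                outbox.put(reply_for(msg))  # may block while our own inbox
                                            # keeps filling up

    # If both peers happen to block in put() at the same time, neither
    # ever returns to get(), and the pair deadlocks.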

> >> I'm not sure I understand your concern here.  Perhaps I used the word
> >> "sharing" too ambiguously?  By "sharing" I mean that the two actors
> >> have read access to something that at least one of them can modify.
> >> If they both only have read-only access then it's effectively the same
> >> as if they are not sharing.  
> >
> > Right.  What I mean is that you *can* share very simple "data" under
> > the form of synchronization primitives.  You may want to synchronize
> > your interpreters even if they don't share user-visible memory areas.  The
> > point of synchronization is not only to avoid memory corruption but
> > also to regulate and orchestrate processing amongst multiple workers
> > (for example processes or interpreters).  For example, a semaphore is
> > an easy way to implement "I want no more than N workers to do this
> > thing at the same time" ("this thing" can be something such as disk
> > I/O).  
> 
> It's fairly reasonable to implement a mutex using a CSP-style
> unbuffered channel (send = acquire, receive = release). And the same
> trick turns a channel with a fixed-size buffer into a semaphore.
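Concretely, with queue.Queue as a stand-in for the proposed channel type
(send mapping onto put, receive onto get), the trick looks like this:

    import queue

    class ChannelSemaphore:
        """N-permit semaphore built from a bounded "channel".

        send (put) = acquire, receive (get) = release;
        maxsize=1 gives a mutex.
        """

        def __init__(self, permits):
            self._chan = queue.Queue(maxsize=permits)

        def acquire(self):
            self._chan.put(None)   # blocks once all permits are taken

        def release(self):
            self._chan.get()       # frees a permit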

[Python-Dev] PEP 554 v3 (new interpreters module) - channel type

2017-09-26 Thread francismb
Hi Eric,

>> To make this work, the mutable shared state will be managed by the
>> Python runtime, not by any of the interpreters. Initially we will
>> support only one type of objects for shared state: the channels
>> provided by create_channel(). Channels, in turn, will carefully
>> manage passing objects between interpreters. [0]

Would it make sense to make the default channel type explicit,
something like ``create_channel(bytes)`` ?

Thanks in advance,
--francis

[0] https://www.python.org/dev/peps/pep-0554/




Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-26 Thread Walter Dörwald

On 23 Sep 2017, at 3:09, Eric Snow wrote:

> [...]
>
>>> ``list_all()``::
>>>
>>>    Return a list of all existing interpreters.
>>
>> See my naming proposal in the previous thread.
>
> Sorry, your previous comment slipped through the cracks.  You
> suggested:
>
>> As for the naming, let's make it both unconfusing and explicit?
>> How about three functions: `all_interpreters()`, `running_interpreters()`
>> and `idle_interpreters()`, for example?
>
> As to "all_interpreters()", I suppose it's the difference between
> "interpreters.all_interpreters()" and "interpreters.list_all()".  To
> me the latter looks better.

But in most cases when Python returns a container (list/dict/iterator)
of things, the name of the function/method is the name of the things,
not the name of the container, i.e. we have sys.modules, dict.keys,
dict.values etc.  Or if the collection of things itself has a name, it
is that name, i.e. os.environ, sys.path etc.

It's a little bit unfortunate that the name of the module would be the
same as the name of the function, but IMHO interpreters() would be
better than list().

> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> what the benefit would be.  You can compose either list manually with
> a simple comprehension:
>
> [interp for interp in interpreters.list_all() if interp.is_running()]
> [interp for interp in interpreters.list_all() if not interp.is_running()]

Servus,
   Walter


[Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-09-26 Thread Guido van Rossum
I've read the current version of PEP 552 over and I think everything looks
good for acceptance. I believe there are no outstanding objections (or they
have been adequately addressed in responses).

Therefore I intend to accept PEP 552 this Friday, unless grave objections
are raised on this mailing list (python-dev).

Congratulations Benjamin. Gotta love those tristate options!

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-26 Thread Nick Coghlan
On 26 September 2017 at 17:04, Antoine Pitrou  wrote:
> On Mon, 25 Sep 2017 17:42:02 -0700 Nathaniel Smith  wrote:
>> Unbounded queues also introduce unbounded latency and memory usage in
>> realistic situations.
>
> This doesn't seem to pose much of a problem in common use cases, though.
> How many Python programs have you seen switch from an unbounded to a
> bounded Queue to solve this problem?
>
> Conversely, choosing a buffer size is tricky.  How do you know up front
> which amount you need?  Is a fixed buffer size even ok or do you want
> it to fluctuate based on the current conditions?
>
> And regardless, my point was that a buffer is desirable.  The fact that
> send() may block when the buffer is full doesn't change the fact that it
> won't block in the common case.

It's also the case that unlike Go channels, which were designed from
scratch on the basis of implementing pure CSP, Python has an
established behavioural precedent in the APIs of queue.Queue and
collections.deque: they're unbounded by default, and you have to opt
in to making them bounded.
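That precedent, concretely:

    import collections
    import queue

    q = queue.Queue()                  # unbounded by default
    bq = queue.Queue(maxsize=10)       # bounded only on request;
                                       # put() blocks when full

    d = collections.deque()            # unbounded by default
    bd = collections.deque(maxlen=10)  # bounded only on request; appends
                                       # silently discard items from the
                                       # opposite end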

>> There's a reason why sockets
>> always have bounded buffers -- it's sometimes painful, but the pain is
>> intrinsic to building distributed systems, and unbounded buffers just
>> paper over it.
>
> Papering over a problem is sometimes the right answer actually :-)  For
> example, most Python programs assume memory is unbounded...
>
> If I'm using a queue or channel to push events to a logging system,
> should I really block at every send() call?  Most probably I'd rather
> run ahead instead.

While the article title is clickbaity,
http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
actually has a good discussion of this point. Search for "compose" to
find the relevant section ("Channels don’t compose well with other
concurrency primitives").

The specific problem cited is that only offering unbuffered or
bounded-buffer channels means that every send call becomes a potential
deadlock scenario, as all that needs to happen is for you to be
holding a different synchronisation primitive when the send call
blocks.
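A rough sketch of that failure mode, again with a bounded queue.Queue
standing in for a channel:

    import queue
    import threading

    lock = threading.Lock()
    chan = queue.Queue(maxsize=1)

    def producer():
        with lock:             # holding an unrelated primitive...
            chan.put("msg")    # ...blocks here once the buffer is full

    def consumer():
        with lock:             # ...while the consumer needs that same lock
            print(chan.get())  # before it can drain the channel: deadlock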

>> > Also, suddenly an interpreter's ability to exploit CPU time is
>> > dependent on another interpreter's ability to consume data in a timely
>> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
>> > IMHO it would be better not to have such coupling.
>>
>> A small buffer probably is useful in some cases, yeah -- basically
>> enough to smooth out scheduler jitter.
>
> That's not about scheduler jitter, but about catering for activities
> which occur at inherently different speeds or rhythms.  Requiring things
> to run in lockstep removes a lot of flexibility and makes it harder to
> exploit CPU resources fully.

The fact that the proposal now allows for M:N sender:receiver
relationships (just as queue.Queue does with threads) makes that
problem worse, since you may now have variability not only on the
message consumption side, but also on the message production side.

Consider this example where you have an event processing thread pool
that we're attempting to isolate from blocking IO by using channels
rather than coroutines (a rough code sketch follows the flow below).

Desired flow:

1. Listener thread receives external message from socket
2. Listener thread files message for processing on receive channel
3. Listener thread returns to blocking on the receive socket

4. Processing thread picks up message from receive channel
5. Processing thread processes message
6. Processing thread puts reply on the send channel

7. Sending thread picks up message from send channel
8. Sending thread makes a blocking network send call to transmit the message
9. Sending thread returns to blocking on the send channel
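A rough sketch of that flow, with queue.Queue between ordinary threads
standing in for interpreter channels, and the socket objects plus the
handle() step left as placeholders:

    import queue
    import threading

    receive_channel = queue.Queue()
    send_channel = queue.Queue()

    def handle(msg):
        return msg                       # placeholder for real processing

    def listener(recv_sock):
        while True:
            msg = recv_sock.recv(4096)   # 1. blocking receive from socket
            receive_channel.put(msg)     # 2. file message for processing
                                         # 3. loop back to blocking recv

    def processor():
        while True:
            msg = receive_channel.get()  # 4. pick up message
            reply = handle(msg)          # 5. process it
            send_channel.put(reply)      # 6. queue the reply

    def sender(send_sock):
        while True:
            reply = send_channel.get()   # 7. pick up reply
            send_sock.sendall(reply)     # 8. blocking network send
                                         # 9. loop back to blocking get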

When queue.Queue is used to pass the messages between threads, such an
arrangement will be effectively non-blocking as long as the send rate
is greater than or equal to the receive rate. However, the GIL means
it won't exploit all available cores, even if we create multiple
processing threads: you have to switch to multiprocessing for that,
with all the extra overhead that entails.

So I see the essential premise of PEP 554 as being to ask the question
"If each of these threads was running its own *interpreter*, could we
use Sans IO style protocols with interpreter channels to separate
internally "synchronous" processing threads from separate IO threads
operating at system boundaries, without having to make the entire
application pervasively asynchronous?"

If channels are an unbuffered blocking primitive, then we don't get
that benefit: even when there are additional receive messages to be
processed, the processing thread will block until the previous send
has completed. Switching the listener and sender threads over to
asynchronous IO would help with that, but they'd also end up having to
implement their own message buffering to manage the lack of buffering
in the core channel primitive.

By contrast, if the core channels are designed to offer an unbounded
buffer by default, then the processing threads can keep draining the
receive channel and filling the send channel regardless of how the IO
threads at the system boundaries are currently behaving.