Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-10-19 Thread Thomas Moreau
Hello,

I have been working on the concurrent.futures module lately, and I think this
optimization should be avoided in the context of Python Pools.

This is an interesting idea; however, its implementation would raise many
complicated issues because it breaks the basic paradigm of a Pool: the tasks
are independent, and you don't know which worker is going to run which task.

The function is serialized with each task because of this paradigm. This
ensures that any worker picking up the task will be able to perform it
independently of the tasks it has run before, provided that it has been
initialized correctly at the beginning. This makes it simple to run each
task.

As the Pool comes with no scheduler, your idea would require a
synchronization step to send the function to all workers before you can
launch your task. But if one worker is already performing a long-running
task, does the Pool wait for it to finish before it sends the
function? If the Pool doesn't wait, how does it ensure that this worker
will get the definition of the function before running it?
Also, multiprocessing.Pool has features where a worker can shut
itself down after a given number of tasks or a timeout. How does it ensure
that the new worker will have the definition of the function?
It is unsafe to try such a feature (sending an object only once) anywhere
other than in the initializer, which is guaranteed to run once per worker.
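
As a minimal sketch of the initializer approach described above (the names
`_worker_state`, `init_worker`, `translate` and `translate_all` are
hypothetical, chosen only for illustration), the large object is sent once
per worker through `initializer`/`initargs`, and `maxtasksperchild` shows
that replacement workers are initialized the same way:

```python
import multiprocessing as mp

# Per-process storage, filled in by the initializer. The name `_worker_state`
# is hypothetical; any module-level container would do.
_worker_state = {}


def init_worker(lookup_table):
    """Runs once in every worker process, including replacement workers."""
    _worker_state["table"] = lookup_table


def translate(key):
    """Task function: each task carries a reference to `translate` and the
    key, but not the table, which already lives in the worker."""
    return _worker_state["table"][key]


def translate_all(keys, lookup_table, processes=2):
    # maxtasksperchild=2 forces workers to be replaced regularly; each
    # replacement runs init_worker again, so the table is always available.
    with mp.Pool(
        processes,
        initializer=init_worker,
        initargs=(lookup_table,),
        maxtasksperchild=2,
    ) as pool:
        return pool.map(translate, keys)
```

When trying this, call `translate_all(...)` from under an
`if __name__ == "__main__":` guard so that it also works with the spawn
start method.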

On the other hand, you raised an interesting point: making
globals available in the workers could be made simpler. A possible solution
would be to add a "globals" argument to the Pool which would instantiate
global variables in the workers. I have no specific idea about the
implementation of such a feature, but it would be safer as it would be an
initialization feature.

Regards,
Thomas Moreau

On Thu, Oct 18, 2018, 22:20 Chris Jerdonek wrote:

> On Thu, Oct 18, 2018 at 9:11 AM Michael Selik wrote:
> > On Thu, Oct 18, 2018 at 8:35 AM Sean Harrington wrote:
> >> Further, let me pivot on my idea of __qualname__...we can use the `id`
> >> of `func` as the cache key to address your concern, and store this `id` on
> >> the `task` tuple (i.e. an integer in-lieu of the `func` previously stored
> >> there).
> >
> > Possible. Does the Pool keep a reference to the passed function in the
> > main process? If not, couldn't the garbage collector free that memory
> > location and a new function could replace it? Then it could have the same
> > qualname and id in CPython. Edge case, for sure. Worse, it'd be hard to
> > reproduce as it'd be dependent on the vagaries of memory allocation.
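
To illustrate the concern about `id` reuse, here is a small sketch of a
cache keyed by `id(func)`. The `FuncRegistry` class is hypothetical and is
not part of multiprocessing; the point is that the key stays unambiguous
only while the registry holds a strong reference to the function:

```python
def make_func(n):
    """Create a fresh function object on every call."""
    def add_n(x):
        return x + n
    return add_n


class FuncRegistry:
    """Hypothetical cache keyed by id(func)."""

    def __init__(self):
        self._by_id = {}

    def register(self, func):
        key = id(func)
        # Storing the function keeps it alive, so no other live object can
        # have the same id() while it stays registered.
        self._by_id[key] = func
        return key

    def resolve(self, key):
        return self._by_id[key]
```

Without the strong reference, a function could be garbage collected and a
later object could receive the same `id`, which is the edge case raised
above.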
>
> I'm not following this thread closely, but I just wanted to point out
> that __qualname__ won't necessarily be an attribute of the object if
> the API accepts any callable. (I happen to be following an issue on
> the tracker where this came up.)
>
> --Chris
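
The point about `__qualname__` can be checked with a short sketch. The
names `scale`, `Adder` and `qualname_or_none` are made up for this example;
`functools.partial` objects and instances of callable classes are two
common callables that do not expose `__qualname__`:

```python
import functools


def scale(x, factor):
    return x * factor


class Adder:
    """A callable class; its instances have no __qualname__ attribute."""

    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n


def qualname_or_none(func):
    # Fall back to None instead of raising AttributeError.
    return getattr(func, "__qualname__", None)
```

Any scheme that uses `__qualname__` as a cache key would need a fallback
for such callables.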
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/thomas.moreau.2010%40gmail.com
>


[Python-Dev] Re: Memory address vs serial number in reprs

2020-07-19 Thread Thomas Moreau
Dear all,

While it would be nice to have simpler identifiers for objects, it would be
hard to make this work for multiprocessing, as objects in different
interpreters would end up having the same repr. Shared objects (locks) might
also have different serial numbers depending on how many objects have been
created before the shared object is communicated to the child process.
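
A rough simulation of both effects, without starting any process: each call
to the hypothetical factory `make_tracked_class` stands in for a fresh
interpreter with its own serial counter. The class name `Tracked` and the
repr format are assumptions made for this sketch, not the format proposed
in the thread:

```python
import itertools


def make_tracked_class():
    """Simulate a fresh interpreter: the returned class has its own counter."""
    counter = itertools.count(1)

    class Tracked:
        def __init__(self, name):
            self.name = name
            self.serial = next(counter)  # depends only on creation order

        def __repr__(self):
            return f"<Tracked #{self.serial} {self.name}>"

    return Tracked


# "Parent" interpreter: two objects are created before the shared lock.
Parent = make_tracked_class()
parent_objects = [Parent("a"), Parent("b")]
parent_lock = Parent("lock")

# "Child" interpreter: the shared lock is the first object created there.
Child = make_tracked_class()
child_lock = Child("lock")
```

Here the same logical lock gets serial 3 in the simulated parent and
serial 1 in the simulated child, while the unrelated object "a" in the
parent also has serial 1.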

regards
Thomas





On Sun, Jul 19, 2020 at 21:26, Antoine Pitrou wrote:

> On Sun, 19 Jul 2020 18:38:30 +0300
> Serhiy Storchaka wrote:
> > I have problem with the location of hexadecimal memory address in custom
> > reprs.
> >
> >  
> >
> > vs
> >
> >  
>
> How about putting it in parentheses, to point more clearly that it can
> most of the time be ignored:
>
>   
>
> > I do not propose to use serial numbers for all objects, because it would
> > increase the size of objects and the fixed-size integer can be
> > overflowed for some short-living objects created in mass (like numbers,
> > strings, tuples). But only for some custom objects implemented in
> > Python, for which size and creation time are not critical. I want to
> > start with synchronization objects in threading and multiprocessing
> > which did not have custom reprs, then change reprs of locks and asyncio
> > objects.
> >
> > Is it worth doing?
>
> I would like it if it applied to all objects, but doing it only for
> certain objects will be distracting and confusing (does the serial
> number point to a specific feature? it turns out it doesn't, it's just
> an arbitrary aesthetical choice).
>
> Regards
>
> Antoine.
>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/7ZSD6GHNJPS3LB74RE7OCI5J3AB642EE/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HGQWOQ6JPJ33YKG4UK2NQW2OX3BAPRZU/