[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Joao S. O. Bueno
On Thu, 30 Apr 2020 at 00:37, Raymond Hettinger wrote:
> Also, if you know of a real world use case, what solution is currently being
> used.  I'm not sure what alternative call_once() is competing against.

Of course this is meant to be something simple, so there are no "real
world use cases" of the "wow, it could not have been done without it"
kind. I was one of the first to reply to this on "python-ideas", as I
often need the pattern, but I seldom worry about reentrancy or parallel
calls. Most of the uses are just that: initialize a resource lazily,
and plain "lru_cache" could work. My first thought was for something
more lightweight than lru_cache (and with a friendlier name).

So, one of the places where I'd likely have used this is here:

https://github.com/jsbueno/terminedia/blob/d97976fb11ac54b527db4183497730883ba71515/terminedia/unicode.py#L30

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DJMIZ7Q3ZD5IKOOB73SEZNFVEAN34RMW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Virtual machine bleeds into generator implementation?

2020-04-30 Thread Skip Montanaro
>
> Thanks for the replies. I will cook up some private API in my cpython
> fork. Whether or not my new vm ever sees the light of day, I think it
> would be worthwhile to consider a proper API (even a _PyEval macro or
> two) for the little dance the two subsystems do.
>

I committed a change to my fork:

https://github.com/smontanaro/cpython/commit/305758a42ec92dcd1d0a181f454af63b5741da5d

This moves direct stack manipulation out of genobject.c into ceval.c and
allows me to work on a non-stack way to deal with these tasks (note all the
calls to Py_FatalError in the CO_REGISTER branches). I am specifically not
holding this up as a proposal for how to do this (I am largely ignorant of
many of the internal or CPython-specific aspects of the C API). Still, the
tests pass and I can start to address those fatal errors.

Skip
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/U4YAIEGPUOX4X67GC5GWEE3PMHEVCKIR/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Carl Meyer
On Wed, Apr 29, 2020 at 9:36 PM Raymond Hettinger
 wrote:
> Do you have some concrete examples we could look at?   I'm having trouble 
> visualizing any real use cases and none have been presented so far.

This pattern occurs not infrequently in our Django server codebase at
Instagram. A typical case: we need a client object to make queries to
some external service. Queries using the client can be made from various
locations in the codebase (and new ones could be added at any time), but
creating the client has noticeable overhead (e.g. it may do network work
at creation to figure out which remote host can service the needed
functionality), so having multiple client objects for the same remote
service in the same process is wasteful.

Or another similar case might be creation of a "client" object for
querying a large on-disk data set.

> Presumably, the initialization function would have to take zero arguments,

Right, typically for a globally useful client object there are no
arguments needed; any required configuration is also already available
globally.

> have a useful return value,

Yup, the object which will be used by other code to make network
requests or query the on-disk data set.

> must be called only once,

In our use cases it's more a SHOULD than a MUST. Typically if it were
called two or three times in the process due to some race condition
that would hardly matter. However if it were called anew for every
usage that would be catastrophically inefficient.

> not be idempotent,

Any function like the ones I'm describing can be trivially made
idempotent by initializing a global variable and short-circuit
returning that global if already set. But that's precisely the
boilerplate this utility seeks to replace.
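
For illustration, the global-variable boilerplate described above might look
like the following minimal sketch. The names `ServiceClient` and `get_client`
are hypothetical stand-ins, not taken from any real codebase.

```python
class ServiceClient:
    """Hypothetical stand-in for a client that is expensive to create."""
    instances_created = 0

    def __init__(self):
        # Imagine network discovery or other costly setup here.
        ServiceClient.instances_created += 1


_client = None  # module-level cache, empty until first use


def get_client():
    """Return the shared client, creating it on the first call only."""
    global _client
    if _client is None:
        _client = ServiceClient()
    return _client
```

Every caller goes through `get_client()`, so only the first call pays the
construction cost. Without a lock, two threads racing on the first call could
each build a client.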

> wouldn't fail if called in two different processes,

Separate processes would each need their own and that's fine.

> can be called from multiple places,

Yes, that's typical for the uses I'm describing.

> and can guarantee that a decref, gc, __del__, or weakref callback would never 
> trigger a reentrant call.

"Guarantee" is too strong, but at least in our codebase use of Python
finalizers is considered poor practice and they are rarely used, and
in any case it would be extraordinarily strange for a finalizer to
make use of an object like this that queries an external resource. So
this is not a practical concern. Similarly it would be very strange
for creation of an instance of a class to call a free function whose
entire purpose is to create and return an instance of that very class,
so reentrancy is also not a practical concern.

> Also, if you know of a real world use case, what solution is currently being 
> used.  I'm not sure what alternative call_once() is competing against.

Currently we typically would use either `lru_cache` or the manual
"cache" using a global variable. I don't think that practically
`call_once` would be a massive improvement over either of those, but
it would be slightly clearer and more discoverable for the use case.

> Do you have any thoughts on what the semantics should be if the inner 
> function raises an exception?  Would a retry be allowed?  Or does call_once() 
> literally mean "can never be called again"?

For the use cases I'm describing, if the method raises an exception
the cache should be left unpopulated and a future call should try
again.
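
For what it's worth, `functools.lru_cache` already has these semantics: a call
that raises leaves the cache unpopulated, so the next call runs the function
again. A minimal sketch, in which `connect` and its simulated failure are
hypothetical:

```python
from functools import lru_cache

attempts = []  # records every real invocation of connect()


@lru_cache(maxsize=None)
def connect():
    """Hypothetical initializer that fails on its first attempt only."""
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionError("simulated transient failure")
    return object()


try:
    connect()
except ConnectionError:
    pass  # nothing was cached for the failed call

first = connect()   # runs the function again and caches the result
second = connect()  # served from the cache, function not re-run
```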

Arguably a better solution for these cases is to push the laziness
internal to the class in question, so it doesn't do expensive or
dangerous work on instantiation but delays it until first use. If that
is done, then a simple module-level instantiation suffices to replace
the `call_once` pattern. Unfortunately in practice we are often
dealing with existing widely-used APIs that weren't designed that way
and would be expensive to refactor, so the pattern continues to be
necessary. (Doing expensive or dangerous work at import time is a
major problem that we must avoid, since it causes every user of the
system to pay that startup cost in time and risk of failure, even if
for their use the object would never be used.)
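
A minimal sketch of what pushing the laziness into the class could look like.
`LazyClient`, its `query` method, and the fake connection are hypothetical
and only illustrate the shape of the idea.

```python
class LazyClient:
    """Hypothetical client that defers expensive setup until first use."""

    def __init__(self):
        self._connection = None  # cheap: no network or disk work here
        self.connect_count = 0

    def _ensure_connected(self):
        if self._connection is None:
            # Stand-in for the expensive or failure-prone work.
            self._connection = {"host": "example.invalid"}
            self.connect_count += 1
        return self._connection

    def query(self, text):
        conn = self._ensure_connected()
        return f"{conn['host']}: {text}"


# Safe to create at import time because construction does no real work.
client = LazyClient()
```

With this design a plain module-level instance replaces the accessor function,
and importing the module costs nothing until `query` is first called.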

Carl
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/K73NIHFBXWCM2GUWPVJUNI44TSWASIRD/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Raymond Hettinger


> On Apr 30, 2020, at 6:32 AM, Joao S. O. Bueno  wrote:
> 
> Of course this is meant to be something simple - so there are no "real
> world use cases" that are "wow, it could not have
> been done without it".

The proposed implementation does something risky: it holds a non-reentrant
lock across a call to an arbitrary user-defined function.  The only reason to
do so is to absolutely guarantee the function will never be called twice.  We
really should look for some concrete examples that require that guarantee, and
it would be nice to see how that guarantee is being implemented currently (it
isn't obvious to me).
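
To make the concern concrete, here is a minimal sketch of a decorator that
holds a lock across the user function. It is an assumption about the general
shape of such an implementation, not the actual proposed patch, and
`locked_call_once` is a hypothetical name.

```python
import threading
from functools import wraps

_MISSING = object()  # sentinel, since None may be a legitimate return value


def locked_call_once(func):
    """Hypothetical sketch: run a zero-argument func at most once."""
    lock = threading.Lock()  # non-reentrant
    result = _MISSING

    @wraps(func)
    def wrapper():
        nonlocal result
        with lock:  # held while arbitrary user code runs
            if result is _MISSING:
                result = func()
        return result

    return wrapper


calls = []


@locked_call_once
def init():
    calls.append(1)
    return "ready"


# Several threads race to initialize; the lock serializes them.
threads = [threading.Thread(target=init) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If `init` ever re-entered `wrapper` on the same thread, directly or through a
finalizer, the second `with lock` would block forever. That is the deadlock
risk in question.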

Also, most initialization functions I've encountered take at least one 
argument, so the proposed call_once() implementation wouldn't be usable at all. 

> I was one of the first to reply to this on
> "python-ideas", as I often need the pattern, but seldom
> worrying about reentrancy, or parallel calling. Most of the uses are
> just that: initialize a resource lazily, and just
> "lru_cache" could work. My first thought was for something more
> light-weight than lru_cache (and a friendlier
> name).

Right.  Those cases could be solved trivially if we added:

call_once = lru_cache(maxsize=None)

which is lightweight, very fast, and has a clear name.  Further, it would work
with multiple arguments and would not fail if the underlying function turned
out to be reentrant.

AFAICT, the *only* reason not to use the lru_cache() implementation is that in
multithreaded code, it can't guarantee that the underlying function doesn't get
called a second time while the first call is still executing. If that is not
something you care about, then you don't need the proposed implementation; we
can give you what you want by adding a single line to functools.
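
A minimal sketch of how that one-line alias would behave. The decorated
functions `load_config` and `open_resource` are hypothetical examples.

```python
from functools import lru_cache

call_once = lru_cache(maxsize=None)

calls = []  # records every real invocation


@call_once
def load_config():
    """Hypothetical zero-argument initializer."""
    calls.append("load_config")
    return {"debug": False}


@call_once
def open_resource(name):
    """Hypothetical one-argument initializer, cached per distinct argument."""
    calls.append(name)
    return [name]
```

Each decorated function gets its own cache, and the usual `cache_info()` and
`cache_clear()` methods remain available on it.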

> So, one of the points I'd likely have used this is here:
> 
> https://github.com/jsbueno/terminedia/blob/d97976fb11ac54b527db4183497730883ba71515/terminedia/unicode.py#L30

Thanks — this is a nice example.  Here's what it tells us:

1) There exists at least one use case for a zero-argument initialization
   function.
2) Your current solution is trivially easy, clear, and fast:
   "if CHAR_BASE: return".
3) This function returns None, so efforts by call_once() to block and await
   a result are wasted.
4) It would be inconsequential if this function were called twice.
5) A more common way to do this is to move the test into the lookup()
   function -- see below.


Raymond

-

import re
import unicodedata

# Character is the record type defined in the linked terminedia/unicode.py.

CHAR_BASE = {}

def _init_chars():
    for code in range(0, 0x10):
        char = chr(code)
        values = {}
        attrs = "name category east_asian_width"
        for attr in attrs.split():
            try:
                values[attr] = getattr(unicodedata, attr)(char)
            except ValueError:
                values[attr] = "undefined"
        CHAR_BASE[code] = Character(char, code, values["name"],
                                    values["category"], values["east_asian_width"])

def lookup(name_part, chars_only=False):
    # Populate the table lazily, on the first lookup only.
    if not CHAR_BASE:
        _init_chars()
    results = [char for char in CHAR_BASE.values()
               if re.search(name_part, char.name, re.IGNORECASE)]
    if not chars_only:
        return results
    return [char.char for char in results]
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JZQLF5LXV47SJP6ZSTG27246S6OIYTPM/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread raymond . hettinger
Would either of the existing solutions work for you?

from functools import cached_property, lru_cache

class X:
    def __init__(self, name):
        self.name = name

    @cached_property
    def title(self):
        print("compute title once")
        return self.name.title()

    @property
    @lru_cache
    def upper(self):
        print("compute upper once")
        return self.name.upper()

obj = X("victor")
print(obj.title)
print(obj.title)
print(obj.upper)
print(obj.upper)
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4LW5FFI74J6A4FHLUTKWHH3WLWBMXASM/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Raymond Hettinger



> On Apr 30, 2020, at 10:44 AM, Carl Meyer  wrote:
> 
> On Wed, Apr 29, 2020 at 9:36 PM Raymond Hettinger
>  wrote:
>> Do you have some concrete examples we could look at?   I'm having trouble 
>> visualizing any real use cases and none have been presented so far.
> 
> This pattern occurs not infrequently in our Django server codebase at
> Instagram. A typical case would be that we need a client object to
> make queries to some external service, queries using the client can be
> made from various locations in the codebase (and new ones could be
> added any time), but there is noticeable overhead to the creation of
> the client (e.g. perhaps it does network work at creation to figure
> out which remote host can service the needed functionality) and so
> having multiple client objects for the same remote service existing in
> the same process is waste.
> 
> Or another similar case might be creation of a "client" object for
> querying a large on-disk data set.

Thanks for the concrete example.  AFAICT, it doesn't require (and probably
shouldn't have) a lock to be held for the duration of the call.  Would it be
fair to say that 100% of your needs would be met if we just added this to the
functools module?

  call_once = lru_cache(maxsize=None)

That's discoverable, already works, has no risk of deadlock, would work with 
multiple argument functions, has instrumentation, and has the ability to clear 
or reset.

I'm still looking for an example that actually requires a lock to be held for a 
long duration.


Raymond

___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Y3I646QBI7ICASP62ATFBUPROZ2J4TKE/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Carl Meyer
On Thu, Apr 30, 2020 at 3:12 PM Raymond Hettinger
 wrote:
> Thanks for the concrete example.  AFAICT, it doesn't require (and probably
> shouldn't have) a lock to be held for the duration of the call.  Would it be
> fair to say that 100% of your needs would be met if we just added this to the
> functools module?
>
>   call_once = lru_cache(maxsize=None)
>
> That's discoverable, already works, has no risk of deadlock, would work with 
> multiple argument functions, has instrumentation, and has the ability to 
> clear or reset.

Yep, I think that's fair. We've never AFAIK had a problem with
`lru_cache` races, and if we did, in most cases we'd be fine with
having it called twice.

I can _imagine_ a case where the call loads some massive dataset
directly into memory and we really couldn't afford it being loaded
twice under any circumstance, but even if we have a case like that, we
don't do enough threading for it ever to have been an actual problem
that I'm aware of.

> I'm still looking for an example that actually requires a lock to be held for 
> a long duration.

Don't think I can provide a real-world one from my own experience! Thanks,

Carl
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XLXJFZ4K67RDEI3WUK2FNEKH547C36GK/


[Python-Dev] Re: Adding a "call_once" decorator to functools

2020-04-30 Thread Paul Ganssle
On 4/30/20 4:47 PM, raymond.hettin...@gmail.com wrote:
> Would either of the existing solutions work for you?
>
> class X:
>     def __init__(self, name):
>         self.name = name
>
>     @cached_property
>     def title(self):
>         print("compute title once")
>         return self.name.title()
>
>     @property
>     @lru_cache
>     def upper(self):
>         print("compute upper once")
>         return self.name.upper()

The second one seems a bit dangerous in that it will erroneously keep
objects alive until they are either ejected from the cache or until the
class itself is collected (plus only 128 objects would be in the cache
at one time): https://bugs.python.org/issue19859
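
A minimal sketch of the lifetime problem, assuming CPython's reference
counting. The cache key includes `self`, so the class-level cache holds a
strong reference to every instance it has seen. The `weakref` probe and the
variable names are only for illustration.

```python
import gc
import weakref
from functools import lru_cache


class X:
    def __init__(self, name):
        self.name = name

    @property
    @lru_cache(maxsize=128)  # the default size, written out explicitly
    def upper(self):
        return self.name.upper()


obj = X("victor")
obj.upper                 # populates the shared cache with obj as the key
ref = weakref.ref(obj)
del obj
gc.collect()
alive_while_cached = ref() is not None  # the cache still references obj

# The cache lives on the function wrapped by the property.
X.__dict__["upper"].fget.cache_clear()
gc.collect()
alive_after_clear = ref() is not None   # nothing references obj any more
```

`cached_property` does not have this issue because it stores the value in the
instance's own `__dict__`.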

> Thanks for the concrete example.  AFAICT, it doesn't require (and probably
> shouldn't have) a lock to be held for the duration of the call.  Would it be
> fair to say that 100% of your needs would be met if we just added this to the
> functools module?
>
>   call_once = lru_cache(maxsize=None)

I am -0 on adding `call_once = lru_cache(maxsize=None)` here. I feel
like it could be misleading in that people might think that it ensures
that the function is called exactly once (it reminds me of the FnOnce
trait in Rust), and all it buys us is a nice way to advertise "here's a
use case for lru_cache".

That said, in any of the times I've had one of these "call exactly one
time" situations, the biggest constraint I've had is that I always
wanted the return value to be the same object so that `f(x) is f(x)`,
but I've never had a situation where it was /required/ that the function
be called exactly once, so I rarely if ever have bothered to get that
property.

I suppose I could imagine a situation where calling the function mutates
or consumes an object as part of the call, like:

class LazyList:
    def __init__(self, some_iterator):
        self._iter = some_iterator
        self._list = None

    @call_once
    def as_list(self):
        self._list = list(self._iter)
        return self._list

But I think it's just speculation to imagine anyone needs that or would
find it useful, so I'm in favor of waiting for someone to chime in with
a concrete use case where this property would be valuable.

Best,
Paul



___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BXOKFCVUNFJZBLMOMUYNLH4K4HFYBSYX/