Re: [Python-Dev] Adding a tzidx cache to datetime

M.-A. Lemburg Fri, 10 May 2019 00:26:47 -0700

On 10.05.2019 02:58, Paul Ganssle wrote:
> This is only "only" for dateutil in the sense that no one other than
> dateutil implements tzinfo with the interface provided. If dateutil were
> /not/ already implemented with a list of offsets and their indexes, I
> would still propose this, and just re-write dateutil to take advantage
> of it. From a cursory glance at pendulum, it seems that they could take
> advantage of it as well (though they use their own datetime subclass, so
> they have always had the ability to add this).
> 
>> What do you think of adding a private "_cache" attribute which would
>> be an arbitrary Python object? (None by default)
> 
> We cannot use a private attribute (other than to do the actual storage,
> since the thing that gets stored is not directly accessible anyway and
> is instead mediated by a layer that manages the cache) because this is a
> feature explicitly being added for use by tzinfo, /not/ by datetime. If
> it's private then it's not safe for implementations of tzinfo to
> actually use it, which defeats the purpose.
> 
> Regarding the use of an arbitrary Python object: What I'm proposing is
> that we offer a bit of the "free" storage space in the alignment bits to
> tzinfo objects to use as a cache. In /most/ cases this will be very
> useful to someone implementing a tzinfo, because there are really only
> so many ways to accomplish this task, and most time zones are
> expressible as a very short list of offset/name/dst combinations, plus
> some rule for which applies when, which is why a small integer cache is
> sufficient and more or less universal (i.e. not specific to dateutil's
> implementation).
> 
> I will also note that in my design, it is still possible for `tzinfo` to
> return something other than [0, 254], it's just that that information
> will not be cached, so it won't get the benefit of any optimization, but
> the same interface / implementation can be used.
> 
> In my test with gcc, adding an additional PyObject* to the end of the
> PyDateTime_DateTime struct increased the size of the `datetime.datetime`
> object from 64 to 72 bytes, whereas adding an `unsigned char` after the
> `fold` leaves it unchanged. Given that the expansion to arbitrary Python
> objects is speculative and doesn't have any particular use case, I would
> prefer to leave the feature as is, and reconsider the possibility of
> storing arbitrary Python objects on the datetime if there's some
> compelling reason to do so (it would be a backwards-compatible change at
> that point anyway).


Given that many datetime objects in practice don't use timezones
(e.g. in large data stores you typically use UTC and naive datetime
objects), I think that making the object itself larger to accommodate
for a cache, which will only be used a smaller percentage of the use
cases, isn't warranted. Going from 64 bytes to 72 bytes also sounds
like this could have negative effects on cache lines.

If you need a per object cache, you can either use weakref
objects or maintain a separate dictionary in dateutil or other
timezone helpers which indexes objects by id(obj).

That said, if you only add a byte field which doesn't make the object
larger in practice (you merely use space that alignments would
use anyway), this shouldn't be a problem. The use of that field
should be documented, though, so that other implementations can
use/provide it as well.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, May 10 2019)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/


> On 5/9/19 8:14 PM, Victor Stinner wrote:
>> Hi Paul,
>>
>> The change is basically an optimization. I'm uncomfortable to design
>> it "only" for dateutil. What if tomorrow someone has to store an
>> arbitrary Python object, rather than just an integer (in range [0;
>> 254]), into a datetime for a different optimization?
>>
>> Moreover, I dislike adding a *public* method for an *internal* cache.
>>
>> Right now, it is not possible to create a weak reference to a
>> datetime. If we make it possible, it would be possible to have an
>> external cache implemented with weakref.WeakSet to clear old entries
>> when a datetime object is detroyed.
>>
>> What do you think of adding a private "_cache" attribute which would
>> be an arbitrary Python object? (None by default)
>>
>> Victor
>>
>> Le mar. 7 mai 2019 à 21:46, Paul Ganssle <p...@ganssle.io> a écrit :
>>> Greetings all,
>>>
>>> I have one last feature request that I'd like added to datetime for Python 
>>> 3.8, and this one I think could use some more discussion, the addition of a 
>>> "time zone index cache" to the datetime object. The rationale is laid out 
>>> in detail in bpo-35723. The general problem is that currently, every 
>>> invocation of utcoffset, tzname and dst needs to do full, independent 
>>> calculations of the time zone offsets, even for time zones where the 
>>> mapping is guaranteed to be stable because datetimes are immutable. I have 
>>> a proof of concept implementation: PR #11529.
>>>
>>> I'm envisioning that the `datetime` class will add a private `_tzidx` 
>>> single-byte member (it seems that this does not increase the size of the 
>>> datetime object, because it's just using an unused alignment byte). 
>>> `datetime` will also add a `tzidx()` method, which will return `_tzidx` if 
>>> it's been set and otherwise it will call `self.tzinfo.tzidx()`.  If 
>>> `self.tzinfo.tzidx()` returns a number between 0 and 254 (inclusive), it 
>>> sets `_tzidx` to this value. tzidx() then returns whatever 
>>> self.tzinfo.tzidx() returned.
>>>
>>> The value of this is that as far as I can tell, nearly all non-trivial 
>>> tzinfo implementations construct a list of possible offsets, and implement 
>>> utcoffset(), tzname() and dst() by calculating an index into that list and 
>>> returning it. There are almost always less than 255 distinct offsets. By 
>>> adding this cache on the datetime, we're using a small amount of 
>>> currently-unused memory to prevent unnecessary calculations about a given 
>>> datetime. The feature is entirely opt-in, and has no downsides if it goes 
>>> unused, and it makes it possible to write tzinfo implementations that are 
>>> both lazy and as fast as the "eager calculation" mode that pytz uses (and 
>>> that causes many problems for pytz's users).
>>>
>>> I have explored the idea of using an lru cache of some sort on the tzinfo 
>>> object itself, but there are two problems with this:
>>>
>>> 1. Calculating the hash of a datetime calls .utcoffset(), which means that 
>>> it is necessary to, at minimum, do a `replace` on the datetime (and 
>>> constructing a new datetime is a pretty considerable speed hit)
>>>
>>> 2. It will be a much bigger memory cost, since my current proposal uses 
>>> approximately zero additional memory (not sure if the alignment stuff is 
>>> platform-dependent or something, but it doesn't use additional memory on my 
>>> linux computer).
>>>
>>> I realize this proposal is somewhat difficult to wrap your head around, so 
>>> if anyone would like to chat with me about it in person, I'll be at PyCon 
>>> sprints until Thursday morning.
>>>
>>> Best,
>>> Paul
>>>
>>> _______________________________________________
>>> Python-Dev mailing list
>>> Python-Dev@python.org
>>> https://mail.python.org/mailman/listinfo/python-dev
>>> Unsubscribe: 
>>> https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com
>>
> 
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/mal%40egenix.com
> 

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Adding a tzidx cache to datetime

Reply via email to