On 10.05.2019 02:58, Paul Ganssle wrote: > This is only "only" for dateutil in the sense that no one other than > dateutil implements tzinfo with the interface provided. If dateutil were > /not/ already implemented with a list of offsets and their indexes, I > would still propose this, and just re-write dateutil to take advantage > of it. From a cursory glance at pendulum, it seems that they could take > advantage of it as well (though they use their own datetime subclass, so > they have always had the ability to add this). > >> What do you think of adding a private "_cache" attribute which would >> be an arbitrary Python object? (None by default) > > We cannot use a private attribute (other than to do the actual storage, > since the thing that gets stored is not directly accessible anyway and > is instead mediated by a layer that manages the cache) because this is a > feature explicitly being added for use by tzinfo, /not/ by datetime. If > it's private then it's not safe for implementations of tzinfo to > actually use it, which defeats the purpose. > > Regarding the use of an arbitrary Python object: What I'm proposing is > that we offer a bit of the "free" storage space in the alignment bits to > tzinfo objects to use as a cache. In /most/ cases this will be very > useful to someone implementing a tzinfo, because there are really only > so many ways to accomplish this task, and most time zones are > expressible as a very short list of offset/name/dst combinations, plus > some rule for which applies when, which is why a small integer cache is > sufficient and more or less universal (i.e. not specific to dateutil's > implementation). > > I will also note that in my design, it is still possible for `tzinfo` to > return something other than [0, 254], it's just that that information > will not be cached, so it won't get the benefit of any optimization, but > the same interface / implementation can be used. > > In my test with gcc, adding an additional PyObject* to the end of the > PyDateTime_DateTime struct increased the size of the `datetime.datetime` > object from 64 to 72 bytes, whereas adding an `unsigned char` after the > `fold` leaves it unchanged. Given that the expansion to arbitrary Python > objects is speculative and doesn't have any particular use case, I would > prefer to leave the feature as is, and reconsider the possibility of > storing arbitrary Python objects on the datetime if there's some > compelling reason to do so (it would be a backwards-compatible change at > that point anyway).
Given that many datetime objects in practice don't use timezones (e.g. in large data stores you typically use UTC and naive datetime objects), I think that making the object itself larger to accommodate for a cache, which will only be used a smaller percentage of the use cases, isn't warranted. Going from 64 bytes to 72 bytes also sounds like this could have negative effects on cache lines. If you need a per object cache, you can either use weakref objects or maintain a separate dictionary in dateutil or other timezone helpers which indexes objects by id(obj). That said, if you only add a byte field which doesn't make the object larger in practice (you merely use space that alignments would use anyway), this shouldn't be a problem. The use of that field should be documented, though, so that other implementations can use/provide it as well. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 10 2019) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ > On 5/9/19 8:14 PM, Victor Stinner wrote: >> Hi Paul, >> >> The change is basically an optimization. I'm uncomfortable to design >> it "only" for dateutil. What if tomorrow someone has to store an >> arbitrary Python object, rather than just an integer (in range [0; >> 254]), into a datetime for a different optimization? >> >> Moreover, I dislike adding a *public* method for an *internal* cache. >> >> Right now, it is not possible to create a weak reference to a >> datetime. If we make it possible, it would be possible to have an >> external cache implemented with weakref.WeakSet to clear old entries >> when a datetime object is detroyed. >> >> What do you think of adding a private "_cache" attribute which would >> be an arbitrary Python object? (None by default) >> >> Victor >> >> Le mar. 7 mai 2019 à 21:46, Paul Ganssle <p...@ganssle.io> a écrit : >>> Greetings all, >>> >>> I have one last feature request that I'd like added to datetime for Python >>> 3.8, and this one I think could use some more discussion, the addition of a >>> "time zone index cache" to the datetime object. The rationale is laid out >>> in detail in bpo-35723. The general problem is that currently, every >>> invocation of utcoffset, tzname and dst needs to do full, independent >>> calculations of the time zone offsets, even for time zones where the >>> mapping is guaranteed to be stable because datetimes are immutable. I have >>> a proof of concept implementation: PR #11529. >>> >>> I'm envisioning that the `datetime` class will add a private `_tzidx` >>> single-byte member (it seems that this does not increase the size of the >>> datetime object, because it's just using an unused alignment byte). >>> `datetime` will also add a `tzidx()` method, which will return `_tzidx` if >>> it's been set and otherwise it will call `self.tzinfo.tzidx()`. If >>> `self.tzinfo.tzidx()` returns a number between 0 and 254 (inclusive), it >>> sets `_tzidx` to this value. tzidx() then returns whatever >>> self.tzinfo.tzidx() returned. >>> >>> The value of this is that as far as I can tell, nearly all non-trivial >>> tzinfo implementations construct a list of possible offsets, and implement >>> utcoffset(), tzname() and dst() by calculating an index into that list and >>> returning it. There are almost always less than 255 distinct offsets. By >>> adding this cache on the datetime, we're using a small amount of >>> currently-unused memory to prevent unnecessary calculations about a given >>> datetime. The feature is entirely opt-in, and has no downsides if it goes >>> unused, and it makes it possible to write tzinfo implementations that are >>> both lazy and as fast as the "eager calculation" mode that pytz uses (and >>> that causes many problems for pytz's users). >>> >>> I have explored the idea of using an lru cache of some sort on the tzinfo >>> object itself, but there are two problems with this: >>> >>> 1. Calculating the hash of a datetime calls .utcoffset(), which means that >>> it is necessary to, at minimum, do a `replace` on the datetime (and >>> constructing a new datetime is a pretty considerable speed hit) >>> >>> 2. It will be a much bigger memory cost, since my current proposal uses >>> approximately zero additional memory (not sure if the alignment stuff is >>> platform-dependent or something, but it doesn't use additional memory on my >>> linux computer). >>> >>> I realize this proposal is somewhat difficult to wrap your head around, so >>> if anyone would like to chat with me about it in person, I'll be at PyCon >>> sprints until Thursday morning. >>> >>> Best, >>> Paul >>> >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev@python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com >> > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/mal%40egenix.com > _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com