Walter Prins wrote:

> Hello,
>
> I have a program where I'm overriding the retrieval of items from a list.
> As background: the data held by the lists are calculated once but then
> potentially read many times thereafter. To avoid needlessly recalculating
> the same value over and over, and to keep checking/caching code out of
> the calculation logic, I created a subclass of list that automatically
> calculates the value in a given slot if it has not yet been calculated.
> (Put differently, I've implemented a kind of list-specific
> caching/memoization, with the intent that it should be transparent to
> the client code.)
>
> The way I've implemented this so far was to simply override
> list.__getitem__(self, key) to check whether the value needs to be
> calculated and call a calculation method if required, after which the
> value is returned as normal. On subsequent calls __getitem__ then
> directly returns the value without calculating it again.
>
> This worked mostly fine. However, yesterday I ran into a slightly
> unexpected problem: when the list contents are iterated over and values
> retrieved that way rather than via [], __getitem__ is in fact *not*
> called on the list to read the item values, and consequently I get back
> the "not yet calculated" entries, without the calculation routine being
> called as intended.
>
> Here's a test application that demonstrates the issue:
>
> class NotYetCalculated:
>     pass
>
> class CalcList(list):
>     def __init__(self, calcitem):
>         super(CalcList, self).__init__()
>         self.calcitem = calcitem
>
>     def __getitem__(self, key):
>         """Override __getitem__ to call self.calcitem() if needed"""
>         print "CalcList.__getitem__(): Enter"
>         value = super(CalcList, self).__getitem__(key)
>         if value is NotYetCalculated:
>             print "CalcList.__getitem__(): calculating"
>             value = self.calcitem(key)
>             self[key] = value
>         print "CalcList.__getitem__(): return"
>         return value
>
> def calcitem(key):
>     # Demo: return square of index
>     return key*key
>
>
> def main():
>     # Create a list that calculates its contents via a given
>     # method/fn once only
>     l = CalcList(calcitem)
>     # Extend with a few entries to demonstrate the issue:
>     l.extend([NotYetCalculated, NotYetCalculated, NotYetCalculated,
>               NotYetCalculated])
>
>     print "1) Directly getting values from list works as expected: __getitem__ is called:"
>     print "Retrieving value [2]:\n", l[2]
>     print
>     print "Retrieving value [3]:\n", l[3]
>     print
>     print "Retrieving value [2] again (no calculation this time):\n", l[2]
>     print
>
>     print "2) Retrieving values via an iterator doesn't work as expected:"
>     print "(__getitem__ is not called and the code returns "
>     print " NotYetCalculated entries without calling __getitem__. How do I fix this?)"
>     print "List contents:"
>     for x in l: print x
>
>
> if __name__ == "__main__":
>     main()
>
> To reiterate:
>
> What should happen: In test 2) above all entries should be automatically
> calculated and the output should be numbers only.
>
> What actually happens: In test 2) above the first 2 list entries,
> corresponding to list indexes 0 and 1, are output as "NotYetCalculated"
> and calcitem is not called when required.
>
> What's the best way to fix this problem? Do I need to maybe override
> another method, perhaps provide my own iterator implementation? For that
> matter, why doesn't iterating over the list contents fall back to calling
> __getitem__?

Probably an optimisation for the common case where retrieval of list items does not involve any calculation: list provides its own __iter__(), and the iterator it returns reads the items directly instead of going through __getitem__().

You can override __iter__() along the lines of

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]

If the items are calculated from the index, as in your example, there's also the option to inherit from collections.Sequence instead of list:

    # On Python 3 use: from collections.abc import Sequence
    from collections import Sequence

    class List(Sequence):
        def __init__(self, getitem, size):
            self.getitem = getitem
            # None marks a slot that has not been calculated yet
            self._cache = [None] * size

        def __getitem__(self, index):
            assert not isinstance(index, (slice, tuple))
            value = self._cache[index]
            if value is None:
                value = self._cache[index] = self.getitem(index)
            return value

        def __len__(self):
            return len(self._cache)

    if __name__ == "__main__":
        items = List(lambda x: x*x, 10)
        print("items[4] =", items[4])
        print("items =", list(items))

Sequence builds its __iter__() on top of __getitem__() and __len__(), so iteration goes through the cache as well.

But first and foremost I'd seriously reinvestigate your caching scheme. Does it really save enough time to make it worthwhile?
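
For reference, here is a minimal sketch of how a generator-based __iter__() override could be wired into the CalcList class from the question. The debug prints are dropped, and the names items and computed are only illustrative. It assumes integer indexes only; it is meant as a starting point rather than a finished implementation.

```python
class NotYetCalculated(object):
    """Sentinel marking a slot whose value has not been computed yet."""


class CalcList(list):
    def __init__(self, calcitem):
        super(CalcList, self).__init__()
        self.calcitem = calcitem

    def __getitem__(self, key):
        # Assumption: key is an integer index; slices are not handled here.
        value = super(CalcList, self).__getitem__(key)
        if value is NotYetCalculated:
            value = self.calcitem(key)
            self[key] = value  # cache so later reads skip the calculation
        return value

    def __iter__(self):
        # Route iteration through __getitem__ so uncached slots get computed.
        for i in range(len(self)):
            yield self[i]


def calcitem(key):
    # Demo calculation from the question: square of the index.
    return key * key


items = CalcList(calcitem)
items.extend([NotYetCalculated] * 4)

# Iterating now triggers the calculation for every slot.
computed = [x for x in items]
print(computed)
```

Note that operations which read the underlying storage without going through these two methods can still see the sentinel. Slicing, for example, hands back a plain list of whatever is currently stored.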