Russell Keith-Magee wrote:
I was assuming that cached() would _always_ clone. If the source
QuerySet is non-cached, you get a clone with caching enabled; if the
source is cached, you get a new QuerySet with a clean cache. The use
case I can see is:

p = Article.objects.filter(...) # Original, uncached query
q = p.cached() # Copied, cache version 1
for obj in q: ... # evaluate cache 1
# Add an object that would match q
r = q.cached() # Copy of cached query, cache version 2
for obj in r: ... # evaluate cache 2; new object is in list
for obj in q: ... # Iterate over cache 1; new object not in list

This way, the cache becomes a store of what was in the database when
the query was executed, rather than there being a unique cache for any
given query. More on this later...
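
In pseudo-code, the model I have in mind is roughly this (the names
and internals are illustrative only, not the real framework code):

import copy

class QuerySet:
    def __init__(self):
        self._use_cache = False
        self._result_cache = None      # filled on first iteration

    def cached(self):
        # Always clone; a cached source yields a fresh, empty cache.
        clone = copy.copy(self)
        clone._use_cache = True
        clone._result_cache = None
        return clone

    def __iter__(self):
        if not self._use_cache:
            return iter(self._fetch())          # hit the db every time
        if self._result_cache is None:
            self._result_cache = self._fetch()  # snapshot on first use
        return iter(self._result_cache)

    def _fetch(self):
        # stand-in for running the SQL and building objects
        return []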

Hm, the issue I would have with that is that the execution of the query is lazy, i.e. where the .cached() call appears does not determine what is being cached. (And I think the laziness is more important than the caching ;-)
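
To make that concrete (same illustrative API as above):

q = Article.objects.filter(...).cached() # no SQL has run yet
# ... time passes, rows are added or deleted ...
for obj in q: ... # the query executes *here*; this is what gets cached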

So people end up doing iter(q) just to fill the cache, which seems a bit strained: if you want to compare the contents, why not do list(q)? You know that will stay the same.

i.e.
p = Article.objects.filter(...) # Original, uncached query
q = list(p)
# Add an object that would match p
r = list(p)
for obj in r: ... # new object is in list
for obj in q: ... # new object not in list

I dunno, this seems clearer to me for that particular use case. I don't care too much either way; it just seems your model gives .cached() quite a lot of responsibilities. But I think your model is something everyone can get used to, and it can become idiomatic. It also means less churn when people want to do more than just iterate through a list, e.g. call a custom method. So either way is fine with me.

So without externalising, the choices are:

a) inconsistency with managers and related objects having different
default caching behaviour.
b) the need to have a lot of .cached() being used on related objects,
when that is likely to be what people want in 95% of situations.

I would tend to go with (b), with a chorus of 'fix this in the
documentation'. i.e., make the documentation very clear about the fact
that a cache is available, and might be a good way to optimize
performance in some cases.

Also - 95%? Really? The performance hit of non-caching only matters if
you iterate over any given QuerySet more than once per http request.
Maybe I'm being unimaginative, but thinking over my common use cases,
multiple iterations over a QuerySet per http request would be the
exception, rather than the rule (or at the very least, nowhere near
95% of all use cases).

Yes, I was thinking of '95% of cases where caching matters at all' ;-)

I agree a lot of the time it doesn't matter. I also agree that b) is probably preferable to a)...

Also I guess we need to take into account that doing e.g.

p = list(Article.objects)

would cost more with caching than without, and that probably covers a lot of cases. (I hope.)
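
The extra cost being, roughly:

p = list(Article.objects)  # rows end up held twice: once in the list p,
                           # once in an internal result cache which, in
                           # this pattern, is never read again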

There is also an option (c): make the manager a cached query, and
document the need to use non_cached(). If multiple iterations over a
QuerySet really is the 95% use case, it would make sense to me to
cache across the board.
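
i.e. something like (sketching a hypothetical non_cached()):

for a in Article.objects.filter(...): ...               # cached by default
for a in Article.objects.filter(...).non_cached(): ...  # no cache kept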

I think this probably would end up confusing and surprising. Docs can only go so far ;-)

Externalising caching means that consistent *and* non-ugly behaviour can
be offered for all "query set entry points" - managers and related objects.

Well - for your definition of consistent and non-ugly, anyway :-)

I just meant not having .cached() sprinkled around everywhere, really.

I think we may have different ideas on the cardinality of the
query-cache relationship. Consider:

p = Article.objects.filter(headline='xyz').cached()
q = Article.objects.filter(headline='xyz').cached()

If I am understanding your caching model correctly, you would ideally
like to see p and q using the same cache. This model is almost
impossible to achieve without externalization; the mapping of this
caching model onto the existing framework (where p and q have
different caches unless q is a clone of p) is very much ugly. Ergo,
pro externalization.
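
As I understand it, the externalised model would be roughly this
(illustrative names, with a hypothetical as_sql() standing in for
whatever canonical form of the query is used as the key):

_query_cache = {}   # maps a query signature to its results

def run_query(qs):
    key = qs.as_sql()   # hypothetical canonical form of the query
    if key not in _query_cache:
        _query_cache[key] = qs._fetch()
    return _query_cache[key]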

Yes, that was the idea.

My problem with this model is that p.reset_cache() would clear q's
cache, too. On top of that, there is the question of when the system
automagically flushes the cache (per http request and per transaction
being two reasonable suggestions).

Yes, in that case I would make reset_cache() a function, e.g.

reset_cache(q)

I think this conveys that the cache is global.
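
Roughly (reusing the illustrative names from above):

def reset_cache(qs):
    # module-level, to signal that the cache is shared rather than
    # belonging to any one QuerySet instance
    _query_cache.pop(qs.as_sql(), None)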

I would argue that p and q should have independent caches. This way,
once you have iterated over p, you know exactly what is there; if you
iterate over p again, you will always get exactly the same result,
regardless of what you have done to q. This makes a populated cache a
snapshot of the state of the database at a given time. No need to
worry if someone/something else has changed the database - my snapshot
is always the same.

Yeah, I'm not convinced this is natural with a lazily fetched thing though. I'm not sure that a huge number of people will internalise the 'throwaway iter() to cache' idiom...

It's a slightly different model of caching, but it is a consistent and
easily explainable model, IMHO non-ugly, and doesn't require
externalization to achieve - it is entirely covered by the existing
framework, plus the modifications we have been discussing.

Russ Magee %-)

Yes, I am happy with either model. I do tend to think that the existing model will end up with a lot of .cached() everywhere, which might be distracting (probably a better term than ugly in this case).
