From: "Antonio Fiol Bonnín" <[EMAIL PROTECTED]>
Date: Sat, 25 Feb 2006 11:26:28 +0100

2006/2/23, Andrew Stevens <[EMAIL PROTECTED]>:
> Looking through the Lucene block's search generator recently, it occurred to
> me that a fair amount of the code in there was redundant - all the stuff for
> breaking the hits up into pages, and only returning one page full of actual
> hits, seems to me to be duplicating the FilterTransformer. In fact, in the
> site I've been working on most recently, we've been using just that
> configuration - set the hits/page count on the generator to -1 (so it
> returns everything) and add the filter transformer into the pipeline after
> it to do the paging.
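The paging that both components carry boils down to slicing the full hit list. A minimal Java sketch, assuming a hypothetical HitPager helper (not code from Cocoon, the Lucene block or the FilterTransformer), with -1 meaning "return everything" as in the generator setting described above:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not Cocoon or FilterTransformer code) showing the
// paging logic that the search generator and the FilterTransformer both carry.
final class HitPager {

    private HitPager() {
    }

    /**
     * Returns the hits on the given 1-based page. A hitsPerPage of -1 means
     * "no paging" and returns every hit, like the generator setting above.
     */
    static <T> List<T> page(List<T> hits, int hitsPerPage, int pageNumber) {
        if (hitsPerPage == -1) {
            return hits;
        }
        if (hitsPerPage <= 0 || pageNumber < 1) {
            throw new IllegalArgumentException(
                    "hitsPerPage must be -1 or positive, pageNumber must be >= 1");
        }
        // Use long arithmetic so a huge page number cannot overflow int.
        long start = (long) (pageNumber - 1) * hitsPerPage;
        if (start >= hits.size()) {
            return Collections.emptyList(); // past the last page
        }
        int end = (int) Math.min(start + hitsPerPage, hits.size());
        return hits.subList((int) start, end);
    }

    /** Number of pages needed to show totalHits hits (0 when there are none). */
    static int pageCount(int totalHits, int hitsPerPage) {
        if (totalHits <= 0) {
            return 0;
        }
        if (hitsPerPage == -1) {
            return 1; // everything on a single page
        }
        if (hitsPerPage <= 0) {
            throw new IllegalArgumentException("hitsPerPage must be -1 or positive");
        }
        return (totalHits - 1) / hitsPerPage + 1; // ceiling division without overflow
    }
}
```

With the generator returning everything, only one component in the pipeline needs to own this logic, which is the redundancy being pointed out.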

I suppose that it's there because 99% of searches are paginated and
very few could be cached... Er... Is the search generator cacheable?

Good question, I've no idea. I don't see why it shouldn't be, though. If
nothing has been updated in the index files (which ought to be
determinable using org.apache.lucene.store.Directory's list() and
fileModified(String) methods, or from the file system timestamps), then
searching for the same query ought to produce the same results.
Moreover, depending on how the pipeline's set up, I'm guessing it ought
to be possible to cache all the hits on the first request and re-use
them if the user clicks through to the subsequent pages (thus avoiding
calling Lucene repeatedly with the same query)?
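The caching idea above can be sketched as a per-query hit cache that is emptied whenever the index files change. This is a minimal Java sketch under stated assumptions: SearchResultCache and its Searcher interface are hypothetical names, and plain java.io.File timestamps stand in for Lucene's Directory.list()/fileModified(String). One caveat: file timestamps can have coarse (e.g. one-second) resolution, so the key also includes file names and sizes.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Lucene block's API: caches hits per query
// and discards them whenever the index files change.
final class SearchResultCache {

    /** Stand-in for whatever actually runs the Lucene query. */
    interface Searcher {
        List<String> search(String query);
    }

    private final File indexDir;
    private final Searcher searcher;
    private final Map<String, List<String>> hitsByQuery = new HashMap<>();
    private String cachedStamp;

    SearchResultCache(File indexDir, Searcher searcher) {
        this.indexDir = indexDir;
        this.searcher = searcher;
    }

    /**
     * Validity key built from the name, size and modification time of every
     * file in the index directory; an added or removed file, or a change in
     * size or timestamp, produces a different key.
     */
    static String indexStamp(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return ""; // missing or unreadable directory
        }
        Arrays.sort(files); // fixed order so the key does not depend on listing order
        StringBuilder key = new StringBuilder();
        for (File f : files) {
            key.append(f.getName()).append(':')
               .append(f.length()).append(':')
               .append(f.lastModified()).append(';');
        }
        return key.toString();
    }

    /** Returns all hits for the query, calling the searcher only on a cache miss. */
    synchronized List<String> hits(String query) {
        String stamp = indexStamp(indexDir);
        if (!stamp.equals(cachedStamp)) {
            hitsByQuery.clear(); // index changed: every cached result is stale
            cachedStamp = stamp;
        }
        return hitsByQuery.computeIfAbsent(query, searcher::search);
    }
}
```

Because the full hit list is cached, moving to page 2 or 3 of the same query would only re-slice cached data rather than re-run the search.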

> So I'm wondering, are there good reasons for the search generator to include
> the same functionality, or would there be any interest in a patch that
> strips it out? The only possibility that's occurred to me so far is perhaps
> it has this in there for performance? The number of hits isn't likely to be
> an issue in my particular case (fewer than a few hundred pages in total on
> this site), but I guess it wouldn't be too good if Cocoon had to stream (and
> maybe cache) several million hits.  On the other hand, I can't imagine
> anyone would ever page through all of those anyway, so perhaps just having a
> configurable upper limit on the hit count would be sufficient?

I agree on the upper limit, if this reduces memory usage.
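Whether an upper limit saves memory depends on the extra results never being materialised. A minimal Java sketch, assuming a hypothetical CappedHits class and an iterator that produces results lazily: only the first maxHits results are pulled, while the total match count is still kept so the page can report how many hits were dropped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a configurable upper limit on how many hits are kept.
final class CappedHits<T> {

    final List<T> hits;      // at most maxHits results, in ranking order
    final int totalHits;     // how many results the query matched in total
    final boolean truncated; // true when some matches were dropped

    private CappedHits(List<T> hits, int totalHits) {
        this.hits = Collections.unmodifiableList(hits);
        this.totalHits = totalHits;
        this.truncated = hits.size() < totalHits;
    }

    /**
     * Pulls at most maxHits results from the iterator. Results beyond the
     * limit are never requested, so with a lazy source they never need to be
     * held in memory. maxHits <= 0 means "no limit" in this sketch.
     */
    static <E> CappedHits<E> cap(Iterator<E> results, int totalHits, int maxHits) {
        int limit = maxHits <= 0 ? totalHits : Math.min(maxHits, totalHits);
        List<E> kept = new ArrayList<>();
        for (int i = 0; i < limit && results.hasNext(); i++) {
            kept.add(results.next());
        }
        return new CappedHits<>(kept, totalHits);
    }
}
```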

WRT removing the included pagination, I am not against doing so for
2.2, but it definitely should not be done for 2.1.X, as it would break
backwards compatibility. Wouldn't it?

Absolutely. Actually, I was only thinking of doing this for trunk (2.2),
but neglected to say as much. There are some other tweaks I've been
considering for a 2.1.x patch (e.g. adding sitemap parameters that
override some of the configuration settings), but they're all fully
backwards compatible.
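Backwards compatibility for such an override comes from falling back to the configured value whenever the sitemap parameter is absent. A minimal Java sketch, assuming a hypothetical SettingResolver helper and a plain Map standing in for the sitemap parameters (Cocoon components normally receive these as an Avalon Parameters object, whose getParameterAsInteger(name, defaultValue) behaves similarly). A call would look like intSetting(params, "hits-per-page", configuredHitsPerPage), where the parameter name is made up for illustration.

```java
import java.util.Map;

// Hypothetical sketch: a sitemap parameter, when given, overrides the value
// from the component's configuration; otherwise the configured value is used.
final class SettingResolver {

    private SettingResolver() {
    }

    static int intSetting(Map<String, String> sitemapParams, String name, int configuredValue) {
        String raw = sitemapParams.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            // No override supplied: existing sitemaps behave exactly as before.
            return configuredValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Fail loudly rather than silently ignoring a typo in the sitemap.
            throw new IllegalArgumentException(
                    "Sitemap parameter '" + name + "' is not an integer: " + raw, e);
        }
    }
}
```

Rejecting a malformed value, rather than quietly falling back, is a judgment call; falling back would be even more lenient but would hide sitemap mistakes.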


Andrew.

