From: "Antonio Fiol Bonnín" <[EMAIL PROTECTED]>
Date: Sat, 25 Feb 2006 11:26:28 +0100
2006/2/23, Andrew Stevens <[EMAIL PROTECTED]>:
> Looking through the Lucene block's search generator recently, it occurred to
> me that a fair amount of the code in there was redundant - all the stuff for
> breaking the hits up into pages, and only returning one page full of actual
> hits, seems to me to be duplicating the FilterTransformer. In fact, in the
> site I've been working on most recently, we've been using just that
> configuration - set the hits/page count on the generator to -1 (so it
> returns everything) and add the filter transformer into the pipeline after
> it to do the paging.
I suppose that it's there because 99% of searches are paginated and
very few could be cached... Er... Is the search generator cacheable?
Good question, I've no idea. I don't see why it shouldn't be, though. If
nothing has been updated in the index files (which ought to be determinable
using org.apache.lucene.store.Directory's list() & fileModified(String)
methods, or from the file system timestamps), then searching for the same
query ought to produce the same results. Moreover, depending on how the
pipeline's set up, I'm guessing it ought to be possible to cache all the
hits on the first request and re-use them if the user clicks through to the
subsequent pages? (thus avoiding calling Lucene repeatedly with the same
query)
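
A minimal sketch of the file-system-timestamp variant of that check, in case
it helps the discussion. It does not touch the Lucene API at all: it assumes
the index lives in an ordinary directory on disk, and the class and method
names (IndexFreshness, newestModification, isStillValid) are made up for
illustration. It also only tracks the newest timestamp, so it would miss a
change that merely deletes an older file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: decides whether cached search results may be reused. */
final class IndexFreshness {
    private IndexFreshness() {}

    /**
     * Returns the newest last-modified time (milliseconds since the epoch)
     * among the regular files directly inside indexDir, or 0 if there are none.
     */
    static long newestModification(Path indexDir) throws IOException {
        long newest = 0L;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    newest = Math.max(newest, Files.getLastModifiedTime(file).toMillis());
                }
            }
        }
        return newest;
    }

    /** True if no index file is newer (or older) than the stamp stored with the cache entry. */
    static boolean isStillValid(Path indexDir, long cachedStamp) throws IOException {
        return newestModification(indexDir) == cachedStamp;
    }
}
```

The stamp would be recorded when the hits are first cached and compared again
on each later request for the same query.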
> So I'm wondering, are there good reasons for the search generator to include
> the same functionality, or would there be any interest in a patch that
> strips it out? The only possibility that's occurred to me so far is perhaps
> it has this in there for performance? The number of hits isn't likely to be
> an issue in my particular case (less than a few hundred pages in total on
> this site), but I guess it wouldn't be too good if Cocoon had to stream (and
> maybe cache) several million hits. On the other hand, I can't imagine
> anyone would ever page through all of those anyway, so perhaps just having a
> configurable upper limit on the hit count would be sufficient?
I agree on the upper limit, if this reduces memory usage.
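
For what it's worth, the combination being discussed (cap the total hit
count, then let a later stage cut out one page) could be sketched roughly as
below. This is plain Java over a list, not the generator's or the
FilterTransformer's actual code; HitPager, page and the maxHits parameter are
hypothetical names chosen for the example.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: applies an upper limit to the hits, then selects one page. */
final class HitPager {
    private HitPager() {}

    /**
     * Returns the hits for the zero-based pageIndex, never looking past
     * the first maxHits entries of the full result list.
     */
    static <T> List<T> page(List<T> hits, int pageIndex, int pageLength, int maxHits) {
        if (pageIndex < 0 || pageLength <= 0 || maxHits < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // The cap bounds how many hits are ever streamed or cached.
        int limit = Math.min(hits.size(), maxHits);
        long from = (long) pageIndex * pageLength; // long avoids int overflow
        if (from >= limit) {
            return Collections.emptyList();        // page lies beyond the cap
        }
        int to = (int) Math.min((long) limit, from + pageLength);
        return hits.subList((int) from, to);
    }
}
```

With ten hits, a page length of 3 and a cap of 8, page 1 holds hits 3 to 5,
page 2 holds only hits 6 and 7, and page 3 is empty.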
WRT removing the included pagination, I am not against doing so for
2.2, but definitely it should not be done for 2.1.X as it would break
backwards compatibility. Wouldn't it?
Absolutely. Actually, I was only thinking of doing this for trunk (2.2),
but neglected to say as much. There are some other tweaks I've been
considering for a 2.1.x patch (e.g. adding sitemap parameters that override
some of the configuration settings), but they're all fully backwards
compatible.
Andrew.