Hi,
I think I follow what you said here. Let me check:
It sounds like you are saying that pretty much all getDoc(List|Set)* methods
would need to be modified to take an additional CompositeHitCollector (CHC)
parameter, correct?
Then I'd modify the following methods (these are the methods that use anonymous
HitCollectors and stick docs in some sort of priority queue):
protected DocSet getDocSetNC(Query query, DocSet filter)
private DocList getDocListNC(Query query, DocSet filter, …)
private DocSet getDocListAndSetNC(DocListAndSet out, Query query, DocSet filter, ...)
I'd have to:
- add a new CompositeHitCollector parameter
- if CHC != null:
    hc = new HitCollector() { ... the same anonymous HCs that are there now ... }
    CHC.setComposed(hc);
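A rough sketch of that plan, just to check the shape. It is self-contained, so `HitCollector` below is a small local stand-in with the same `collect(int doc, float score)` callback shape as Lucene's (not the real class), `searchNC` is a hypothetical stand-in for getDocSetNC and friends that "searches" an array of per-doc scores, and `setComposed` follows the name in the proposed API.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeWiringSketch {

  // Stand-in for Lucene's HitCollector callback shape (not the real class).
  public abstract static class HitCollector {
    public abstract void collect(int doc, float score);
  }

  // Shape of the proposed API: delegates to an inner collector chosen by the searcher.
  public abstract static class CompositeHitCollector extends HitCollector {
    protected HitCollector inner;

    public void setComposed(HitCollector inner) {
      this.inner = inner;
    }
  }

  // Hypothetical stand-in for a *NC method: every doc with score > 0 counts as a match
  // and is fed to whichever collector is in charge; returns what the built-in one kept.
  public static List<Integer> searchNC(float[] scores, CompositeHitCollector chc) {
    final List<Integer> collected = new ArrayList<>();
    // the same kind of anonymous collector the method already uses today
    HitCollector hc = new HitCollector() {
      @Override
      public void collect(int doc, float score) {
        collected.add(doc);
      }
    };
    HitCollector top = hc;
    if (chc != null) {
      chc.setComposed(hc); // the custom collector now sits in front of the built-in one
      top = chc;
    }
    for (int doc = 0; doc < scores.length; doc++) {
      if (scores[doc] > 0f) {
        top.collect(doc, scores[doc]);
      }
    }
    return collected;
  }
}
```

With chc == null the method behaves exactly as before; with a CHC, every match passes through the custom collector first, and it decides what reaches the built-in one.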
And when you said "...then the meat and potatoes methods of SolrIndexSearcher
could take in your custom written CompositeHitCollector, specify the anonymous
inner HitCollector it needs to use for the case it finds itself in...", does "the
case" refer to the if/else if/else branches in the above methods? For example,
if sorting is needed, use FieldSortedHitQueue; if not, use ScorePriorityQueue;
and so on?
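If that is the idea, the searcher's side amounts to picking the inner collector per branch and handing it to setComposed. A minimal sketch under stated assumptions: a bounded java.util.PriorityQueue stands in for ScorePriorityQueue (sort by score) and FieldSortedHitQueue (sort by a field), `sortValues` is a hypothetical per-doc sort field (null meaning "sort by score"), and `HitCollector` is again a local stand-in rather than Lucene's class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class InnerCollectorChoiceSketch {

  // Stand-in for Lucene's HitCollector callback shape (not the real class).
  public abstract static class HitCollector {
    public abstract void collect(int doc, float score);
  }

  public static final class Hit {
    public final int doc;
    public final float score;

    public Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Keeps the best n hits according to `order` (best first). The heap is ordered
  // worst-first, so its head is the hit to evict when it grows past n.
  public static final class TopNCollector extends HitCollector {
    private final int n;
    private final Comparator<Hit> order;
    private final PriorityQueue<Hit> heap;

    public TopNCollector(int n, Comparator<Hit> order) {
      this.n = n;
      this.order = order;
      this.heap = new PriorityQueue<>(Math.max(1, n), order.reversed());
    }

    @Override
    public void collect(int doc, float score) {
      heap.add(new Hit(doc, score));
      if (heap.size() > n) {
        heap.poll(); // drop the current worst hit
      }
    }

    // Collected doc ids, best first.
    public List<Integer> docs() {
      List<Hit> hits = new ArrayList<>(heap);
      hits.sort(order);
      List<Integer> out = new ArrayList<>();
      for (Hit h : hits) {
        out.add(h.doc);
      }
      return out;
    }
  }

  // The "case" decision: null sortValues means sort by score (ScorePriorityQueue-like
  // branch); otherwise sort ascending by a per-doc field value (FieldSortedHitQueue-like
  // branch). Ties fall back to doc id so the order is deterministic.
  public static TopNCollector chooseInner(int n, final float[] sortValues) {
    if (sortValues == null) {
      Comparator<Hit> byScoreDesc = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
      };
      return new TopNCollector(n, byScoreDesc);
    }
    Comparator<Hit> byFieldAsc = (a, b) -> {
      int c = Float.compare(sortValues[a.doc], sortValues[b.doc]);
      return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };
    return new TopNCollector(n, byFieldAsc);
  }
}
```

The searcher would then call setComposed(chooseInner(...)) on the CHC and search with it; the custom collector never needs to know which branch was taken.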
If I understood that correctly, I'll get to work, though I'm still not sure how
DocSetHitCollector will fit into all of this.
............
But somehow this "add an additional parameter everywhere" approach doesn't sound
right. I wish I could write my own WeightedSolrIndexSearcher that extends
SolrIndexSearcher and calls some hook methods in SolrIndexSearcher to hook into
caching (both get and set):
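public class WeightedHitCollector extends TopDocCollector { // TopDocCollector from Lucene
  public WeightedHitCollector(int numHits) {
    super(numHits);
  }
  public void collect(int docId, float score) {
    // weightFromSomewhere is a placeholder for the real per-doc weight;
    // super.collect() sticks the hit in TopDocCollector's PriorityQueue
    super.collect(docId, score * weightFromSomewhere);
  }
  public int[] getDocIds() {
    // topDocs() returns a single TopDocs (not TopDocs[]); its scoreDocs
    // field is a ScoreDoc[], and each ScoreDoc holds a doc id
    ScoreDoc[] scoreDocs = topDocs().scoreDocs;
    int[] docIds = new int[scoreDocs.length];
    for (int i = 0; i < scoreDocs.length; i++) {
      docIds[i] = scoreDocs[i].doc;
    }
    return docIds;
  }
}
public class WeightedSolrIndexSearcher extends SolrIndexSearcher {
  public DocList getDocList(Query q, ....) {
    // check the cache (getDocListFromCache is a wished-for hook, not an existing method)
    DocList docList = super.getDocListFromCache(q, ...);
    if (docList != null) {
      return docList;
    }
    // not cached, got to search
    WeightedHitCollector whc = new WeightedHitCollector(...); // number of hits to keep
    search(q, null, whc);
    int[] docIds = whc.getDocIds();
    // cache (cacheDocList is another wished-for hook)
    super.cacheDocList(q, ..., docIds);
    // ... wrap docIds in a DocList and return it
  }
}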
Super-simplified, but I'm wondering if this is realistic and/or better than
adding the additional CompositeHitCollector param.
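For comparison, a self-contained sketch of the weighted collector plus the check-cache / search / cache flow above, assuming: a plain HashMap keyed by a query string stands in for Solr's caches (the real get/set hooks are exactly what is missing), `weights` is a hypothetical per-doc array playing the role of weightFromSomewhere, and a java.util.PriorityQueue min-heap does what TopDocCollector's queue would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class WeightedCachingSketch {

  public static final class ScoredDoc {
    public final int doc;
    public final float score;

    public ScoredDoc(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Keeps the top n docs by score * weights[doc].
  public static final class WeightedCollector {
    private final int n;
    private final float[] weights; // hypothetical stand-in for "weightFromSomewhere"
    // min-heap on weighted score: the head is the weakest hit kept so far
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<ScoredDoc>((a, b) -> Float.compare(a.score, b.score));

    public WeightedCollector(int n, float[] weights) {
      this.n = n;
      this.weights = weights;
    }

    public void collect(int doc, float score) {
      heap.add(new ScoredDoc(doc, score * weights[doc]));
      if (heap.size() > n) {
        heap.poll(); // evict the weakest hit
      }
    }

    // Doc ids ordered by descending weighted score.
    public int[] getDocIds() {
      List<ScoredDoc> hits = new ArrayList<>(heap);
      hits.sort((a, b) -> Float.compare(b.score, a.score));
      int[] ids = new int[hits.size()];
      for (int i = 0; i < ids.length; i++) {
        ids[i] = hits.get(i).doc;
      }
      return ids;
    }
  }

  // Stand-in for the cache get/set hooks: query key -> doc ids.
  private final Map<String, int[]> cache = new HashMap<>();
  private int searches = 0; // number of cache misses that triggered a search

  public int[] getDocList(String queryKey, float[] scores, float[] weights, int n) {
    int[] cached = cache.get(queryKey); // check the cache
    if (cached != null) {
      return cached;
    }
    // not cached, got to search: here "searching" means visiting every doc with score > 0
    searches++;
    WeightedCollector wc = new WeightedCollector(n, weights);
    for (int doc = 0; doc < scores.length; doc++) {
      if (scores[doc] > 0f) {
        wc.collect(doc, scores[doc]);
      }
    }
    int[] docIds = wc.getDocIds();
    cache.put(queryKey, docIds); // cache
    return docIds;
  }

  public int searchCount() {
    return searches;
  }
}
```

One thing the sketch makes visible: the cache key has to capture whatever the weights depend on, otherwise a result computed with one set of weights would be served for another.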
Thanks,
Otis
----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, May 2, 2007 3:14:23 PM
Subject: Re: Custom HitCollector with SolrIndexSearcher and caching
: I feel like I might be missing something, and there is in fact a way to
: use a custom HitCollector and benefit from caching, but I just don't see
: it now.
I can't think of any easy way to do what you describe ... you can always
use the low-level IndexSearcher methods with a custom HitCollector that
wraps a DocSetHitCollector and then explicitly cache the DocSet yourself,
but that doesn't really help you with the DocList ... there definitely
doesn't seem to be an *easy* way to do what you're describing at the
moment, but with a little refactoring, methods like getDocListAndSet
*could* take in some sort of CompositeHitCollector class with an API
like...
/**
 * a HitCollector whose collect method will delegate to a specified
 * HitCollector for each match it wants collected
 */
public abstract class CompositeHitCollector extends HitCollector {
  public abstract void setComposed(HitCollector inner);
}
...then the meat and potatoes methods of SolrIndexSearcher could take in
your custom written CompositeHitCollector, specify the anonymous inner
HitCollector it needs to use for the case it finds itself in, and now
you've got a window into the collection process where you can muck with
scores or ignore certain matches.
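A minimal sketch of that window, assuming a local stand-in `HitCollector` with Lucene's `collect(int doc, float score)` shape: `BitSetCollector` is a hypothetical inner collector in the spirit of DocSetHitCollector (it just records matching doc ids in a java.util.BitSet), and `ExcludeAndBoost` is an example custom collector that ignores some docs and changes the score before delegating. setComposed is concrete here, not abstract, so the sketch compiles on its own.

```java
import java.util.BitSet;

public class CompositeCollectorSketch {

  // Stand-in for Lucene's HitCollector callback shape (not the real class).
  public abstract static class HitCollector {
    public abstract void collect(int doc, float score);
  }

  // The proposed API: subclasses call inner.collect(...) for each match they want collected.
  public abstract static class CompositeHitCollector extends HitCollector {
    protected HitCollector inner;

    public void setComposed(HitCollector inner) {
      this.inner = inner;
    }
  }

  // Hypothetical inner collector in the spirit of DocSetHitCollector.
  public static final class BitSetCollector extends HitCollector {
    public final BitSet docs = new BitSet();
    public float scoreSum = 0f; // sum of the scores it was handed, to show score changes

    @Override
    public void collect(int doc, float score) {
      docs.set(doc);
      scoreSum += score;
    }
  }

  // Example custom collector: ignores docs in `excluded`, doubles every other score.
  public static final class ExcludeAndBoost extends CompositeHitCollector {
    private final BitSet excluded;

    public ExcludeAndBoost(BitSet excluded) {
      this.excluded = excluded;
    }

    @Override
    public void collect(int doc, float score) {
      if (excluded.get(doc)) {
        return; // ignore this match entirely
      }
      inner.collect(doc, score * 2f); // muck with the score, then delegate
    }
  }
}
```

Because the searcher, not the custom collector, chooses the inner collector, the same ExcludeAndBoost would work whether the inner one builds a DocSet or fills a sorted queue.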
It would be a non-trivial change, but it would be possible.
-Hoss