Re: Custom HitCollector with SolrIndexSearcher and caching

Otis Gospodnetic Thu, 17 May 2007 12:40:01 -0700

Hi,

I think I follow what you said here.  Let me check:


It sounds like you are saying that pretty much all getDoc(List|Set)* methods 
would need to be modified to take an additional CompositeHitCollector (CHC) 
parameter, correct?

Then I'd modify the following methods (these are the methods that use anonymous 
HitCollectors and stick docs in some sort of priority queue):
  protected DocSet getDocSetNC(Query query, DocSet filter)
  private DocList getDocListNC(Query query, DocSet filter, …)
  private DocSet getDocListAndSetNC(DocListAndSet out, Query query, DocSet filter, ...)

I'd have to:
  - add a new CompositeHitCollector parameter
  - if CHC != null:
      hc = new HitCollector() { ... the same anonymous HCs that are there now ... };
      CHC.setComposed(hc);
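
A minimal sketch of that wiring, just to check the shape of it. Everything here is a stand-in: `HitCollector` is a local abstract class with the same `collect(int doc, float score)` callback as Lucene's, `CompositeHitCollector` uses the `setComposed` name from Hoss's proposal quoted below, and `collectDocs` is a hypothetical placeholder for one of the *NC methods, fed from arrays instead of an index.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's HitCollector callback; not the real class.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

// Shape of the API proposed in this thread (assumed, not existing Solr code).
abstract class CompositeHitCollector extends HitCollector {
  protected HitCollector inner;
  public void setComposed(HitCollector inner) { this.inner = inner; }
}

class CollectorWiring {
  // Hypothetical stand-in for a *NC method: docs/scores play the role of
  // the matches the index would report for the query.
  static List<Integer> collectDocs(int[] docs, float[] scores,
                                   CompositeHitCollector chc) {
    final List<Integer> out = new ArrayList<Integer>();
    // The anonymous collector the method would have used anyway.
    HitCollector hc = new HitCollector() {
      public void collect(int doc, float score) { out.add(doc); }
    };
    HitCollector top = hc;
    if (chc != null) {
      chc.setComposed(hc); // the custom collector now delegates to hc
      top = chc;           // and sits in front of it during the search
    }
    for (int i = 0; i < docs.length; i++) {
      top.collect(docs[i], scores[i]);
    }
    return out;
  }
}
```

With `chc == null` the method behaves exactly as before, which is what would keep existing callers unaffected.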

And when you said "...then the meat and potatoes methods of SolrIndexSearcher 
could take in your custom written CompositeHitCollector, specify the anonymous 
inner HitCollector it needs to use for the case it finds itself in..." - does 
"use for the case" refer to the if/else if/else cases in the above methods, 
such as: if sorting is needed, use FieldSortedHitQueue; if not, use 
ScorePriorityQueue; and so on?


If I understood that correctly, I'll get to work, though I'm still not sure how 
DocSetHitCollector will fit in all of this.

............

But somehow this "add an additional parameter everywhere" doesn't sound right.  
I wish I could write my own WeightedSolrIndexSearcher that extends 
SolrIndexSearcher and call some hook methods from SolrIndexSearcher to hook 
into caching (both get and set).

public class WeightedHitCollector extends TopDocCollector { // TDC from Lucene
  public void collect(int docId, float score) {
    // score * weightFromSomewhere
    // stick in PriorityQueue (from super - TDC)
  }
  public int[] getDocIds() {
    // get them from super.topDocs(), which returns a TopDocs, from which we
    // can get the ScoreDoc[] and then the docIds
  }
}

public class WeightedSolrIndexSearcher extends SolrIndexSearcher {
  public DocList getDocList(Query q, ....) {
    // check the cache
    DocList docList = super.getDocListFromCache(q, ...);
    if (docList != null) {
      return docList;
    }
    // not cached, got to search
    WeightedHitCollector whc = new WeightedHitCollector();
    searcher.search(q, null, whc);
    int[] docIds = whc.getDocIds();
    // cache the result via the (wished-for) hook, then return it
    docList = super.cacheDocList(docIds);
    return docList;
  }
}

Super-simplified, but I'm wondering if this is realistic and/or better than 
adding the additional CompositeHitCollector param.
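
To make the comparison a bit more concrete, here is the subclass-with-hooks idea sketched without Solr at all. Everything below is an assumption for illustration: `HitCollector` is a local stand-in for Lucene's callback, `WeightedTopCollector` and `CachingSearcher` are hypothetical names, the weights come from a plain map, and the cache is a `HashMap` keyed by query string and result size rather than one of Solr's caches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Stand-in for Lucene's HitCollector callback; not the real class.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

class Hit {
  final int doc;
  final float score;
  Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

// Hypothetical collector: keeps the n best docs by (score * weight).
class WeightedTopCollector extends HitCollector {
  private final int n;
  private final Map<Integer, Float> weights;
  // Min-heap on weighted score, so the weakest kept hit is evicted first.
  private final PriorityQueue<Hit> queue =
      new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));

  WeightedTopCollector(int n, Map<Integer, Float> weights) {
    this.n = n;
    this.weights = weights;
  }

  public void collect(int doc, float score) {
    float w = weights.containsKey(doc) ? weights.get(doc) : 1.0f;
    queue.add(new Hit(doc, score * w));
    if (queue.size() > n) queue.poll();
  }

  // Doc ids ordered best first.
  int[] getDocIds() {
    PriorityQueue<Hit> copy = new PriorityQueue<Hit>(queue);
    int[] ids = new int[copy.size()];
    for (int i = ids.length - 1; i >= 0; i--) ids[i] = copy.poll().doc;
    return ids;
  }
}

// Hypothetical searcher exposing "get" and "set" cache hooks.
class CachingSearcher {
  private final int[] docs;      // fake index: every query matches these docs
  private final float[] scores;  // ...with these raw scores
  private final Map<Integer, Float> weights;
  private final Map<String, int[]> cache = new HashMap<String, int[]>();
  int searches = 0;              // number of cache misses, for illustration

  CachingSearcher(int[] docs, float[] scores, Map<Integer, Float> weights) {
    this.docs = docs;
    this.scores = scores;
    this.weights = weights;
  }

  protected int[] getDocListFromCache(String key) { return cache.get(key); }
  protected void cacheDocList(String key, int[] docIds) { cache.put(key, docIds); }

  protected void search(String query, HitCollector hc) {
    searches++;
    for (int i = 0; i < docs.length; i++) hc.collect(docs[i], scores[i]);
  }

  int[] getDocList(String query, int n) {
    String key = query + "#" + n;
    int[] cached = getDocListFromCache(key);
    if (cached != null) return cached;      // cache hit: no search at all
    WeightedTopCollector whc = new WeightedTopCollector(n, weights);
    search(query, whc);
    int[] ids = whc.getDocIds();
    cacheDocList(key, ids);
    return ids;
  }
}
```

One thing the sketch makes visible: the cached list already has the weights baked in, so if the weights can change between requests, the cache key (or cache invalidation) would have to account for that.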

Thanks,
Otis


----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, May 2, 2007 3:14:23 PM
Subject: Re: Custom HitCollector with SolrIndexSearcher and caching


: I feel like I might be missing something, and there is in fact a way to
: use a custom HitCollector and benefit from caching, but I just don't see
: it now.

I can't think of any easy way to do what you describe ... you can always
use the low level IndexSearcher methods with a custom HitCollector that
wraps a DocSetHitCollector and then explicitly cache the DocSet yourself,
but that doesn't really help you with the DocList ... there definitely
doesn't seem to be an *easy* way to do what you're describing at the
moment, but with a little refactoring methods like getDocListAndSet
*could* take in some sort of CompositeHitCollector class with an API
like...

   /**
    * a HitCollector whose collect method will delegate to a specified
    * HitCollector for each match it wants collected
    */
   public abstract class CompositeHitCollector extends HitCollector {
     public abstract void setComposed(HitCollector inner);
   }

...then the meat and potatoes methods of SolrIndexSearcher could take in
your custom written CompositeHitCollector, specify the anonymous inner
HitCollector it needs to use for the case it finds itself in, and now
you've got a window into the collection process where you can muck with
scores or ignore certain matches.
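
As a rough illustration of what such a custom collector might look like (a sketch under assumptions, not existing Solr or Lucene code): `HitCollector` below is a local stand-in with the same `collect(int doc, float score)` callback, `CompositeHitCollector` is given a concrete `setComposed` for brevity, and `BoostingFilterCollector` is a hypothetical subclass that drops some matches and scales the scores of the rest.

```java
import java.util.Set;

// Stand-in for Lucene's HitCollector callback; not the real class.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

// Assumed shape of the proposed API, with a concrete setter for brevity.
abstract class CompositeHitCollector extends HitCollector {
  protected HitCollector inner;
  public void setComposed(HitCollector inner) { this.inner = inner; }
}

// Hypothetical example: ignores blocked docs, boosts everything else.
class BoostingFilterCollector extends CompositeHitCollector {
  private final Set<Integer> blocked;
  private final float boost;

  BoostingFilterCollector(Set<Integer> blocked, float boost) {
    this.blocked = blocked;
    this.boost = boost;
  }

  public void collect(int doc, float score) {
    if (blocked.contains(doc)) return;  // ignore this match entirely
    inner.collect(doc, score * boost);  // adjust the score, then delegate
  }
}
```

The inner collector never sees the blocked docs and only sees the adjusted scores, so whatever it builds (a DocSet, a sorted DocList) reflects the custom logic without knowing about it.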

It would be a non-trivial change, but it would be possible.




-Hoss
