Hi Lan,

I figured out how to do this in a kludgy way on the client side, but it seems
this could be implemented much more efficiently at the Solr/Lucene level. I
described my kludge and posted a question about this to the dev list, but so
far I have not received any replies
(http://lucene.472066.n3.nabble.com/Solr-should-provide-an-option-to-show-only-most-relevant-facet-values-tc3374285.html).
I also found Solr-385, but I don't understand how grouping solves the
problem. It looks like a quite different issue to me.

The problem I am trying to solve is that I only have room in the interface to
show 30 facet values at most. Whether these are ordered by facet count or by
the highest ranking score of a member of a facet-value group, we want to base
the facet counts/ranking on only the top N hits rather than the entire result
set. In my use case, that means the top 10,000 hits rather than all 170,000.
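
To make the idea concrete, here is a minimal client-side sketch in Python. It is not the actual kludge I posted to the dev list and it does not use any Solr API. It assumes the client has already fetched the hits sorted by descending relevance, and that each hit is a dict with a "score" entry and a list of facet values under some field; the function names, the document shape, and the "topic" field are all hypothetical.

```python
from collections import Counter


def top_facets_by_count(hits, field, top_n=10000, limit=30):
    """Count facet values over only the first top_n hits.

    hits:  documents already sorted by descending relevance (assumed shape:
           dicts whose `field` entry is a list of facet values).
    Returns up to `limit` (value, count) pairs, highest count first.
    """
    counts = Counter()
    for doc in hits[:top_n]:
        # set() so a value repeated inside one document is counted once
        counts.update(set(doc.get(field, [])))
    return counts.most_common(limit)


def top_facets_by_best_score(hits, field, top_n=10000, limit=30):
    """Rank facet values by the best score of any top_n hit carrying them.

    Assumes each document has a numeric "score" entry.
    Returns up to `limit` (value, best_score) pairs, highest score first.
    """
    best = {}
    for doc in hits[:top_n]:
        score = doc["score"]
        for value in doc.get(field, []):
            if value not in best or score > best[value]:
                best[value] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Tiny made-up result list, standing in for the real ranked hits.
    sample_hits = [
        {"score": 3.0, "topic": ["cars"]},
        {"score": 2.0, "topic": ["cars", "animals"]},
        {"score": 1.0, "topic": ["animals"]},
        {"score": 0.5, "topic": ["animals"]},
    ]
    print(top_facets_by_count(sample_hits, "topic", top_n=2, limit=5))
    print(top_facets_by_best_score(sample_hits, "topic", top_n=4, limit=5))
```

The cost of doing it this way is that the client has to pull the facet field for all N hits over the wire, which is why it seems this belongs at the Solr/Lucene level.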

Tom

-----Original Message-----
From: Lan [mailto:dung....@gmail.com] 
Sent: Thursday, September 29, 2011 7:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Getting facet counts for 10,000 most relevant hits

I implemented a similar feature for a categorization suggestion service. I
did the faceting in the client code, which is not exactly the best
performing but it worked very well.

It would be nice to have the Solr server do the faceting for performance.


Burton-West, Tom wrote:
> 
> If relevance ranking is working well, in theory it doesn't matter how many
> hits you get as long as the best results show up in the first page of
> results.  However, the default in choosing which facet values to show is
> to show the facets with the highest count in the entire result set.  Is
> there a way to issue some kind of a filter query or facet query that would
> show only the facet counts for the 10,000 most relevant search results?
> 
> As an example, if you search in our full-text collection for "jaguar" you
> get 170,000 hits.  If I am looking for the car rather than the OS or the
> animal, I might expect to be able to click on a facet and limit my results
> to the car.  However, facets containing the word car or automobile are not
> in the top 5 facets that we show.  If you click on "more" you will see
> "automobile periodicals" but not the rest of the facets containing the
> word automobile.  This occurs because the facet counts are for all
> 170,000 hits.  The facet counts for at least 160,000 irrelevant hits are
> included (assuming only the top 10,000 hits are relevant).
> 
> What we would like to do is get the facet counts for the N most relevant
> documents and select the 5 or 30 facet values with the highest counts for
> those relevant documents.
> 
> Is this possible or would it require writing some lucene or Solr code?
> 
> Tom Burton-West
> http://www.hathitrust.org/blogs/large-scale-search
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-facet-counts-for-10-000-most-relevant-hits-tp3363459p3380852.html
Sent from the Solr - User mailing list archive at Nabble.com.
