The other option is to use various MLT params to return fewer similar documents to begin with.
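As a rough sketch of what I mean (not a tested recipe): the Python snippet below only builds a request URL for the MoreLikeThisHandler with tighter term-selection parameters. The `/mlt` handler path, the core name, the field names, and every threshold value are assumptions for illustration and need tuning against your own index. The `mlt.*` names are the standard MoreLikeThis parameters.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_id, mlt_fields, facet_fields, rows=25,
                  min_term_freq=3, min_doc_freq=10, min_word_len=4,
                  max_query_terms=10):
    """Build a MoreLikeThisHandler URL with stricter term selection.

    All threshold defaults here are illustrative guesses, not recommendations.
    """
    params = [
        ("q", f"id:{doc_id}"),              # the single reference document
        ("rows", rows),
        ("mlt.fl", ",".join(mlt_fields)),   # fields used to pick "interesting" terms
        ("mlt.mintf", min_term_freq),       # ignore terms rare in the source document
        ("mlt.mindf", min_doc_freq),        # ignore terms that occur in very few documents
        ("mlt.minwl", min_word_len),        # ignore very short words
        ("mlt.maxqt", max_query_terms),     # cap the number of terms in the generated query
        ("mlt.boost", "true"),              # weight query terms by their relevance
        ("facet", "true"),
        ("wt", "json"),
    ]
    # One facet.field parameter per facet field.
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host, core, and field names.
url = build_mlt_url(
    "http://localhost:8983/solr/collection1/mlt",
    "some_doc_id",
    mlt_fields=["title", "body"],
    facet_fields=["category"],
)
print(url)
```

Fewer and more selective query terms should generally shrink the set of matching documents, and the facets are then counted over that smaller set. How far the 150k shrinks depends on the data, so treat the numbers above as a starting point only.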
Otis
Solr & ElasticSearch Support
http://sematext.com/

On May 21, 2013 7:00 PM, "Jack Krupansky" <j...@basetechnology.com> wrote:

> I think I follow. AFAIK, Solr does not have a provision for limiting
> faceting to the "top n" documents, but that does seem like a reasonable
> feature request. At the Lucene level, I presume it would simply be a matter
> of having a hit collector that only accepts the top n documents. But I'm
> not familiar enough with the internal details of the Solr faceting code.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Achim Domma
> Sent: Tuesday, May 21, 2013 6:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThisHandler + Facets
>
> Our current index contains nearly 400k documents and will grow to a few
> million. Our "more like this" search is always based on a single document,
> so my query is "id:some_doc_id". For such a query I usually get at least
> 150k "similar" documents. This definition of "similar" is way too relaxed.
> Usually only a few hundred or thousand documents near the reference
> document are really of any interest to our users.
>
> Now assume that I get some facet values which appear very often in the
> similar documents starting at position 50k, but usually not near the
> reference document. Such a facet value will currently show up in my facet
> results. If I use this facet value for filtering, I restrict the result to
> documents which are not of any interest to the user.
>
> We want to provide facets which allow the user to explore and drill down
> into the documents in the near neighborhood of our reference document.
>
> If I'm on the completely wrong track, please let me know. I'm open to any
> suggestions. Is it possible that our definition of "similar" simply does
> not match Solr's model? I would also be willing to dig into the code and to
> implement a custom similarity. But currently it feels like I don't get the
> base concepts right!? Any hint and guidance would be very welcome.
>
> kind regards,
> Achim
>
>
> On 21.05.2013 at 15:27, Jack Krupansky wrote:
>
>> Any particular reason you would want to limit the documents for facet
>> calculation? I mean, the whole point of the facet numbers is to let users
>> know what's out there. You must have some other rationale in mind - what is
>> it?
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Achim Domma
>> Sent: Tuesday, May 21, 2013 5:47 AM
>> To: solr-user@lucene.apache.org
>> Subject: MoreLikeThisHandler + Facets
>>
>> I'm calling the MoreLikeThisHandler with a query like "id:some_doc_id", so
>> I would like to get documents which are similar to one specific document. I
>> restrict the result to 25 rows and I calculate facets for some fields.
>>
>> On what data are those facets calculated? According to the documentation,
>> they are calculated from the similar documents, which is the main
>> difference from the default search handler. But on how many of them? Is it
>> possible to restrict the documents somehow? I would like my facets to be
>> calculated based only on the top 1000 most similar documents.
>>
>> kind regards,
>> Achim