Our current index contains nearly 400k documents and will grow to a few 
million. Our "more like this" search is always based on a single document, so 
my query is "id:some_doc_id". For such a query I usually get at least 150k 
"similar" documents. This definition of "similar" is way too relaxed. Usually 
only a few hundred or a few thousand documents near the reference document are 
really of any interest to our users.

Now assume there are some facet values which appear very often in the similar 
documents from position 50k onwards, but usually not near the reference 
document. Such a facet value will currently show up in my facet results. If I 
use this facet value for filtering, I restrict the result to documents which 
are not of any interest to the user.

We want to provide facets which allow the user to explore and drill down into 
the documents in the near neighborhood of our reference document.
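
As a rough illustration of the behaviour described above, the following 
client-side sketch counts facet values over only the first N entries of an 
already ranked result list. It is not a Solr feature or API: the function name 
facet_counts_top_n, the field name "category" and the sample documents are all 
hypothetical, and it assumes the similar documents have already been fetched 
in similarity order as a list of dicts.

```python
from collections import Counter


def facet_counts_top_n(ranked_docs, field, top_n=1000):
    """Count the values of `field` over the first `top_n` ranked documents.

    `ranked_docs` is assumed to be sorted by similarity, most similar first.
    """
    counts = Counter()
    for doc in ranked_docs[:top_n]:
        value = doc.get(field)
        if value is None:
            continue  # document has no value for this field
        # Multi-valued fields are assumed to arrive as lists.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


# Hypothetical sample data: only the two nearest documents are counted.
ranked = [
    {"id": "a", "category": "physics"},
    {"id": "b", "category": ["physics", "math"]},
    {"id": "c", "category": "cooking"},
]
counts = facet_counts_top_n(ranked, "category", top_n=2)
```

With top_n=2 the value "cooking" from the third document never reaches the 
facet counts, which is the effect wanted for documents far away from the 
reference document. The cost is that the top N documents have to be 
transferred to the client first.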

If I'm on the completely wrong track, please let me know. I'm open to any 
suggestions. Is it possible that our definition of "similar" simply does not 
match Solr's model? I would also be willing to dig into the code and implement 
a custom similarity. But currently it feels like I don't get the basic 
concepts right. Any hints and guidance would be very welcome.

kind regards,
Achim 


On 21.05.2013 at 15:27, Jack Krupansky wrote:

> Any particular reason you would want to limit the documents for facet 
> calculation? I mean, the whole point of the facet numbers is to let users 
> know what's out there. You must have some other rationale in mind - what is 
> it?
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Achim Domma
> Sent: Tuesday, May 21, 2013 5:47 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThisHandler + Facets
> 
> I'm calling the MoreLikeThisHandler with a query like "id:some_doc_id", so I 
> would like to get documents which are similar to one specific document. I 
> restrict the result to 25 rows and I calculate facets for some fields.
> 
> On what data are those facets calculated? According to the documentation, 
> they are calculated over the similar documents, which is the main difference 
> from the default search handler. But over how many of them? Is it possible to 
> restrict the documents 
> somehow? I would like my facets to be calculated based only on the top 1000 
> most similar documents.
> 
> kind regards,
> Achim
