Axel,

Others may have better ideas, but the simplest approach that occurs to me right
now is to just go over the search results and re-sort them the way you
described.  I don't think this is as scary as it sounds, though.  You don't
have to go through the whole result set - you only need to do this for
the N hits you are displaying (10 in your example).  The data you need for
those hits should already be in memory and cached, so this should be cheap,
quick, and easy.  The magic factor that's inversely proportional to the number
of products in a shop could be stored in a separate field at index time.
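
To make the re-sorting idea a bit more concrete, here is a rough Python sketch
of that extra pass over the displayed hits.  It is only an illustration, not
anything Solr ships: the function name diversify and the (doc_id, score, shop)
tuple layout are made up, and it assumes the hits arrive sorted by score and
that "same score" means exactly equal values.

```python
from collections import Counter
from itertools import groupby


def diversify(hits):
    """Re-order hits so that, among equal scores, under-represented shops come first.

    hits: list of (doc_id, score, shop) tuples, already sorted by score descending.
    The tuple layout is an assumption made for this sketch.
    """
    seen = Counter()  # how many products of each shop have been emitted so far
    result = []
    # Only hits with the same score may swap places, so work one tie group at a time.
    for _, group in groupby(hits, key=lambda h: h[1]):
        pending = list(group)
        while pending:
            # Pick the hit whose shop has appeared least often so far.
            # min() returns the first minimum, so the original order breaks ties.
            best = min(pending, key=lambda h: seen[h[2]])
            pending.remove(best)
            seen[best[2]] += 1
            result.append(best)
    return result


if __name__ == "__main__":
    # The ten equally scored hits from the example: A A A A A A B A C A
    hits = [(i, 1.0, shop) for i, shop in enumerate("AAAAAABACA", start=1)]
    print(" ".join(shop for _, _, shop in diversify(hits)))
```

Since this only touches the handful of hits you fetched, the quadratic inner
loop should not matter in practice.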

This should be doable with a function query, too.
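
For that route the per-shop factor has to be in the index first.  A rough
sketch of how it might be computed before indexing is below.  The function name
shop_factors and the field name shop_factor are made up for illustration, and
how you reference the field from a function query depends on your Solr version
and schema.

```python
from collections import Counter


def shop_factors(products):
    """Return {shop: 1 / number_of_products_in_shop}.

    products: iterable of (product_id, shop) pairs (an assumed layout).
    """
    counts = Counter(shop for _, shop in products)
    return {shop: 1.0 / n for shop, n in counts.items()}


if __name__ == "__main__":
    products = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "B")]
    factors = shop_factors(products)
    # Attach the factor to every document before sending it to the indexer.
    # "shop_factor" is a hypothetical field name.
    docs = [
        {"id": pid, "shop": shop, "shop_factor": factors[shop]}
        for pid, shop in products
    ]
    for doc in docs:
        print(doc)
```

Note that the factor changes whenever a shop adds or removes products, so the
affected documents would need to be re-indexed to keep it current.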


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Axel Tetzlaff <axel.tetzl...@freiheit.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, January 15, 2009 8:15:29 AM
> Subject: Re: Unwanted clustering of search results after sorting by score
> 
> 
> Hi,
> 
> I'm working on the problem Max described as well. We did try omitting the
> norms, which led to products with a very extensive description being more
> likely to get a higher score, since they contained the word more often. Due
> to the many expansions by the SynonymFilter at index time, this
> grew especially ugly. But as you already pointed out, we should have a deeper
> look at how the score is assembled.
> 
> Nevertheless, the second problem of getting a good mix of shops can be
> discussed separately. Say we have 5 products per result page and the 10 best
> matches for a search all have the same score. 8 of the products are from one
> shop (A), and the other two are from two other shops (B, C).
> 
> What we often get is (each letter indicating the shop of a product):
> 1.    A
> 2.    A
> 3.    A
> 4.    A
> 5.    A
> ---- second result page ----
> 6.    A
> 7.    B
> 8.    A
> 9.    C
> 10.  A 
> 
> but what we want to get is something like this:
> 
> 1.    A
> 2.    C
> 3.    B
> 4.    A
> 5.    A
> ---- second result page ----
> 6.    A
> 7.    A
> 8.    A
> 9.    A
> 10.  A 
> 
> As you can imagine, there is no uniform distribution of products over shops.
> So sorting by a random field does not work out, since there are shops with
> tens of thousands of products and shops with fewer than 100 products.
> 
> So theoretically I would sort by score and then by a magic factor which gets
> greater the fewer products of this shop (possibly with that same score) are
> already in the search result. As an alternative to a second sort criterion,
> the score could be diminished by it as well, I guess...
> 
> What really bothers me is that this requirement seems to need an extra
> iteration over the search result which keeps track of the distribution of
> products and shops in the search result.
> 
> We're really thankful for any hint on how to tackle this problem,
> Axel
> -- 
> View this message in context: 
> http://www.nabble.com/Unwanted-clustering-of-search-results-after-sorting-by-score-tp20977761p21477387.html
> Sent from the Solr - User mailing list archive at Nabble.com.
