Hi,

I'm working on the problem Max described as well. We did try omitting the
norms, which led to products with very extensive descriptions being more
likely to get a higher score, since they contain the search term more often.
Because the SynonymFilter expands many terms at index time, this got
especially ugly. But as you already pointed out, we should take a closer
look at how the score is assembled.

Nevertheless, the second problem of getting a good mix of shops can be
discussed separately. Say we have 5 products per result page and the 10 best
matches for a search all have the same score. 8 of the products are from one
shop (A), and the other two are from two other shops (B, C).

What we often get is (each letter indicating a product of that shop):
1.    A
2.    A
3.    A
4.    A
5.    A
 ---- second result page ----
6.    A
7.    B
8.    A
9.    C
10.  A 

but what we want to get is something like this:

1.    A
2.    C
3.    B
4.    A
5.    A
 ---- second result page ----
6.    A
7.    A
8.    A
9.    A
10.  A 

As you can imagine, products are not uniformly distributed over shops.
So sorting by a random field does not work, since there are shops with
tens of thousands of products and shops with fewer than 100 products.

So theoretically I would sort by score and then by a magic factor that gets
greater the fewer products of this shop (possibly only those with the same
score) are already in the search result. As an alternative to a second sort
criterion, that factor could be used to diminish the score instead, I guess...

What really bothers me is that this requirement seems to need an extra
iteration over the search result, one that keeps track of the distribution
of products and shops in the search result.
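
To make the idea concrete, here is a rough sketch of that extra iteration,
done outside of Solr on an already fetched result list. Everything in it is
an assumption for illustration only: the name diversify, the
(doc_id, shop, score) tuple layout, and the choice to reorder only hits whose
scores are exactly equal. It is not an existing Solr feature.

```python
from collections import Counter
from itertools import groupby


def diversify(hits):
    """Reorder hits so that shops are mixed within each group of equal score.

    hits: list of (doc_id, shop, score) tuples, already sorted by score
    in descending order (hypothetical layout, not a Solr structure).
    """
    shown = Counter()  # how many products of each shop are already placed
    result = []
    # groupby only merges adjacent items, which is enough for sorted input.
    for _, group in groupby(hits, key=lambda h: h[2]):
        pending = list(group)
        while pending:
            # Pick the hit whose shop has appeared least often so far.
            # min() returns the first minimal element, so the original
            # order breaks ties.
            best = min(pending, key=lambda h: shown[h[1]])
            pending.remove(best)
            shown[best[1]] += 1
            result.append(best)
    return result


# The example from above: ten hits with the same score, eight from shop A.
shops = "AAAAAABACA"
hits = [(i, shop, 1.0) for i, shop in enumerate(shops, start=1)]
print("".join(h[1] for h in diversify(hits)))  # prints ABCAAAAAAA
```

This only ever swaps hits that have the same score, so the ranking by score
stays intact. The selection is quadratic in the size of a tie group, which
should be fine for a few result pages but not for the whole result set.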

We'd be really thankful for any hint on how to tackle this problem,
Axel
-- 
View this message in context: 
http://www.nabble.com/Unwanted-clustering-of-search-results-after-sorting-by-score-tp20977761p21477387.html
Sent from the Solr - User mailing list archive at Nabble.com.
