: Max - field collapsing may be your friend - https://issues.apache.org/jira/browse/SOLR-236

that doesn't really seem related ... i don't believe Max wants to see all results from a store "collapsed" into one result, i think he wants to see results from different stores treated "more fairly" and to eliminate the clustering effect he's seeing, where different products from the same store tend to have similar scores because of the way the store provides the data (and not because of any inherent relevancy of the products).

Max: to really diagnose something like this, you have to consider all the details about what exactly your queries look like, and spend a lot of time looking at score explanations to really get a sense for the "trend" of why certain stores score higher than others.

off the cuff, the only thing i can comment on is this specific example you made...

: > Shop 'foo' describes its products with 250 words and uses the searched
: > word once. Shop 'bar' describes its products with only 25 words and also
: > uses the searched word once. The score for shop 'foo' will be much worst
: > than for shop 'bar'. In a search in which are many products of shop
: > 'foo' and 'bar' the products of shop 'bar' are shown before the products
: > of shop 'foo'.

depending on how you look at it, 'foo' is spamming you with excess keywords and 'bar' deserves to get higher scores.

eliminating "tf" probably isn't wise, but you might want to consider omitting norms, so the length of the field doesn't factor into the score ... or you might want to try customizing your lengthNorm function (requires writing a Similarity class) to make it flatter for 25-250 terms, but drop off sharply if they go above 250 (if you consider 250 the threshold for a product description before you decide it's "spam").
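
To make the "flatter for 25-250 terms" idea concrete, here is a rough sketch of the shape such a function could take. It is plain Python rather than a real Similarity subclass, and it assumes the classic default length norm is 1/sqrt(numTerms). The function names, the 25/250 thresholds as parameters, and the quadratic drop-off above the threshold are all illustrative choices of mine, not anything Lucene provides.

```python
import math


def default_length_norm(num_terms: int) -> float:
    """Assumed classic default: shorter fields get a larger norm."""
    return 1.0 / math.sqrt(num_terms)


def flattened_length_norm(num_terms: int,
                          flat_min: int = 25,
                          flat_max: int = 250) -> float:
    """Hypothetical norm: constant between flat_min and flat_max terms,
    then a steep penalty for descriptions longer than flat_max."""
    plateau = 1.0 / math.sqrt(flat_min)
    if num_terms < flat_min:
        # Very short fields keep the default behaviour.
        return default_length_norm(num_terms)
    if num_terms <= flat_max:
        # 25-word and 250-word descriptions are treated the same.
        return plateau
    # Illustrative quadratic drop-off once a description looks like "spam".
    return plateau * (flat_max / num_terms) ** 2


if __name__ == "__main__":
    for n in (10, 25, 100, 250, 500, 1000):
        print(n, round(default_length_norm(n), 4),
              round(flattened_length_norm(n), 4))
```

With this shape, shop 'foo' (250 words) and shop 'bar' (25 words) get the same length factor, so a single match of the search word would no longer favor 'bar' on length alone. Bear in mind that Lucene stores norms with very limited precision, so a real implementation would not see values this exact.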

You could also consider adding a numeric "shop_fudge_factor" field that you populate with a number indicating the average number of terms in product descriptions from that shop (you'd have to compute this yourself and add it to every document), and then use that as part of a FunctionQuery to fudge the scores for long-winded stores a little higher. I would never do that personally (it encourages keyword spamming in product descriptions), but it's something you can try.

A suggestion of *last* resort: if you customize your Similarity class such that all the methods round the score components to a very coarse granularity (ie: 1.2 instead of 1.234567), you should wind up with tighter groupings of products with the *exact* same score ... you could then do a secondary sort on something else (random perhaps?) to try and make the ordering more fair. (i really have no idea how well that might work)

-Hoss
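
PS: a toy illustration of the rounding idea, in plain Python with no Solr involved. Note the simplification: it rounds the *final* score rather than the individual score components inside a Similarity, and `fair_order`, the `(doc_id, score)` tuples and the document ids are all made up for the example.

```python
import random


def fair_order(hits, decimals=1, seed=None):
    """Sort (doc_id, score) pairs by coarsely rounded score, descending,
    breaking ties with a random secondary key."""
    rng = random.Random(seed)
    keyed = [(round(score, decimals), rng.random(), doc_id)
             for doc_id, score in hits]
    # Primary key: rounded score (high first); secondary key: random.
    keyed.sort(key=lambda t: (-t[0], t[1]))
    return [(doc_id, rounded) for rounded, _, doc_id in keyed]


if __name__ == "__main__":
    hits = [("bar-1", 1.234567), ("bar-2", 1.21),
            ("foo-1", 1.19), ("foo-2", 0.87)]
    for doc_id, rounded in fair_order(hits):
        print(doc_id, rounded)
```

Here the first three documents all collapse to the same rounded score, so their relative order is decided by the random key instead of by tiny score differences between shops.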