On 09.12.2010 21:26, ext Chris Hostetter wrote:
: doc1 is name=A B category=B
: doc2 is name=A category=B
:
: when searching for the terms "A" and "B" I want doc2 to get a higher score.
: to be more specific, I don't want the term "B" to influence doc1's score in
: both <name> and <category>, only in one of them.

if you set the boost value of category to something very high, and set
tie=0 you should get the exact behavior you describe.

with tie=0, each clause (ie: "B") will only get a score contribution from
the highest scoring field -- if the qf boost value for category is
significantly higher than the boost value for "name" this should work
fine.
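
To make the effect of tie concrete, here is a minimal sketch of how a dismax clause combines per-field scores: the best field score plus tie times the sum of the others. The per-field numbers and the qf boosts (name^1, category^10) are made up for illustration; only the max-plus-tie combination reflects how DisjunctionMaxQuery scores.

```python
def dismax_clause_score(field_scores, tie=0.0):
    """Score of one query term across fields: best + tie * (sum of the rest)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def total_score(doc, tie=0.0):
    """Sum the dismax clause scores over all query terms of a document."""
    return sum(dismax_clause_score(scores, tie) for scores in doc.values())


# Hypothetical boosted per-field scores, assuming qf=name^1 category^10.
# doc1: name="A B", category="B"  (longer name field, so lower name scores)
# doc2: name="A",   category="B"
doc1 = {"A": [0.7], "B": [0.7, 10.0]}
doc2 = {"A": [1.0], "B": [10.0]}

for tie in (0.0, 1.0):
    print(tie, total_score(doc1, tie), total_score(doc2, tie))
```

With tie=0 the "B" match in name is ignored for doc1, so doc2 comes out ahead; with tie=1 both "B" matches count and doc1 overtakes it.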

this is one of the prime usecases for dismax: a "category" or "doc_type"
field that has a very small finite set of values in it which frequently
doesn't match anything users type, but you configure it with a high boost
value so when it *does* match something the user types, it causes
documents in that category (or having that document_type) to dominate over
other documents.

-Hoss

yup, you actually formulated the main usecase for the dismax Query handler ;)

What you are probably stumbling over, going by your example from the beginning, is the basic scoring and therefore the weights to set.

 query: qf=name^5 category&q=pulp fiction&mm=2
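
If you send that query over HTTP, the spaces and the ^ need to be URL-encoded. A small sketch of building the parameter string follows; defType=dismax is my assumption about how the handler is selected in your setup, and tie=0 is taken from Hoss's suggestion above.

```python
from urllib.parse import urlencode, parse_qs

# Parameters from the query above; defType is an assumption about the setup.
params = {
    "defType": "dismax",
    "q": "pulp fiction",
    "qf": "name^5 category",
    "mm": "2",
    "tie": "0",
}

# urlencode escapes spaces and '^' so the string is safe to append to a URL.
query_string = urlencode(params)
print(query_string)
```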

Given a category field, I assume that the number of distinct tokens in it is rather small compared to your title field (name in the query above), so each term occurs in many documents and the idf is low for a hit there. The idf for a token in the title field is probably rather high. Thus, by default, a hit in the title would score higher. => boosting the category field would be the easy solution.

Second, you might have more than one category word in the category field? If so, the field normalization would also rate a hit there down. I think it could help to deactivate norms for this field (omitNorms=true in the field type configuration).

If this is not enough, you can go into the Similarity and change the implementation according to the field given:

search time: idfExplain(...)/idf()
index time: lengthNorm(...)/computeNorm(...)
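
To illustrate the two effects, here is a rough sketch in Python rather than a real Similarity subclass. The formulas are the classic Lucene TF-IDF defaults as I remember them (idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)); the document counts and the per-field helper length_norm_for are hypothetical.

```python
import math


def idf(doc_freq, num_docs):
    """Classic TF-IDF idf: rarer terms (low doc_freq) get a higher value."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Default length norm: fields with more terms get a lower value."""
    return 1.0 / math.sqrt(num_terms)


def length_norm_for(field, num_terms):
    """Hypothetical per-field variant: skip length normalization for category."""
    if field == "category":
        return 1.0
    return length_norm(num_terms)


NUM_DOCS = 10_000                       # made-up index size
category_idf = idf(2_000, NUM_DOCS)     # a category value shared by many docs
name_idf = idf(5, NUM_DOCS)             # a rare word in the title/name field

print(category_idf, name_idf)
print(length_norm(1), length_norm(4), length_norm_for("category", 4))
```

The category term ends up with a much lower idf than the rare title word, and a four-word field is weighted at half of a one-word field unless the norm is switched off for it.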

