Re: question about relevance

Chris Hostetter Thu, 29 Jul 2010 12:23:45 -0700

: 1. There are user records of type A, B, C etc. (userId field in index is
: common to all records)
: 2. A user can have any number of A, B, C etc (e.g. think of A being a
: language then user can know many languages like french, english, german etc)
: 3. Records are currently stored as a document in index.
: 4. A given query can match multiple records for the user
: 5. If for a user more records are matched (e.g. if he knows both french and
: german) then he is more relevant and should come top in UI. This is the
: reason I wanted to add lucene scores assuming the greater score means more
: relevance.


if your goal is to get back "users" from each search, then you should 
probably change your indexing strategy so that each "user" has a single 
document -- fields like "language" can be multivalued, etc...

then a search for "language:en language:fr" will return users who speak 
english or french, and the ones that speak both will score higher.

if you really can't change the index structure, then essentially what you 
are looking for is a "field collapsing" solution on the userId field, 
where you want each collapsed group to get a cumulative score.  i don't 
know if the existing field collapsing patches support this -- if you are 
already willing and able to do it in the client then that may be the 
simplest thing to support moving forward.
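
As a rough illustration of doing the collapsing in the client, here is a 
minimal Python sketch. It assumes the client has already pulled back 
(userId, score) pairs for a single query; the function name and the sample 
hits are hypothetical and not part of any Lucene/Solr API.

```python
from collections import defaultdict


def collapse_by_user(hits):
    """Sum the scores of all hits sharing a userId, best total first.

    `hits` is an iterable of (user_id, score) pairs from ONE query;
    scores from different queries are not comparable.
    """
    totals = defaultdict(float)
    for user_id, score in hits:
        totals[user_id] += score
    # Highest cumulative score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical hits: user u1 matched two records, user u2 matched one.
hits = [("u1", 0.5), ("u2", 0.75), ("u1", 0.5)]
ranked = collapse_by_user(hits)
```

Note that u2 has the single best-scoring record here, but u1 ranks first 
once the scores are summed, which is the behavior asked for in point 5.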

Adding the scores is certainly one metric you could use -- it's generally 
suspicious to try and read too much meaning into scores in lucene/solr but 
that's because people typically try to infer broader absolute meaning.  in 
the case of a single query the scores are relative to each other, and adding 
up all the scores for a given userId is approximately what would happen in 
my example above -- except that there is also a "coord" factor that would 
penalize documents that only match one clause ... it's complicated, but 
as an approximation adding the scores might give you what you are looking 
for -- only you can know for sure based on your specific data.
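
To make the "coord" difference concrete, here is a simplified Python 
sketch. It assumes the default coord of (matching clauses / total clauses) 
and ignores everything else that goes into a real score (queryNorm, length 
norms, etc.); the function names and the sample clause scores are made up 
for illustration.

```python
def coord(overlap, max_overlap):
    """Fraction of the query's clauses that a document matched."""
    return overlap / max_overlap


def single_doc_score(clause_scores, num_clauses):
    """Approximate score of one multivalued "user" document.

    `clause_scores` holds the score of each clause that matched;
    the sum is scaled down by coord when some clauses did not match.
    """
    return sum(clause_scores) * coord(len(clause_scores), num_clauses)


# Query with two clauses, e.g. language:en language:fr
both = single_doc_score([0.5, 0.5], 2)   # user who matches both clauses
one = single_doc_score([0.5], 2)         # user who matches only one

# Plain client-side summing of per-record scores has no coord penalty.
summed_both = 0.5 + 0.5
summed_one = 0.5
```

With these numbers the single-document approach separates the two users 
by a factor of 4 (1.0 vs 0.25), while plain summing separates them by a 
factor of 2 (1.0 vs 0.5). The ordering is the same; only the gap differs.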



-Hoss
