Re: Sorting results within the fields

aronitin Tue, 17 Jan 2012 17:47:37 -0800

Hi Jan,

Thanks for the reply.


Here is the concrete explanation of the problem that I'm trying to solve.

*SOLR Schema*

Here is the definition of the SOLR schema

*There are 3 dynamic fields*

<dynamicField name="*_conceptid" type="text" indexed="true" stored="true"
multiValued="true" termVectors="true" termPositions="true"
termOffsets="true" />
   <dynamicField name="*_headtermencodedconceptid"
type="headprefix_term_encoding" indexed="true" stored="true"
multiValued="true" />
   <dynamicField name="*_tailtermencodedconceptid"
type="tailprefix_term_encoding" indexed="true" stored="true"
multiValued="true" />

*There are 4 searchable fields*

<field name="concepts" type="text" indexed="true" stored="false"
multiValued="true"/>
*Description*: Data in this field is Whitespace Tokenized, Stemmed,
Lowercased

 <field name="concepts_exactmatch" type="lowercase" indexed="true"
stored="false" multiValued="true"/>
*Description*: Data in this field is only lowercased, with the Keyword
Tokenizer applied. So, apart from lowercasing, the data is not changed when
indexed in this field.

<field name="concepts_headtermencoded_concept"
type="headprefix_term_encoding" indexed="true" stored="false"
multiValued="true" />
*Description*: Head terms are encoded in the format HEAD$Value

<field name="concepts_tailtermencoded_concept"
type="tailprefix_term_encoding" indexed="true" stored="false"
multiValued="true" />
*Description*: Tail terms are encoded in the format TAIL$Value
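
As a rough illustration of the HEAD$Value / TAIL$Value encoding, here is a
minimal Python sketch. It assumes that the head term is the first word of a
concept and the tail term is the last word, both lowercased. That is only a
guess based on the examples in this message; the real analysis chain behind
the headprefix_term_encoding and tailprefix_term_encoding types is not shown
here, and the function name encode_head_tail is hypothetical.

```python
def encode_head_tail(concept):
    """Return (head_token, tail_token) for one concept value.

    Assumption (not confirmed by the schema): head term = first word,
    tail term = last word, both lowercased.
    """
    words = concept.lower().split()
    if not words:
        return None, None
    return "HEAD$" + words[0], "TAIL$" + words[-1]


if __name__ == "__main__":
    for value in ["UI Design Document", "Video Project", "UI"]:
        print(value, "->", encode_head_tail(value))
```

Under this assumption a one-word concept such as "UI" yields both a head
and a tail token for the same word.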

The data that we store in these fields is cleaned-up data extracted from
large text: generally one-, two- or three-word values

D1 -> UI, UI Design, UI Programming, UI Design Document
D2 -> UI Mockup, UI development
D3 -> UI

When somebody queries *UI*, the internal query that is generated is
concepts_headtermencoded_concept:HEAD$ui^100.0 concepts:ui^50.0
concepts_tailtermencoded_concept:TAIL$ui^10.0

This is so that a document matching on the head term is ranked higher than
one with only a partial match.
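
The query generation can be sketched in Python as follows. The field names
and boost values are the ones quoted above; the function name build_query
and its parameter names are hypothetical, and the sketch only handles a
single-word term with no escaping of query-syntax special characters.

```python
def build_query(term, head_boost=100.0, stemmed_boost=50.0, tail_boost=10.0):
    """Build the boosted three-clause query string for a single-word term.

    Only a sketch: no escaping of special characters, no multi-word terms.
    """
    t = term.lower()
    clauses = [
        f"concepts_headtermencoded_concept:HEAD${t}^{head_boost}",
        f"concepts:{t}^{stemmed_boost}",
        f"concepts_tailtermencoded_concept:TAIL${t}^{tail_boost}",
    ]
    return " ".join(clauses)


if __name__ == "__main__":
    print(build_query("UI"))
```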

The current implementation, without the score, ranks the documents as
D1 > D2 > D3 (because Lucene uses TF and IDF when scoring a document).

Now, we have created an *application specific score* for each concept and
want to sort the results based on that score, while preserving the boost on
the field defined in the query.
e.g.
D1 -> UI=90, UI Design=45, UI Programming=40, UI Design Document=85,
Project Wolverine=40
D2 -> UI Mockup=55, UI Development=74, Project Management=39
D3 -> UI=95, Project Wolverine=35
D4 -> UI Dev=75, Video Project=42

1. If only exact matches are found, results should be sorted by the score
value that we have defined for the matched term.
2. If both exact and partial matches are found, exactly matched documents
should come first, followed by partially matched documents, with each group
sorted within itself by score.

*Examples*
*Search*: UI
*Desired Results*: D3 > D1 > D4 > D2, where D3 and D1 contain an exact match
and hence are ranked between themselves by score. D4 and D2 both have only a
head match, but the head match score in D4 (75) is higher than in D2 (74).

*Search*: Project
*Desired Results*: D1 > D2 > D3 > D4, where D1, D2 and D3 are head term
matches, sorted among themselves by score, and D4 is a tail term match (even
though it has a better score, the tail term boost is 1/10th of the head term
boost).

So, in all, we want to override the TF and IDF parts of Lucene scoring and
score based on our concept-specific score instead, while still giving higher
preference to exact matches and then to partial matches.
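
The desired ordering can be simulated outside Solr with a small Python
sketch. This is not a Solr implementation, only a model of the ranking rule
described above: documents are grouped by their best match type (exact, then
head, then tail) and sorted by the application score within each group. The
match rules (head = first word, tail = last word) and all function names are
assumptions, and only single-word queries are handled.

```python
# Concept scores per document, copied from the example above.
DOCS = {
    "D1": {"UI": 90, "UI Design": 45, "UI Programming": 40,
           "UI Design Document": 85, "Project Wolverine": 40},
    "D2": {"UI Mockup": 55, "UI Development": 74, "Project Management": 39},
    "D3": {"UI": 95, "Project Wolverine": 35},
    "D4": {"UI Dev": 75, "Video Project": 42},
}

# Lower tier number means higher preference.
EXACT, HEAD, TAIL = 0, 1, 2


def match_tier(concept, query):
    """Classify how a single-word query matches a concept (assumed rules)."""
    words = concept.lower().split()
    q = query.lower()
    if words == [q]:
        return EXACT
    if words and words[0] == q:
        return HEAD
    if words and words[-1] == q:
        return TAIL
    return None


def rank(docs, query):
    """Order documents by best match tier, then by descending score."""
    keyed = []
    for doc_id, concepts in docs.items():
        keys = []
        for concept, score in concepts.items():
            tier = match_tier(concept, query)
            if tier is not None:
                keys.append((tier, -score))
        if keys:
            # The smallest key is the best tier with the highest score.
            keyed.append((min(keys), doc_id))
    return [doc_id for _, doc_id in sorted(keyed)]


if __name__ == "__main__":
    print(rank(DOCS, "UI"))       # ['D3', 'D1', 'D4', 'D2']
    print(rank(DOCS, "Project"))  # ['D1', 'D2', 'D3', 'D4']
```

Both outputs agree with the desired results listed in the examples.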

I hope this explains the problem. Let me know if you have any specific
questions.

Thanks
Nitin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3668047.html
Sent from the Solr - User mailing list archive at Nabble.com.
