Re: Scoring by document size

Mathias Lux Tue, 17 Sep 2013 05:25:06 -0700

As the IDF values for A, B and C are minimal (it can't get any worse
than appearing in every document), the major part of your score most
likely comes from the coord(..) part of scoring, which basically
computes the overlap of the query and the document. If you want
document length to have a stronger influence, you can extend the
Similarity implementation and override the relevant methods. You
might take a look at
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
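
Before overriding anything, it may help to look at the two
document-dependent factors side by side. The sketch below is plain Java
with no Lucene dependency. The class and method names are made up for
illustration, and the one-byte norm encoding is an assumption modelled
on how Lucene 4.x stores norms, not the actual DefaultSimilarity code.

```java
// Illustrative sketch only: hypothetical names, no Lucene classes involved.
public class LengthNormSketch {

    // coord(overlap, maxOverlap): fraction of the query terms found in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Length norm for an unboosted field: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumption: lossy one-byte encoding in the style of Lucene 4.x norms.
    // Only the exponent and the top two fraction bits of the float survive.
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        if (small <= 384) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative or underflow
        }
        if (small >= 384 + 256) {
            return (byte) -1;                          // overflow: largest value
        }
        return (byte) (small - 384);
    }

    // Inverse of encodeNorm: rebuild a float from the stored byte.
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = ((b & 0xff) << 21) + (48 << 24);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int[] fieldLengths = {3, 4, 5, 6, 8, 9}; // token counts of doc1..doc6
        for (int n : fieldLengths) {
            float exact = lengthNorm(n);
            float stored = decodeNorm(encodeNorm(exact));
            System.out.println(n + " terms: coord=" + coord(3, 3)
                    + " exactNorm=" + exact + " storedNorm=" + stored);
        }
    }
}
```

If the encoding assumption holds, field lengths 3 and 4 decode to the
same stored norm (0.5), which would be consistent with doc1 and doc2
getting the same score. Note that coord comes out as 1.0 whenever all
three query terms match. A custom Similarity that keeps more precision
in the norm would be one way to tell such documents apart.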


cheers,
  Mathias

On Tue, Sep 17, 2013 at 1:59 PM, Upayavira <u...@odoko.co.uk> wrote:
> Have you used debugQuery=true, or fl=*,[explain], or those various
> functions? It is possible to ask Solr to tell you how it calculated the
> score, which will enable you to see what is going on in each case. I
> suspect you can then work it out for yourself.
>
> Upayavira
>
> On Tue, Sep 17, 2013, at 08:40 AM, blopez wrote:
>> Hi all,
>>
>> I have some doubts about the Solr scoring function. I'm using the
>> default configuration, but I'm facing a weird issue with the retrieved
>> scores.
>>
>> In the schema, I'm going to focus on the only field I'm interested in.
>> Its definition is:
>>
>> <fieldType name="text" class="solr.TextField" sortMissingLast="true"
>>            omitNorms="false">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="myField" type="text" indexed="true" stored="true"
>>        required="false" />
>>
>> (omitNorms="false"; otherwise the document size is not taken into
>> account in the final score)
>>
>> Then, I index some documents, with the following text in the 'myField'
>> field:
>>
>> doc1 = "A B C"
>> doc2 = "A B C D"
>> doc3 = "A B C D E"
>> doc4 = "A B C D E F"
>> doc5 = "A B C D E F G H"
>> doc6 = "A B C D E F G H I"
>>
>> Finally, I run the query 'myField:("A" "B" "C")' in order to retrieve
>> all the documents, but with different scores (doc1 is more similar to
>> the query than doc2, which is more similar than doc3, ...).
>>
>> All the documents are retrieved (OK), but the scores are like this:
>>
>> doc1 = 2,590214
>> doc2 = 2,590214
>> doc3 = 2,266437
>> doc4 = 1,94266
>> doc5 = 1,94266
>> doc6 = 1,618884
>>
>> So in conclusion, as you can see, the score goes down, but not the way
>> I'd like. Doc1 gets the same score as Doc2, even though Doc1 matches
>> 3/3 tokens and Doc2 matches 3/4 tokens.
>>
>> Is this the normal Solr behaviour? Is there any way to get my expected
>> behaviour?
>>
>> Thanks a lot,
>> Borja.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
