As the IDF values for A, B and C are minimal (a term can't do worse than occurring in every document), the major part of your score most likely comes from the coord(..) part of scoring, which basically computes the overlap between the query and the document. If you want document length to have a stronger influence, you can extend the Similarity implementation and override the relevant methods. You might take a look at http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
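To make the factors mentioned above concrete, here is a minimal Python sketch of the classic TF-IDF formulas (coord, idf, tf, queryNorm, lengthNorm) as I understand them from the TFIDFSimilarity documentation linked above. It is not Lucene code: the function names, the `exponent` parameter and the assumption that the index holds only the six example documents are mine, and it deliberately skips the lossy single-byte encoding that Lucene applies to stored norms, so the numbers will not match what Solr reports.

```python
import math

# Assumption: the index contains only the six example documents from the thread.
DOCS = {
    "doc1": "A B C",
    "doc2": "A B C D",
    "doc3": "A B C D E",
    "doc4": "A B C D E F",
    "doc5": "A B C D E F G H",
    "doc6": "A B C D E F G H I",
}
QUERY = ["a", "b", "c"]


def analyze(text):
    """Whitespace-tokenize and lowercase, roughly like the 'text' field type."""
    return text.lower().split()


def idf(term, index):
    """Classic idf: 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(1 for tokens in index.values() if term in tokens)
    return 1.0 + math.log(len(index) / (doc_freq + 1))


def coord(query, tokens):
    """Fraction of the query terms that occur in the document."""
    return sum(1 for term in query if term in tokens) / len(query)


def length_norm(tokens, exponent=0.5):
    """1 / numTerms**exponent; 0.5 is the classic 1/sqrt(numTerms).

    'exponent' is a hypothetical knob standing in for what an overridden
    Similarity could change to make document length count more.
    """
    return 1.0 / (len(tokens) ** exponent)


def score(query, tokens, index, exponent=0.5):
    """coord * queryNorm * sum(tf * idf^2 * lengthNorm), with unquantized norms."""
    query_norm = 1.0 / math.sqrt(sum(idf(t, index) ** 2 for t in query))
    total = sum(
        math.sqrt(tokens.count(t)) * idf(t, index) ** 2 * length_norm(tokens, exponent)
        for t in query
    )
    return coord(query, tokens) * query_norm * total


if __name__ == "__main__":
    index = {name: analyze(text) for name, text in DOCS.items()}
    for name, tokens in index.items():
        print(
            name,
            "coord=%.3f" % coord(QUERY, tokens),
            "norm=%.4f" % length_norm(tokens),
            "score=%.4f" % score(QUERY, tokens, index),
            "score(exponent=1)=%.4f" % score(QUERY, tokens, index, exponent=1.0),
        )
```

In this simplified model every document matches all three query terms, so coord and idf are identical across the documents and only the length norm separates them. Raising the hypothetical `exponent` widens that gap, which is the kind of change an overridden Similarity would make.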
cheers,
Mathias

On Tue, Sep 17, 2013 at 1:59 PM, Upayavira <u...@odoko.co.uk> wrote:
> Have you used debugQuery=true, or fl=*,[explain], or those various
> functions? It is possible to ask Solr to tell you how it calculated the
> score, which will enable you to see what is going on in each case. You
> can probably work it out for yourself then, I suspect.
>
> Upayavira
>
> On Tue, Sep 17, 2013, at 08:40 AM, blopez wrote:
>> Hi all,
>>
>> I have some doubts about the Solr scoring function. I'm using all default
>> configuration, but I'm facing a weird issue with the retrieved scores.
>>
>> In the schema, I'm going to focus on the only field I'm interested in.
>> Its definition is:
>>
>> <fieldType name="text" class="solr.TextField" sortMissingLast="true"
>>            omitNorms="false">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="myField" type="text" indexed="true" stored="true"
>>        required="false" />
>>
>> (omitNorms="false"; otherwise the document size is not taken into
>> account in the final score)
>>
>> Then, I index some documents, with the following text in the 'myField'
>> field:
>>
>> doc1 = "A B C"
>> doc2 = "A B C D"
>> doc3 = "A B C D E"
>> doc4 = "A B C D E F"
>> doc5 = "A B C D E F G H"
>> doc6 = "A B C D E F G H I"
>>
>> Finally, I perform the query 'myField:("A" "B" "C")' in order to retrieve
>> all the documents, but with different scores (doc1 is more similar to the
>> query than doc2, which is more similar than doc3, ...).
>>
>> All the documents are retrieved (OK), but the scores are like this:
>>
>> *doc1 = 2,590214
>> doc2 = 2,590214*
>> doc3 = 2,266437
>> *doc4 = 1,94266
>> doc5 = 1,94266*
>> doc6 = 1,618884
>>
>> So in conclusion, as you can see, the score goes down, but not the way
>> I'd like. Doc1 is getting the same score as Doc2, even though Doc1
>> matches 3/3 tokens and Doc2 matches 3/4 tokens.
>>
>> Is this the normal Solr behaviour? Is there any way to get my expected
>> behaviour?
>>
>> Thanks a lot,
>> Borja.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec