Hello all,

I have a standard Solr 7.4 installation and a question about how
BM25 similarity is computed. Here is an example that illustrates it:

1. Create the core `test_core`.

2. Add the following field and fieldType to `test_core/conf/managed-schema`:

<field name="text" type="ngram_text" indexed="true" stored="true" docValues="false" multiValued="false"/>

<fieldType name="ngram_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="7"/>
  </analyzer>
</fieldType>

3. Restart Solr and add the following document:

{
    "id":1,
    "text":"apples oranges"
}

4. Run the query: test_core/select?debugQuery=on&fl=*,score&q=text:apples

5. Check the BM25 calculation in the debug output:

0.57919353 = weight(Synonym(text:ap text:app text:appl text:apple text:apples) in 0) [SchemaSimilarity], result of:
  0.57919353 = score(doc=0,freq=5.0 = termFreq=5.0), product of:
    0.2876821 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
      1.0 = docFreq
      1.0 = docCount
    2.0133111 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
      5.0 = termFreq=5.0
      1.2 = parameter k1
      0.75 = parameter b
      11.0 = avgFieldLength
      2.0 = fieldLength
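
To make the numbers easier to check, here is a minimal Python sketch that plugs the values from the debug output into the two formulas it prints. The function names `bm25_idf` and `bm25_tf_norm` are my own, not Solr or Lucene APIs, and the last value only shows what tfNorm would be if fieldLength were equal to avgFieldLength, which is what I expected.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # idf formula exactly as printed in the debug output
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    # tfNorm formula exactly as printed in the debug output
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


# Values reported by Solr for the single document
idf = bm25_idf(doc_freq=1.0, doc_count=1.0)
tf_norm = bm25_tf_norm(freq=5.0, k1=1.2, b=0.75,
                       field_length=2.0, avg_field_length=11.0)
score = idf * tf_norm

# Hypothetical case: fieldLength equal to avgFieldLength
tf_norm_equal = bm25_tf_norm(freq=5.0, k1=1.2, b=0.75,
                             field_length=11.0, avg_field_length=11.0)

print(f"idf={idf:.7f} tfNorm={tf_norm:.7f} score={score:.7f}")
print(f"tfNorm with equal lengths={tf_norm_equal:.7f}")
```

The first line of output should agree with the reported idf, tfNorm and score up to float rounding, so the score itself is consistent with the printed inputs; the open question is only why the two lengths differ.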

------

My question: since I have only one document, shouldn't fieldLength
and avgFieldLength have the same value? I notice that avgFieldLength reflects
the tokens emitted by the EdgeNGram filter, while fieldLength does not.

Shouldn't avgFieldLength be the average of fieldLength?
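
The two lengths can be reproduced by counting tokens by hand. This is only a rough Python sketch under assumptions: whitespace splitting stands in for `solr.StandardTokenizerFactory`, and the hypothetical helper `edge_ngrams` emits every prefix of length minGramSize to maxGramSize, which is how I understand the filter for this input. It does not reproduce Lucene's actual norm computation.

```python
def edge_ngrams(token, min_gram=2, max_gram=7):
    # All prefixes of the token with length min_gram..max_gram
    # (capped at the token length)
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


text = "apples oranges"

# Stand-in for the tokenizer: split on whitespace
words = text.split()

# Every gram produced for every word
grams = [gram for word in words for gram in edge_ngrams(word)]

print(len(words))             # tokens before the n-gram filter
print(len(grams))             # tokens after the n-gram filter
print(edge_ngrams("apples"))  # grams for the queried term
```

Under these assumptions the count before the filter is 2 and the count after it is 11, which matches fieldLength and avgFieldLength in the debug output, and the grams for "apples" are the five terms listed in the Synonym(...) clause.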

Thank you
