Hello all,

I have a standard Solr 7.4 installation and a question about how the BM25 similarity is computed. Here is an example that illustrates it:
1. Create the core `test_core`.

2. Add the following field and fieldType to `test_core/conf/managed-schema`:

    <field name="text" type="ngram_text" indexed="true" stored="true" docValues="false" multiValued="false"/>
    <fieldType name="ngram_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="7"/>
      </analyzer>
    </fieldType>

3. Restart Solr and add the following document:

    { "id":1, "text":"apples oranges" }

4. Run this query:

    test_core/select?debugQuery=on&fl=*,score&q=text:apples

5. Check the BM25 calculation in the debug output:

    0.57919353 = weight(Synonym(text:ap text:app text:appl text:apple text:apples) in 0) [SchemaSimilarity], result of:
      0.57919353 = score(doc=0,freq=5.0 = termFreq=5.0), product of:
        0.2876821 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
          1.0 = docFreq
          1.0 = docCount
        2.0133111 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
          5.0 = termFreq=5.0
          1.2 = parameter k1
          0.75 = parameter b
          11.0 = avgFieldLength
          2.0 = fieldLength

My question: since I only have one document, shouldn't fieldLength and avgFieldLength have the same value? It looks as if avgFieldLength counts the tokens produced by the NGram filter while fieldLength does not. Shouldn't avgFieldLength simply be the average of fieldLength over all documents?

Thank you
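For anyone who wants to check the numbers, here is a minimal Python sketch that recomputes the explain output by hand. It rests on two assumptions I have not confirmed in the Lucene 7.4 source: that avgFieldLength is the total number of indexed tokens (every edge n-gram) divided by docCount, and that fieldLength counts only the tokens that advance the position (the two original words). The `edge_ngrams` helper is a hypothetical stand-in for EdgeNGramFilterFactory, not Solr code.

```python
import math


def edge_ngrams(token, min_gram=2, max_gram=7):
    """Hypothetical stand-in for solr.EdgeNGramFilterFactory: leading substrings
    of length min_gram..max_gram (ignores options such as preserveOriginal)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def bm25_idf(doc_freq, doc_count):
    # idf formula as printed in the explain output (natural log)
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # tfNorm formula as printed in the explain output
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))


doc_count = 1
tokens = "apples oranges".split()  # assumed StandardTokenizer output for the document
grams = [g for t in tokens for g in edge_ngrams(t)]

# Assumption: avgFieldLength = all indexed grams / docCount
avg_field_length = len(grams) / doc_count
# Assumption: fieldLength counts only the original words (position-advancing tokens)
field_length = len(tokens)

# The query grams ap, app, appl, apple, apples each occur once in the document
freq = len(edge_ngrams("apples"))

idf = bm25_idf(doc_freq=1, doc_count=doc_count)
tf_norm = bm25_tf_norm(freq, field_length, avg_field_length)
score = idf * tf_norm

# What tfNorm would be if fieldLength also counted every gram (== avgFieldLength)
tf_norm_if_equal = bm25_tf_norm(freq, avg_field_length, avg_field_length)

print(f"grams={len(grams)} fieldLength={field_length} avgFieldLength={avg_field_length}")
print(f"idf={idf:.7f} tfNorm={tf_norm:.7f} score={score:.7f}")
print(f"tfNorm if fieldLength == avgFieldLength: {tf_norm_if_equal:.7f}")
```

Under these assumptions the sketch reproduces idf, tfNorm and the score, and "apples" plus "oranges" yield 5 + 6 = 11 grams, matching avgFieldLength = 11.0. The query side shows the grams as a Synonym(...) query, which suggests they are indexed at the same position as the original word. If Lucene's BM25Similarity leaves such overlapping tokens out of the per-document norm (its discountOverlaps option) while the collection statistics still count them, that would explain the mismatch, but I have not verified this.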