ExactStatsCache not very exact

Markus Jelsma Wed, 10 Feb 2016 09:23:47 -0800

Hi - I've noticed ExactStatsCache is not very exact on consecutive calls. See 
the following explain output for the number one result of two such requests:


70.76961 = sum of:
  70.76961 = max plus 0.65 times others of:
    70.76961 = weight(title_nl:contactformulier in 210879) [], result of:
      70.76961 = score(doc=210879,freq=1.0 = termFreq=1.0
), product of:
        7.4 = boost
        8.900626 = idf(docFreq=51, docCount=377832)
        1.0744705 = tfNorm, computed from:
          1.0 = termFreq=1.0
          0.3 = parameter k1
          0.75 = parameter b
          17.079535 = avgFieldLength
          10.24 = fieldLength


70.75283 = sum of:
  70.75283 = max plus 0.65 times others of:
    70.75283 = weight(title_nl:contactformulier in 140774) [], result of:
      70.75283 = score(doc=140774,freq=1.0 = termFreq=1.0
), product of:
        7.4 = boost
        8.898066 = idf(docFreq=51, docCount=376866)
        1.0745249 = tfNorm, computed from:
          1.0 = termFreq=1.0
          0.3 = parameter k1
          0.75 = parameter b
          17.087309 = avgFieldLength
          10.24 = fieldLength

It is clear that avgFieldLength and docCount are different. Both HTTP requests 
were made on the same shard right after each other. This cluster has three 
shards, two replicas.
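
To show how these two statistics feed into the score, here is a minimal sketch 
in Python that recomputes both explains from the numbers above. It assumes the 
usual Lucene BM25 formulas for idf and tfNorm; the function names are made up 
for illustration and are not Solr or Lucene API. Lucene works in single 
precision, so the last digits may differ slightly.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # idf as reported in the explain: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    # tfNorm: term frequency saturated by k1 and normalized by relative field length
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1.0) / (freq + k1 * (1.0 - b + b * length_ratio))

def bm25_score(boost, doc_freq, doc_count, freq, k1, b, field_length, avg_field_length):
    # The explain lists the score as the product of boost, idf and tfNorm
    idf = bm25_idf(doc_freq, doc_count)
    tf_norm = bm25_tf_norm(freq, k1, b, field_length, avg_field_length)
    return boost * idf * tf_norm

# Values copied from the two explains; only docCount and avgFieldLength differ.
first = bm25_score(7.4, 51, 377832, 1.0, 0.3, 0.75, 10.24, 17.079535)
second = bm25_score(7.4, 51, 376866, 1.0, 0.3, 0.75, 10.24, 17.087309)

print(round(first, 2), round(second, 2))
```

With these inputs the two scores come out at roughly 70.77 and 70.75, so the 
gap between the explains is fully accounted for by the different docCount and 
avgFieldLength.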

ExactStatsCache is working very well though; if we disable it, everything 
becomes a mess. When INFO logging is on, we clearly see the requests for 
collection statistics, and in the overall result set docCount is equal for all 
results, even if they reside on different shards. I am curious though as to why 
it sometimes produces different statistics. Is there a known Jira issue for this?

Markus
