Michael McCandless created LUCENE-10421:
-------------------------------------------

             Summary: Non-deterministic results from KnnVectorQuery?
                 Key: LUCENE-10421
                 URL: https://issues.apache.org/jira/browse/LUCENE-10421
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless


[Nightly benchmarks|https://home.apache.org/~mikemccand/lucenebench/] have been 
upset for the past ~1.5 weeks because it looks like {{KnnVectorQuery}} is 
giving slightly different results on every run, even on an identical 
(deterministically constructed – single thread indexing, flush by doc count, 
{{SerialMergeScheduler}}, {{LogDocCountMergePolicy}}, etc.) index each 
night.  It produces failures like this, which then abort the benchmark to help 
us catch any recent accidental bug that alters our precise top N search hits 
and scores:
{noformat}
Traceback (most recent call last):
  File "/l/util.nightly/src/python/nightlyBench.py", line 2177, in <module>
    run()
  File "/l/util.nightly/src/python/nightlyBench.py", line 1225, in run
    raise RuntimeError('search result differences: %s' % str(errors))
RuntimeError: search result differences: 
["query=KnnVectorQuery:vector[-0.07267512,...][10] filter=None sort=None groupField=None hitCount=10: hit 4 has wrong field/score value ([20844660], '0.92060816') vs ([25443806], '0.920046')",
 "query=KnnVectorQuery:vector[-0.12073054,...][10] filter=None sort=None groupField=None hitCount=10: hit 7 has wrong field/score value ([25501982], '0.99630797') vs ([13688085], '0.9961489')",
 "query=KnnVectorQuery:vector[0.02227773,...][10] filter=None sort=None groupField=None hitCount=10: hit 0 has wrong field/score value ([4741915], '0.9481132') vs ([14220828], '0.9579846')",
 "query=KnnVectorQuery:vector[0.024077624,...][10] filter=None sort=None groupField=None hitCount=10: hit 0 has wrong field/score value ([7472373], '0.8460249') vs ([12577825], '0.8378446')"]
{noformat}
At first I thought this might be expected because of the recent (awesome!!) 
improvements to HNSW, so I tried to simply "regold".  But the regold did not 
"take", so it indeed looks like there is some non-determinism here.

I pinged [[email protected]] and he found this random seeding that is most 
likely the cause?
{noformat}
public final class HnswGraphBuilder {

  /** Default random seed for level generation * */
  private static final long DEFAULT_RAND_SEED = System.currentTimeMillis(); 
{noformat}

Can we somehow make this deterministic instead?  Or maybe the nightly 
benchmarks could somehow pass something in to make results deterministic for 
benchmarking?  Or ... we could also relax the benchmarks to accept 
non-determinism for the {{KnnVectorQuery}} task?
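
To illustrate the first two options, here is a minimal sketch of seed-driven level assignment. It is not Lucene's actual code: the class {{SeededLevels}}, the method {{assignLevels}}, the constant {{FIXED_SEED}} and its value, and the level formula are all hypothetical stand-ins. The only point is that when the seed is a fixed constant (or is passed in by the caller, e.g. the benchmark), the generated levels are identical from run to run, whereas a {{System.currentTimeMillis()}} seed will most likely produce a different sequence each time.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

/** Hypothetical stand-in for HNSW level assignment; not Lucene's actual code. */
public final class SeededLevels {

  /** A fixed constant (hypothetical value) used instead of System.currentTimeMillis(). */
  static final long FIXED_SEED = 42L;

  /**
   * Draws a level for each of {@code count} nodes, using only the supplied
   * seed as the source of randomness. {@code maxConn} must be greater than 1.
   */
  static int[] assignLevels(long seed, int count, int maxConn) {
    SplittableRandom random = new SplittableRandom(seed);
    double ml = 1.0 / Math.log(maxConn); // assumed level-scaling factor
    int[] levels = new int[count];
    for (int i = 0; i < count; i++) {
      double u;
      do {
        u = random.nextDouble(); // in [0, 1); redraw 0 to avoid log(0)
      } while (u == 0.0);
      levels[i] = (int) (-Math.log(u) * ml);
    }
    return levels;
  }

  public static void main(String[] args) {
    // Same seed twice: the two arrays are equal.
    int[] a = assignLevels(FIXED_SEED, 1000, 16);
    int[] b = assignLevels(FIXED_SEED, 1000, 16);
    System.out.println("fixed seed reproducible: " + Arrays.equals(a, b));

    // Time-based seed: the result depends on when the code runs.
    int[] c = assignLevels(System.currentTimeMillis(), 1000, 16);
    System.out.println("time seed equals fixed seed: " + Arrays.equals(a, c));
  }
}
```

Taking the seed as a parameter, as the sketch does, would also cover the second option: the nightly benchmark could pass its own constant while other callers keep whatever default they have today.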


