Michael Sokolov created LUCENE-10590:
----------------------------------------

             Summary: Indexing all zero vectors leads to heat death of the 
universe
                 Key: LUCENE-10590
                 URL: https://issues.apache.org/jira/browse/LUCENE-10590
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Michael Sokolov


By accident while testing something else, I ran a luceneutil test indexing 1M 
100d vectors where all the vectors were all zeroes. This caused indexing to 
take a very long time (~40x normal - it did eventually complete) and the search 
performance was similarly bad.  We should not degrade by orders of magnitude 
with even the worst data though.

I'm not entirely sure what the issue is, but perhaps as long as we keep finding 
hits that are "better" we keep exploring the graph, where better means (score, 
-docid) >= (lowest score, -docid). If that's right and all docs have the same 
score, then we probably need to either switch to > (but this could lead to 
poorer recall in normal cases) or introduce some kind of minimum score 
threshold?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to