Michael Sokolov created LUCENE-10590: ----------------------------------------

Summary: Indexing all zero vectors leads to heat death of the universe
Key: LUCENE-10590
URL: https://issues.apache.org/jira/browse/LUCENE-10590
Project: Lucene - Core
Issue Type: Bug
Reporter: Michael Sokolov

By accident, while testing something else, I ran a luceneutil test that indexed 1M 100-dimensional vectors, every one of which was all zeroes. Indexing took a very long time (~40x normal; it did eventually complete), and search performance was similarly bad. Even with the worst possible data, though, we should not degrade by orders of magnitude.

I'm not entirely sure what the cause is, but perhaps we keep exploring the graph for as long as we keep finding hits that are "better", where "better" means (score, -docid) >= (lowest score, -docid). If that's right and all docs have the same score, then we probably need to either switch to > (though this could lead to poorer recall in normal cases) or introduce some kind of minimum score threshold?

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
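To illustrate the hypothesis, here is a toy Python sketch of a greedy best-first graph search, loosely in the style of an HNSW layer search. It is not Lucene's HnswGraphSearcher. The names search_layer, neighbors, score, and accept_ties are hypothetical, the graph is a hand-built ring rather than a real HNSW graph, and the docid tie-break is omitted (scores alone are compared). With all-zero document vectors, every dot-product score is 0. If a neighbor is admitted when score >= worst kept score, every tie is admitted and the search never terminates early. With a strict > comparison, it stops after roughly k * degree node visits.

```python
import heapq


def search_layer(neighbors, score, entry, k, accept_ties):
    """Toy best-first search over a proximity graph (hypothetical, not Lucene's code).

    Returns (sorted ids of the kept top-k nodes, number of nodes scored/visited).
    accept_ties=True admits a neighbor when its score >= the worst kept score;
    accept_ties=False requires a strictly greater score.
    """
    visited = {entry}
    s0 = score(entry)
    candidates = [(-s0, entry)]  # max-heap by score (scores negated for heapq)
    results = [(s0, entry)]      # min-heap by score, never more than k entries
    while candidates:
        neg_s, node = heapq.heappop(candidates)
        # Stop once the best remaining candidate is strictly worse than the
        # worst kept result and the result set is already full.
        if len(results) >= k and -neg_s < results[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            s = score(nb)
            worst = results[0][0]
            better = s >= worst if accept_ties else s > worst
            if len(results) < k or better:
                heapq.heappush(candidates, (-s, nb))
                heapq.heappush(results, (s, nb))
                if len(results) > k:
                    heapq.heappop(results)  # drop the current worst result
    return sorted(node_id for _, node_id in results), len(visited)


n, dim, k = 1000, 100, 10
vectors = [[0.0] * dim for _ in range(n)]  # every document vector is all zeroes
query = [1.0] * dim


def dot(i):
    return sum(a * b for a, b in zip(query, vectors[i]))  # always 0.0 here


# Connected ring graph with a few longer links: 4 neighbors per node.
neighbors = [[(i + 1) % n, (i - 1) % n, (i + 10) % n, (i - 10) % n] for i in range(n)]

_, visited_ties = search_layer(neighbors, dot, 0, k, accept_ties=True)
_, visited_strict = search_layer(neighbors, dot, 0, k, accept_ties=False)
print(visited_ties)    # 1000: every node is visited when ties are admitted
print(visited_strict)  # small: bounded by about k * degree visits
```

In this toy, the strict comparison bounds the work, but, as noted above, it may cost recall on normal data, and this sketch does not measure recall. A minimum score threshold would be another way to cut off the tie-driven exploration.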