benwtrent opened a new pull request, #11905: URL: https://github.com/apache/lucene/pull/11905
This bug has been around since 9.1. It relates directly to the number of nodes contained in level 0 of the HNSW graph. Since level 0 contains all the nodes, this implies the following:

- In Lucene 9.1, the bug would likely have appeared once `31580641` (`Integer.MAX_VALUE / (maxConn + 1)`) vectors were in a single segment.
- In Lucene 9.2+, the bug appears when there are `16268814` (`Integer.MAX_VALUE / (M * 2 + 1)`) or more vectors in a single segment.

The stack trace indicates an EOF failure as Lucene attempts to `seek` to a negative offset in `ByteBufferIndexInput`.

This commit fixes the type casting and uses the `Math.exact...` methods for the multiplications and additions. The overhead is minimal, as these calculations are done in constructors and their results are reused afterwards.

I also put fixes in the older codecs. I don't know if that is typically done, but if somebody has a large segment and wants to read its vectors, they could build this jar and read them now (the bug is only on read; the data layout is unchanged).
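To make the failure mode concrete, here is a minimal sketch of the kind of overflow described above. The class name, the `node * bytesPerNode` layout formula, and the chosen constants are illustrative assumptions for a default `M = 16` graph, not Lucene's actual code:

```java
// Hypothetical sketch of the level-0 offset overflow; names and the
// exact layout formula are illustrative, not copied from Lucene.
public class OffsetOverflowDemo {

    public static void main(String[] args) {
        final int m = 16; // assumed default HNSW M
        // Assumed layout: each level-0 node stores (M * 2 + 1) ints
        // (a neighbor count plus up to M * 2 neighbor ords) = 132 bytes.
        final int bytesPerNode = (m * 2 + 1) * Integer.BYTES;
        // First node whose byte offset exceeds Integer.MAX_VALUE.
        final int node = 16_268_816;

        // Buggy: both operands are ints, so the product wraps to a
        // negative value BEFORE the widening assignment to long.
        long buggyOffset = node * bytesPerNode;

        // Fixed: widen to long first. Math.multiplyExact additionally
        // throws ArithmeticException on a genuine long overflow
        // instead of silently wrapping.
        long fixedOffset = Math.multiplyExact((long) node, (long) bytesPerNode);

        System.out.println(buggyOffset); // negative: seek() then fails with EOF
        System.out.println(fixedOffset); // 2147483712
    }
}
```

A negative `buggyOffset` passed to `seek` is exactly the shape of the reported `ByteBufferIndexInput` failure; widening before the multiply keeps the offset positive past the ~16M-vector threshold.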