benwtrent opened a new pull request, #14160: URL: https://github.com/apache/lucene/pull/14160
This is a continuation and completion of the work started by @benchaplin in https://github.com/apache/lucene/pull/14085

The algorithm is fairly simple:

- Only score and then explore vectors that actually match the filtering criteria.
- Since this makes the graph even sparser, the search spread is increased to also include the candidate's neighbors' neighbors (generally maxConn * maxConn exploration).
- Additionally, even more scored candidates for a given NSW are considered to combat the increased sparsity.

Some of the changes to the baseline Acorn algorithm are:

- There is some general threshold of filtering that bypasses this algorithm altogether. Early benchmarking seems to indicate that this might be around 50%, but honestly, it's not fully convincing...
- The number of additional neighbors explored is predicated on the percentage of the immediate neighborhood that is filtered out.
- Only look at the extended neighbors if less than 90% of the current neighborhood matches the filter.

Here are some numbers for 1M vectors, float32 and then int4 quantized: https://docs.google.com/spreadsheets/d/1GqD7Jw42IIqimr2nB78fzEfOohrcBlJzOlpt0NuUVDQ/edit?gid=163290867#gid=163290867

Something I am unsure about:

- How to expose this setting to the users? While I am not a fan of more configuration at query time, the behavior seems different enough to justify it.

TODO:

- More manual testing over more datasets
- Add some unit and functional tests

closes: https://github.com/apache/lucene/issues/13940
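As a rough illustration of the filtered search described above, here is a minimal Java sketch. It is not the code in this PR and does not use any Lucene API: the class `FilteredGraphSearchSketch`, its method names, the adjacency-array graph, and the precomputed `score` array are all hypothetical stand-ins. It covers a single graph layer and implements only two of the ideas from the description: candidates are collected only if they match the filter, and a node's two-hop neighborhood is scanned only when fewer than 90% of its direct neighbors match. The 50% bypass threshold and the budget that scales extra exploration with the filtered-out percentage are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code. graph[i] lists the neighbors of node i,
// score[i] is the (precomputed, for simplicity) similarity of node i to the query,
// and accepted[i] says whether node i passes the filter.
class FilteredGraphSearchSketch {
  // Assumed from the PR text: expand to two hops only if < 90% of neighbors match.
  static final double EXPAND_BELOW = 0.9;

  public static List<Integer> search(
      int[][] graph, float[] score, boolean[] accepted, int entry, int k) {
    boolean[] visited = new boolean[graph.length];
    // Candidates are polled best-first; results keep the worst hit on top.
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>((a, b) -> Float.compare(score[b], score[a]));
    PriorityQueue<Integer> results =
        new PriorityQueue<>((a, b) -> Float.compare(score[a], score[b]));

    visited[entry] = true;
    candidates.add(entry); // the entry point is always expanded
    if (accepted[entry]) {
      results.add(entry); // but only collected if it matches the filter
    }

    while (!candidates.isEmpty()) {
      int current = candidates.poll();
      // Stop once the best remaining candidate cannot improve a full result set.
      if (results.size() >= k && score[current] < score[results.peek()]) {
        break;
      }
      for (int next : matchingNeighbors(graph, accepted, visited, current)) {
        candidates.add(next);
        results.add(next);
        if (results.size() > k) {
          results.poll(); // drop the current worst hit
        }
      }
    }

    List<Integer> top = new ArrayList<>(results);
    top.sort((a, b) -> Float.compare(score[b], score[a])); // best first
    return top;
  }

  // Returns unvisited neighbors that match the filter, marking them visited.
  // Non-matching neighbors are never returned; they are only used as bridges
  // to their own neighbors when the direct neighborhood is too heavily filtered.
  public static List<Integer> matchingNeighbors(
      int[][] graph, boolean[] accepted, boolean[] visited, int node) {
    int matching = 0;
    for (int n : graph[node]) {
      if (accepted[n]) {
        matching++;
      }
    }
    boolean expand = matching < EXPAND_BELOW * graph[node].length;

    List<Integer> found = new ArrayList<>();
    for (int n : graph[node]) {
      if (accepted[n]) {
        if (!visited[n]) {
          visited[n] = true;
          found.add(n);
        }
      } else if (expand) {
        for (int nn : graph[n]) { // second hop through a filtered-out neighbor
          if (accepted[nn] && !visited[nn]) {
            visited[nn] = true;
            found.add(nn);
          }
        }
      }
    }
    return found;
  }
}
```

In this toy version a matching node that sits more than two hops away from every other matching node is never reached, which is one reason the description also mentions considering more scored candidates per graph.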
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org