benwtrent opened a new pull request, #14160:
URL: https://github.com/apache/lucene/pull/14160

   This is a continuation and completion of the work started by @benchaplin in 
https://github.com/apache/lucene/pull/14085
   
   The algorithm is fairly simple:
   
    - Only score and then explore vectors that actually match the filtering 
criteria
    - Since this will make the graph even sparser, the search spread is 
increased to also include the candidate's neighbors' neighbors (i.e. generally 
up to maxConn * maxConn nodes explored per candidate)
    - Additionally, even more scored candidates for a given NSW are considered 
to combat the increased sparsity
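
The expansion step above can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: the class name `FilteredExpansionSketch`, the `expand` method, and the adjacency-array graph are all hypothetical stand-ins for illustration, and scoring is left out entirely. It only shows the idea of collecting filter-matching nodes from the one-hop and then the two-hop neighborhood of a candidate.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

public class FilteredExpansionSketch {

  /**
   * Returns the filter-matching nodes within two hops of {@code candidate},
   * each at most once. Hypothetical helper, not a Lucene API.
   */
  static List<Integer> expand(
      int[][] graph, int candidate, IntPredicate filter, BitSet visited) {
    List<Integer> toScore = new ArrayList<>();
    visited.set(candidate);
    // First hop: only neighbors that pass the filter are kept for scoring.
    for (int n : graph[candidate]) {
      if (!visited.get(n) && filter.test(n)) {
        visited.set(n);
        toScore.add(n);
      }
    }
    // Second hop: walk through every neighbor (matching or not) to reach
    // its neighbors, again keeping only unvisited nodes that pass the filter.
    for (int n : graph[candidate]) {
      for (int nn : graph[n]) {
        if (!visited.get(nn) && filter.test(nn)) {
          visited.set(nn);
          toScore.add(nn);
        }
      }
    }
    return toScore;
  }

  public static void main(String[] args) {
    // Toy graph: graph[i] lists the neighbors of node i.
    int[][] graph = {{1, 2}, {0, 3}, {0, 4}, {1, 5}, {2, 5}, {3, 4}};
    IntPredicate evenOnly = n -> n % 2 == 0; // stand-in for the query filter
    System.out.println(expand(graph, 0, evenOnly, new BitSet())); // prints [2, 4]
  }
}
```

With a per-node degree bounded by maxConn, the inner loop visits at most maxConn * maxConn entries per candidate, which is where the exploration bound in the list above comes from.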
   
   Some of the changes to the baseline Acorn algorithm are:
   
    - There is some general threshold of filtering that bypasses this algorithm 
altogether. Early benchmarking seems to indicate that this might be around 50%, 
but honestly, it's not fully convincing yet...
    - The number of additional neighbors explored is predicated on the 
percentage of the immediate neighborhood that is filtered out
    - The extended neighbors are only explored if fewer than 90% of the current 
neighborhood matches the filter.
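
The three adjustments above can be sketched as small decision helpers. Everything here is an assumption made for illustration: the class and method names are hypothetical, the 50% value is only the tentative figure mentioned above (read here as "the filter matches at least half of the vectors"), and the linear scaling of the extra-neighbor budget is one plausible reading of "predicated on the percentage filtered out", not necessarily what the PR implements.

```java
public class AcornHeuristicsSketch {

  // Tentative value from early benchmarking; assumed to mean the fraction of
  // vectors that match the filter.
  static final double BYPASS_THRESHOLD = 0.5;
  // Extended neighbors are only considered below this matching fraction.
  static final double EXPAND_BELOW = 0.9;

  /** True if the filter is restrictive enough to use the filtered search. */
  static boolean useFilteredSearch(long matchingDocs, long totalDocs) {
    return totalDocs > 0 && (double) matchingDocs / totalDocs < BYPASS_THRESHOLD;
  }

  /** True if fewer than 90% of the immediate neighborhood matches the filter. */
  static boolean shouldExpand(int matching, int neighborhoodSize) {
    return neighborhoodSize > 0
        && (double) matching / neighborhoodSize < EXPAND_BELOW;
  }

  /**
   * Extra neighbors to explore, scaled linearly by the filtered-out fraction.
   * The linear scaling is an assumption of this sketch.
   */
  static int extraNeighborBudget(int matching, int neighborhoodSize, int maxConn) {
    if (!shouldExpand(matching, neighborhoodSize)) {
      return 0;
    }
    double filteredOut = 1.0 - (double) matching / neighborhoodSize;
    return (int) Math.ceil(filteredOut * maxConn);
  }

  public static void main(String[] args) {
    System.out.println(useFilteredSearch(100, 1000)); // 10% match
    System.out.println(extraNeighborBudget(4, 16, 16)); // 75% filtered out
    System.out.println(extraNeighborBudget(15, 16, 16)); // ~94% match
  }
}
```

Keeping the thresholds as named constants makes them easy to vary while benchmarking, which matters given that the 50% figure is still uncertain.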
   
   Here are some numbers for 1M vectors, float32 and then int4 quantized. 
   
   
https://docs.google.com/spreadsheets/d/1GqD7Jw42IIqimr2nB78fzEfOohrcBlJzOlpt0NuUVDQ/edit?gid=163290867#gid=163290867
   
   
   Something I am unsure about:
   
    - How to expose this setting to the users? While I am not a fan of more 
configuration at query time, the behavior seems different enough to justify it. 
   
   
   TODO:
    - More manual testing over more datasets
    - Add some unit and functional tests.
   
   
   closes: https://github.com/apache/lucene/issues/13940


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

