robertvanwinkle1138 commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1749187258

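   The SPANN paper does not address efficient filtered queries. Lucene's HNSW 
search computes the similarity score for every node it visits during graph 
traversal, whether or not that node matches the filter; non-matching nodes 
still steer the traversal but are never collected as results.  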
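
As a rough illustration of that behavior, here is a simplified single-layer best-first search. It is a sketch, not Lucene's code. The function name `filtered_beam_search`, the dot-product similarity, and the `accept` predicate, which stands in for the filter's accepted-docs set, are all assumptions made for this example. Every visited node is scored and can steer the traversal, but only nodes that pass the filter are collected.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity (assumed metric for this sketch)."""
    return sum(x * y for x, y in zip(a, b))


def filtered_beam_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Vector],
    query: Vector,
    entry: int,
    k: int,
    accept: Callable[[int], bool],
) -> Tuple[List[Tuple[float, int]], int]:
    """Best-first search over one graph layer (hypothetical helper).

    Returns (top-k accepted results as (score, node), number of nodes scored).
    """
    entry_score = dot(query, vectors[entry])
    num_scored = 1
    visited = {entry}
    candidates = [(-entry_score, entry)]  # max-heap via negated scores
    results: List[Tuple[float, int]] = []  # min-heap of (score, node)
    if accept(entry):
        heapq.heappush(results, (entry_score, entry))

    while candidates:
        neg_score, node = heapq.heappop(candidates)
        # Stop once the best unexplored candidate scores below the worst result.
        if len(results) >= k and -neg_score < results[0][0]:
            break
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            # The similarity is computed BEFORE the filter is consulted.
            score = dot(query, vectors[nbr])
            num_scored += 1
            # Filtered-out nodes are still used for navigation...
            heapq.heappush(candidates, (-score, nbr))
            # ...but only accepted nodes can enter the result set.
            if accept(nbr):
                if len(results) < k:
                    heapq.heappush(results, (score, nbr))
                elif score > results[0][0]:
                    heapq.heapreplace(results, (score, nbr))

    return sorted(results, reverse=True), num_scored
```

With a restrictive filter, most of the `num_scored` similarity computations go to nodes that can never be returned, which is the cost this comment points at.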
   
   Filtered-DiskANN [1] describes a solution for efficient filtered queries.  
   
   Qdrant has a filtering solution; however, the methodology described in their 
blog is opaque.  
   
   1. https://dl.acm.org/doi/pdf/10.1145/3543507.3583552
   
   > As Approximate Nearest Neighbor Search (ANNS)-based dense retrieval 
> becomes ubiquitous for search and recommendation scenarios, efficiently 
> answering filtered ANNS queries has become a critical requirement. Filtered 
> ANNS queries ask for the nearest neighbors of a query’s embedding from the 
> points in the index that match the query’s labels such as date, price range, 
> language. There has been little prior work on algorithms that use label 
> metadata associated with vector data to build efficient indices for filtered 
> ANNS queries. Consequently, current indices have high search latency or low 
> recall which is not practical in interactive web-scenarios. We present two 
> algorithms with native support for faster and more accurate filtered ANNS 
> queries: one with streaming support, and another based on batch construction. 
> Central to our algorithms is the construction of a graph-structured index 
> which forms connections not only based on the geometry of the vector data, 
> but also the associated label set. On real-world data with natural labels, 
> both algorithms are an order of magnitude or more efficient for filtered 
> queries than the current state of the art algorithms. The generated indices 
> [can] also be queried from an SSD and support thousands of queries per second 
> at over 90% recall@10.
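
For contrast, the search side of a label-aware index can be sketched as a greedy search that checks the label before computing any similarity, so non-matching nodes are never scored or expanded. This is a simplified sketch under assumptions, not the paper's algorithm verbatim. Each node carries a set of labels, the query has a single label, and one entry point is kept per label (`start_by_label` and `label_filtered_search` are hypothetical names). Similarity is again a dot product. The search only finds good neighbors if the nodes sharing a label form a navigable subgraph, which is what the label-aware graph construction described in the abstract aims to provide; on an ordinary HNSW graph that restricted subgraph can be disconnected.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity (assumed metric for this sketch)."""
    return sum(x * y for x, y in zip(a, b))


def label_filtered_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Vector],
    labels: Dict[int, Set[str]],
    start_by_label: Dict[str, int],
    query: Vector,
    query_label: str,
    k: int,
) -> Tuple[List[Tuple[float, int]], int]:
    """Greedy search restricted to nodes carrying query_label (hypothetical helper).

    Returns (top-k results as (score, node), number of nodes scored).
    """
    # Assumed: the per-label entry point itself carries query_label.
    start = start_by_label[query_label]
    start_score = dot(query, vectors[start])
    num_scored = 1
    visited = {start}
    candidates = [(-start_score, start)]  # max-heap via negated scores
    results = [(start_score, start)]  # min-heap of (score, node)

    while candidates:
        neg_score, node = heapq.heappop(candidates)
        # Stop once the best unexplored candidate scores below the worst result.
        if len(results) >= k and -neg_score < results[0][0]:
            break
        for nbr in graph[node]:
            # The label is checked BEFORE scoring: non-matching nodes are
            # neither scored nor used for navigation.
            if nbr in visited or query_label not in labels[nbr]:
                continue
            visited.add(nbr)
            score = dot(query, vectors[nbr])
            num_scored += 1
            heapq.heappush(candidates, (-score, nbr))
            if len(results) < k:
                heapq.heappush(results, (score, nbr))
            elif score > results[0][0]:
                heapq.heapreplace(results, (score, nbr))

    return sorted(results, reverse=True), num_scored
```

Here the number of similarity computations scales with the matching subgraph rather than with everything the traversal touches; the trade-off is that correctness now depends on how the graph was built.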
   

