[PR] Use read advice consistently in the knn vector formats [lucene]

via GitHub Tue, 17 Dec 2024 05:47:13 -0800


jimczi opened a new pull request, #14076:
URL: https://github.com/apache/lucene/pull/14076


   This change reverts #13985 and makes each knn format use a single read 
advice consistently. 
   Switching read advice during merges might help some use cases, but it can 
also hurt others, e.g. when searches and merges run at the same time. To 
balance this, the approach here picks one read advice per format, based on 
the most resource-intensive operation for that format.  
   
   For formats using HNSW, the read advice is set to `RANDOM` and doesn’t 
change during merges. Copying bytes from old segments to new ones is much 
cheaper than rebuilding the graph, so graph construction, which accesses 
vectors in random order, dominates the merge and keeping `RANDOM` read advice 
makes the most sense.  
   
   For flat formats, the read advice is set to `SEQUENTIAL`, since a 
brute-force scan over the vectors is the only way to retrieve nearest 
neighbors.  
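   
   As a rough illustration of the policy described above (not code from this 
PR), the sketch below maps each kind of format to one fixed read advice and 
deliberately ignores whether the file is opened for search or for a merge. 
The `ReadAdvicePolicy` class and its nested `ReadAdvice`, `VectorFormatKind` 
and `Context` enums are hypothetical stand-ins defined only for this example; 
they are not the Lucene classes touched by the change.

```java
/** Hypothetical sketch: the types below are illustrative, not Lucene classes. */
final class ReadAdvicePolicy {

  /** Stand-in for the access-pattern hint given when opening a file. */
  enum ReadAdvice { RANDOM, SEQUENTIAL }

  /** The two families of knn vector formats discussed in the description. */
  enum VectorFormatKind { HNSW, FLAT }

  /** Why the file is being opened. */
  enum Context { SEARCH, MERGE }

  /**
   * Returns the single read advice for a format kind. The context parameter
   * is accepted but intentionally unused: the advice never changes between
   * search and merge.
   */
  static ReadAdvice adviceFor(VectorFormatKind kind, Context context) {
    switch (kind) {
      case HNSW:
        // Graph traversal and graph building both jump between vectors.
        return ReadAdvice.RANDOM;
      case FLAT:
        // Brute-force search scans the vectors front to back.
        return ReadAdvice.SEQUENTIAL;
      default:
        throw new IllegalArgumentException("Unknown format kind: " + kind);
    }
  }

  public static void main(String[] args) {
    for (VectorFormatKind kind : VectorFormatKind.values()) {
      for (Context context : Context.values()) {
        System.out.println(kind + " / " + context + " -> " + adviceFor(kind, context));
      }
    }
  }
}
```

   Keeping the context out of the decision is the point of the sketch: a 
given file always gets the same hint, whichever operation opened it.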
   
   This is a deliberate decision to keep things simple and predictable. While 
it might seem like a step back compared to #13985, using multiple read advices 
on the same file can lead to unpredictable behavior: it may look fine until 
you test it in a resource-constrained setup.  
   
   That said, we could still improve merge performance with a `RANDOM` read 
advice in the future, for instance, by adding eager prefetching. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

