benwtrent commented on issue #12440:
URL: https://github.com/apache/lucene/issues/12440#issuecomment-1682640430

   > What is your take on existing merge optimization 
https://github.com/apache/lucene/pull/12050?
   
   I think it's a good start. One problem I have is that typical Lucene segment 
merging attempts to merge segments of equal size together, making "segment 
tiers" of similar sizes. Maybe an adequate "make merges faster" optimization is 
to have a better segment merge policy that takes advantage of the inherent 
advantages of this optimization.
   
   > A given vector/document will go through many segment merge in its life 
time, so the benefit of this optimization accrue a lot.
   Caveat: I used random vectors.
   
   This is true, but it depends on the merge policy and which segments are 
merged. When merging 10 segments that are of equal size, this optimization has 
almost no impact.
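   
   A rough way to see why, assuming the optimization seeds the merged graph 
from the largest incoming segment's existing HNSW graph and re-inserts every 
other vector (my reading of the PR, not a statement of its exact behavior). 
The function name and segment sizes below are hypothetical:

```python
def reused_fraction(segment_sizes):
    """Fraction of merged vectors whose graph work is inherited rather than redone.

    Assumption: only the largest segment's graph is reused as the starting
    point; vectors from all other segments are inserted from scratch.
    """
    total = sum(segment_sizes)
    if total == 0:
        return 0.0
    return max(segment_sizes) / total


# Ten equal segments: only 1/10 of the vectors skip re-insertion.
equal = [100_000] * 10
# One dominant segment plus small ones: most of the graph is inherited.
skewed = [900_000] + [10_000] * 10

print(reused_fraction(equal))
print(reused_fraction(skewed))
```

   Under that assumption, a merge policy that tends to pair one large segment 
with several small ones would get far more out of the optimization than one 
that merges equal-sized tiers.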
   
   > have we consider integrating other - native - libraries (faiss, raft, 
nmslib...) like what is done in open search (at a higher abstraction level 
though).
   
   I am unsure about this. A new codec could be made integrating those native 
libraries, but they should fit within the Lucene segment model and not use JNI. 
From what I can tell, those integrations don't do either of those things.
   
   Additionally, there shouldn't be any external dependencies (if directly 
integrated into the Lucene repo). See Discussion: 
https://github.com/apache/lucene/issues/12502
   
   
   
   Another option for "making merges faster" is to just provide scalar 
quantization for users. This will make merges as a whole faster, as the 
computations required will be much cheaper.
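   
   As an illustration only (this is not Lucene code, and the clamping bounds 
and 7-bit range are assumptions chosen for the sketch), scalar quantization 
maps each float dimension to a small integer so that distance computations 
during graph construction become integer arithmetic:

```python
def quantize(vector, lo, hi):
    """Map floats in [lo, hi] to integers in [0, 127]; outliers are clamped.

    `lo` and `hi` are hypothetical bounds, e.g. taken from quantiles of the
    data. Requires hi > lo.
    """
    scale = 127 / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]


def int_dot(a, b):
    """Dot product over quantized (integer) vectors."""
    return sum(x * y for x, y in zip(a, b))


a = quantize([0.12, -0.40, 0.93], lo=-1.0, hi=1.0)
b = quantize([0.10, -0.35, 0.88], lo=-1.0, hi=1.0)
print(a, b, int_dot(a, b))
```

   The integer score only approximates the float similarity, so a real 
implementation would also need to correct for the offset and scale when 
comparing scores.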
   
   It bugs me that we have all this distributed work across segments that just 
gets ignored. No matter if this was a native implementation or not, merging 
similarly sized HNSW graphs from 9 segments into 1 will still be costly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

