Re: [I] Integrate a JVector codec for KNN searches [lucene]

via GitHub Thu, 12 Jun 2025 10:49:33 -0700


sam-herman commented on issue #14681:
URL: https://github.com/apache/lucene/issues/14681#issuecomment-2967720821


   The biggest roadblock to integrating properly with Lucene is that jVector 
relies throughout on a `RandomWriter` that can seek backwards, which is not 
compatible with Lucene's append-only interfaces.
   As a result, we are now adding support for an append-only writer within 
jVector that is compatible with Lucene. Once it's there, I think the 
integration with Lucene will be much cleaner, and we won't have to carry a lot 
of the complexity that is currently in the OpenSearch plugin code.
   
   For reference:
   https://github.com/datastax/jvector/pull/475
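
   To illustrate the constraint, here is a minimal sketch of one common way 
to avoid seeking backwards: metadata that is only known at the end (here, a 
vector count) goes into a trailing footer instead of being patched into a 
header. The class name, method names, and byte layout are hypothetical and 
use only the JDK; this is not jVector's on-disk format, nor what the PR above 
implements.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical sketch: names and layout are illustrative assumptions only.
public class AppendOnlyFooterSketch {

    // Writes every vector strictly front to back. The count is only known
    // after the loop, so it is appended as a footer rather than written
    // into a header by seeking backwards.
    static byte[] write(float[][] vectors) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int count = 0;
        for (float[] vector : vectors) {
            out.writeInt(vector.length);      // per-vector dimension
            for (float component : vector) {
                out.writeFloat(component);
            }
            count++;
        }
        out.writeInt(count);                  // footer: last 4 bytes
        out.flush();
        return bytes.toByteArray();
    }

    // A reader locates the footer from the end of the data.
    // DataOutputStream and ByteBuffer both default to big-endian.
    static int readCount(byte[] data) {
        return ByteBuffer.wrap(data).getInt(data.length - Integer.BYTES);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(new float[][] {{1f, 2f}, {3f, 4f}, {5f, 6f}});
        System.out.println(readCount(data));
    }
}
```

   The trade-off is that a reader has to start from the end of the file to 
find the metadata, which is why the change has to be made in the format 
itself rather than only in the writer.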
   
   > Could you elaborate why you think Lucene is not threadsafe? Will this 
mismatch present some obstacle to integrating JVector?
   
   I'm not sure about the context in which the comment was made, but I think 
it refers to jVector's reliance on various `ForkJoinPool` instances to build a 
single segment of an index (not just during merges but all the time). 
   I'm not sure what assumptions it was making about Lucene, perhaps about the 
per-thread nature of writing new Lucene segments.
   I noticed that the `Lucene99HnswVectorsWriter` implementation takes a 
`TaskExecutor mergeExec` to facilitate faster merges, but I haven't seen 
anything similar to speed up the building of a single segment when reading 
the flat vectors format from a source. 
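
   As a rough illustration of what I mean by parallelizing the build of a 
single segment, the sketch below fans out per-node work over a dedicated 
`ForkJoinPool`. It uses only the JDK; the class, the method names, and the 
brute-force nearest-neighbor step are hypothetical stand-ins, not jVector's 
or Lucene's actual graph construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

// Hypothetical sketch: a stand-in for per-node work during a segment build.
public class ParallelSegmentBuildSketch {

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Brute-force nearest neighbor of one node; returns -1 if there is no
    // other node. Only reads shared data, so it is safe to run concurrently.
    static int nearestNeighbor(float[][] vectors, int node) {
        int best = -1;
        float bestDistance = Float.MAX_VALUE;
        for (int other = 0; other < vectors.length; other++) {
            if (other == node) continue;
            float distance = squaredDistance(vectors[node], vectors[other]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = other;
            }
        }
        return best;
    }

    // Submits one task per node to a pool owned by this build, then
    // collects the results in node order.
    static int[] buildNearest(float[][] vectors, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < vectors.length; i++) {
                final int node = i;
                Callable<Integer> task = () -> nearestNeighbor(vectors, node);
                futures.add(pool.submit(task));
            }
            int[] nearest = new int[vectors.length];
            for (int i = 0; i < nearest.length; i++) {
                nearest[i] = futures.get(i).get();
            }
            return nearest;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        float[][] vectors = {{0f, 0f}, {1f, 0f}, {10f, 0f}};
        int[] nearest = buildNearest(vectors, 2);
        System.out.println(java.util.Arrays.toString(nearest));
    }
}
```

   The point is only that the pool is owned by the build of one segment 
rather than handed in for merges, which is the difference I was describing 
above.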


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


