sam-herman commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2967720821
The biggest roadblock to integrating properly with Lucene is that jVector relies throughout on a `RandomWriter` that can seek backwards, which is not compatible with Lucene's append-only interfaces. We are therefore adding an append-only writer to jVector that is compatible with Lucene. Once that lands, I think the integration with Lucene will be much cleaner, and we won't have to carry much of the complexity that is currently in the OpenSearch plugin code. For reference: https://github.com/datastax/jvector/pull/475

> Could you elaborate why you think Lucene is not threadsafe? Will this mismatch present some obstacle to integrating JVector?

I'm not sure of the context in which that comment was made, but I think it refers to jVector's reliance on various `ForkJoinPool`s to build a single segment of an index (not just during merges, but all the time). I'm not sure what assumptions it was making about Lucene; perhaps they concerned the per-thread nature of writing new Lucene segments. I noticed that the `Lucene99HnswVectorsWriter` implementation takes a `TaskExecutor mergeExec` to speed up merges, but I haven't seen anything similar for speeding up the build of a single segment when reading the flat vectors format from a source.
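
To make the seek-back versus append-only distinction concrete, here is a minimal sketch in plain Java. It is not jVector's or Lucene's actual code: the class and method names (`AppendOnlyLayoutSketch`, `writeWithSeekBack`, `writeAppendOnly`) and the toy file layout are assumptions made up for illustration. The first writer reserves a header slot and patches it after the payload is written, which needs a backwards seek. The second writes the same information as a trailer, so it only ever appends.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of jVector or Lucene.
public class AppendOnlyLayoutSketch {

    /** Seek-back style: reserve a header slot, write the payload, then patch the slot. */
    static byte[] writeWithSeekBack(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (values.length + 1));
        buf.putInt(0); // placeholder: the count is treated as unknown until the payload is written
        int count = 0;
        for (int v : values) {
            buf.putInt(v);
            count++;
        }
        buf.putInt(0, count); // absolute write at offset 0, i.e. the backwards seek
        return buf.array();
    }

    /** Append-only style: write the payload first, then the count as a trailer. */
    static byte[] writeAppendOnly(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            int count = 0;
            for (int v : values) {
                out.writeInt(v);
                count++;
            }
            out.writeInt(count); // metadata goes last, so no earlier byte is ever revisited
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reader for the seek-back layout: the count sits in the first four bytes. */
    static int readCountFromHeader(byte[] file) {
        return ByteBuffer.wrap(file).getInt(0);
    }

    /** Reader for the append-only layout: the count sits in the last four bytes. */
    static int readCountFromFooter(byte[] file) {
        return ByteBuffer.wrap(file).getInt(file.length - Integer.BYTES);
    }

    public static void main(String[] args) {
        int[] values = {7, 11, 13};
        System.out.println(readCountFromHeader(writeWithSeekBack(values)));
        System.out.println(readCountFromFooter(writeAppendOnly(values)));
    }
}
```

Both layouts carry the same information. The cost of the append-only form is that a reader has to locate the trailer from the end of the file before it can interpret the payload.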