ChrisHegarty opened a new pull request, #12703: URL: https://github.com/apache/lucene/pull/12703
[ This PR is a draft and not ready to be merged. It is intended to help facilitate a discussion. ]

This PR enhances the vector similarity functions so that they can access the underlying memory directly, rather than first copying the vector into a primitive array on the Java heap.

I added a number of overloads to `VectorSimilarityFunction` so that the retrieval of the vector data can be pushed down into the provider implementation. This feels right to me.

`RandomAccessVectorValues` encapsulates and provides access to the underlying vector data. I added the ability to retrieve the backing `IndexInput` here, so that it's possible to bypass the accessor that does the copy. This is not great, but maybe OK, especially if we could restrict access?

I updated `MemorySegmentIndexInput` to support retrieval of the backing segment for a given position. That way, the vector provider can access it directly. This kinda feels OK: it brings the vector provider and the memory segment index input closer together, without imposing incubating APIs on the currently-previewing implementation.

There are now a couple of extra variants of the distance calculation functions, but they are largely cut'n'paste of sections of each other. We might not need them all; that kinda depends on which are more performance-sensitive than others.

Currently only the float dot product has been updated, so as to first determine the potential performance benefits and validate the approach.

Outstanding work and areas for improvement:
1. Figure out something better than exposing `IndexInput` in `RandomAccessVectorValues`
2. Binary and other distance calculations
3. Evaluate the perf impact with macro benchmarks (only micro benchmarks have been done so far)
4. There are a couple of hacks to get the benchmark running; remove these
5. ..

relates #12482
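To make the copy-versus-direct distinction concrete, here is a minimal sketch that is not taken from the PR or from Lucene. It assumes a direct (off-heap) `java.nio.ByteBuffer` as a stand-in for the `MemorySegment` behind `MemorySegmentIndexInput`, and a plain scalar loop in place of the Panama-vectorized implementation. The class `DirectDotProductSketch` and the methods `copyVector` and `dotProduct(float[], ByteBuffer, int)` are hypothetical names chosen for illustration. The second overload shows the idea of pushing retrieval down: the stored vector is read in place at a byte offset instead of first being copied into a `float[]`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only: not Lucene's VectorSimilarityFunction API.
public class DirectDotProductSketch {

  // Heap path: both vectors are already float[] on the Java heap.
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Hypothetical "pushed down" overload: the stored vector is read in place from the
  // backing buffer at byteOffset, so no intermediate float[] is allocated or filled.
  static float dotProduct(float[] query, ByteBuffer data, int byteOffset) {
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      // Absolute get: reads at the given index without moving the buffer's position.
      sum += query[i] * data.getFloat(byteOffset + i * Float.BYTES);
    }
    return sum;
  }

  // Copying path, for comparison: materialize the stored vector on the heap first.
  static float[] copyVector(ByteBuffer data, int byteOffset, int dims) {
    float[] v = new float[dims];
    for (int i = 0; i < dims; i++) {
      v[i] = data.getFloat(byteOffset + i * Float.BYTES);
    }
    return v;
  }

  public static void main(String[] args) {
    int dims = 4;
    float[][] stored = {{1f, 2f, 3f, 4f}, {0.5f, -1f, 2f, 0f}};
    // Stored vectors laid out back to back, little-endian, in off-heap memory.
    ByteBuffer data =
        ByteBuffer.allocateDirect(stored.length * dims * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (float[] v : stored) {
      for (float f : v) {
        data.putFloat(f);
      }
    }
    float[] query = {1f, 1f, 1f, 1f};
    for (int ord = 0; ord < stored.length; ord++) {
      int offset = ord * dims * Float.BYTES;
      float direct = dotProduct(query, data, offset);
      float copied = dotProduct(query, copyVector(data, offset, dims));
      // prints "ord 0: direct=10.0 copied=10.0" then "ord 1: direct=1.5 copied=1.5"
      System.out.println("ord " + ord + ": direct=" + direct + " copied=" + copied);
    }
  }
}
```

Both paths produce the same score. The difference is that the direct overload avoids one allocation and copy per comparison, which is the kind of saving the micro benchmarks are meant to quantify.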