ChrisHegarty opened a new pull request, #12703:
URL: https://github.com/apache/lucene/pull/12703

   [ This PR is a draft and is not ready to be merged. It is intended to help 
facilitate a discussion. ]
   
   This PR enhances the vector similarity functions so that they can access the 
underlying memory directly, rather than first copying the vector data into a 
primitive array on the Java heap.
   
   I added several overloads to `VectorSimilarityFunction` so that retrieval of 
the vector data can be pushed down into the provider implementation. This feels 
like the right approach to me.
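   To illustrate the idea of pushing retrieval down, here is a minimal sketch 
contrasting a heap-only dot product with an overload that reads the stored 
vector in place. It is not the PR's code: the class `DotProductSketch`, its 
method signatures, the use of `ByteBuffer` in place of a memory segment, and the 
little-endian float layout are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; not Lucene's VectorSimilarityFunction API.
final class DotProductSketch {

  // Heap-only variant: both vectors already live in float[] arrays.
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Pushed-down variant: the stored vector is read in place from the backing
  // buffer starting at byteOffset, so no intermediate float[] copy is made.
  // Assumes the floats are stored little-endian.
  static float dotProduct(float[] query, ByteBuffer data, int byteOffset) {
    // duplicate() so changing the byte order does not affect the caller's buffer
    ByteBuffer le = data.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * le.getFloat(byteOffset + i * Float.BYTES);
    }
    return sum;
  }
}
```

The two overloads compute the same value; the second one simply avoids 
materializing the stored vector on the heap before the loop runs.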
   
   `RandomAccessVectorValues` encapsulates, and provides access to, the 
underlying vector data. I added a way to retrieve its backing `IndexInput`, so 
that callers can bypass the accessor that performs the copy. This is not ideal, 
but it may be acceptable, especially if we could restrict access to it?
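   A minimal sketch of what exposing the backing storage alongside the copying 
accessor could look like, assuming fixed-size float vectors stored contiguously 
so that a vector's byte offset is `ord * dimension * Float.BYTES`. The interface 
`VectorValuesSketch`, the methods `backingBuffer()` and `offsetOf(int)`, and the 
class `BufferVectorValues` are hypothetical, and a `ByteBuffer` stands in for the 
`IndexInput`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical interface; not Lucene's RandomAccessVectorValues.
// Assumes fixed-size float vectors stored contiguously, little-endian.
interface VectorValuesSketch {
  int dimension();

  // Stand-in for the backing IndexInput: raw storage of all vectors.
  ByteBuffer backingBuffer();

  // Byte offset of vector `ord` within the backing storage.
  default int offsetOf(int ord) {
    return ord * dimension() * Float.BYTES;
  }

  // Copying accessor: materializes the vector as a float[] on the heap.
  default float[] vectorValue(int ord) {
    ByteBuffer le = backingBuffer().duplicate().order(ByteOrder.LITTLE_ENDIAN);
    int base = offsetOf(ord);
    float[] v = new float[dimension()];
    for (int i = 0; i < v.length; i++) {
      v[i] = le.getFloat(base + i * Float.BYTES);
    }
    return v;
  }
}

// Trivial implementation over a ByteBuffer, for illustration only.
final class BufferVectorValues implements VectorValuesSketch {
  private final ByteBuffer data;
  private final int dimension;

  BufferVectorValues(ByteBuffer data, int dimension) {
    this.data = data;
    this.dimension = dimension;
  }

  @Override
  public int dimension() {
    return dimension;
  }

  @Override
  public ByteBuffer backingBuffer() {
    return data;
  }
}
```

A similarity function that is handed `backingBuffer()` and `offsetOf(ord)` can 
read the vector in place, while other callers keep using `vectorValue(ord)`. 
Restricting who may call `backingBuffer()` is the open question raised above.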
   
   I updated `MemorySegmentIndexInput` to support retrieval of the backing 
segment for a given position, so that the vector provider can access it 
directly. This feels acceptable: it brings the vector provider and 
`MemorySegmentIndexInput` closer together, without pulling incubating APIs into 
the currently previewing implementation.
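   The lookup of a backing segment for a position can be sketched as follows, 
assuming the file is mapped as an array of equally sized chunks of 
`1 << chunkSizePower` bytes. The class `ChunkedInputSketch` and the method 
`segmentAt` are hypothetical, `ByteBuffer` is used in place of `MemorySegment` so 
the sketch compiles without preview features, and returning `null` when the 
requested range straddles a chunk boundary (so the caller falls back to copying) 
is an assumption of this sketch, not necessarily what the PR does.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of locating the chunk that backs a file position.
final class ChunkedInputSketch {
  private final ByteBuffer[] chunks;
  private final int chunkSizePower;
  private final long chunkSizeMask;

  ChunkedInputSketch(ByteBuffer[] chunks, int chunkSizePower) {
    this.chunks = chunks;
    this.chunkSizePower = chunkSizePower;
    this.chunkSizeMask = (1L << chunkSizePower) - 1L;
  }

  // Returns a view of `length` bytes starting at file position `pos`, or null
  // when the range crosses a chunk boundary (caller then falls back to copying).
  ByteBuffer segmentAt(long pos, int length) {
    int index = (int) (pos >>> chunkSizePower); // which chunk
    int offset = (int) (pos & chunkSizeMask);   // offset inside that chunk
    if (index < 0 || index >= chunks.length) {
      throw new IndexOutOfBoundsException("position " + pos);
    }
    ByteBuffer chunk = chunks[index];
    if (offset + length > chunk.limit()) {
      return null;
    }
    return chunk.duplicate().position(offset).limit(offset + length).slice();
  }
}
```

The returned view shares memory with the mapped chunk, so no bytes are copied; 
only the boundary-straddling case needs a slower path.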
   
   There are now a few extra variants of the distance calculation functions, 
but they are largely copy-and-pasted sections of one another. We might not need 
them all; that depends on which call sites are more performance sensitive than 
others.
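   For example, besides a heap query compared against a stored vector, there is 
the case where both vectors are stored, such as when comparing two indexed 
vectors while building the graph. A minimal sketch of that off-heap/off-heap 
variant follows; the class `OffHeapDotProduct`, its signature, and the 
little-endian float layout are assumptions. Its loop body is essentially the same 
as the other variants, which is where the cut'n'paste comes from.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical off-heap/off-heap variant: both vectors are read in place.
final class OffHeapDotProduct {
  static float dotProduct(ByteBuffer a, int aOffset, ByteBuffer b, int bOffset, int dims) {
    // Independent little-endian views so callers' buffers are left untouched.
    ByteBuffer la = a.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    ByteBuffer lb = b.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    float sum = 0f;
    for (int i = 0; i < dims; i++) {
      int delta = i * Float.BYTES;
      sum += la.getFloat(aOffset + delta) * lb.getFloat(bOffset + delta);
    }
    return sum;
  }
}
```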
   
   Currently only the float dot product has been updated, so that we can first 
evaluate both the approach and its potential performance benefits.
   
   Outstanding work and areas for improvement:
   1. Find something better than exposing `IndexInput` in 
`RandomAccessVectorValues`
   2. Binary and other distance calculations
   3. Evaluate the performance impact with macro benchmarks (only micro 
benchmarks have been run so far)
   4. Remove the couple of hacks needed to get the benchmark running
   5.  ..
   
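   Extending the approach to binary (byte) vectors would follow the same 
pattern. A minimal sketch, assuming signed byte vectors whose products are 
accumulated in an `int`; the class `ByteDotProductSketch` and its signature are 
hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a byte-vector dot product read in place.
final class ByteDotProductSketch {
  // Each byte*byte product is promoted to int and summed in an int.
  static int dotProduct(byte[] query, ByteBuffer data, int byteOffset) {
    int sum = 0;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * data.get(byteOffset + i);
    }
    return sum;
  }
}
```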
   relates #12482


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
