Re: [PR] Fix off-heap byte vector scoring at query time [lucene]

via GitHub Thu, 10 Jul 2025 14:34:04 -0700


msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3059172791


   > My understanding was that off heap document vectors helped by avoiding a 
copy back into the heap, plus avoiding the cost of reallocation and copy if 
some of them got garbage collected. But doesn't this change add a copy, by 
copying the byte[] queryVector from heap to the allocated off-heap segment? 
Also, since the query vector is only used during the lifetime of the query, I 
would've thought keeping it on heap should be okay?
   
   It is confusing to me too. I think to understand it we need to decompile and 
look at the instructions that are generated -- after hotspot does its work. 
Maybe we are bypassing memory barriers that get applied to on-heap arrays? I am 
really not sure.
   
   > I'm confused, if dotProductWTF and dotProduct are exactly identical, why 
did dotProductWTF fix the 'search after indexing' case?
   
   The idea behind this was to create two separate code paths: one used during 
indexing (when both arrays are on-heap) and another used during search, when 
one array is on-heap and the other is off-heap (memory mapped from disk). This 
seems to enable hotspot to optimize the two code paths separately.
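   
   A minimal sketch of that duplication, not the PR's code: the class and 
method names are hypothetical, and java.nio.ByteBuffer (heap vs. direct) is an 
assumed stand-in for the on-heap arrays and the memory-mapped vectors. The two 
methods have identical bodies; the only point is that each is a separate 
method, so HotSpot can profile and compile each one for the buffer types it 
actually sees there.

```java
import java.nio.ByteBuffer;

public final class DotProductPaths {
    private DotProductPaths() {}

    // Indexing path: in this sketch, called only with heap-backed buffers.
    public static int dotProductIndexing(ByteBuffer a, ByteBuffer b) {
        int sum = 0;
        for (int i = 0; i < a.limit(); i++) {
            sum += a.get(i) * b.get(i); // absolute reads; bytes widen to int
        }
        return sum;
    }

    // Search path: same body, but a distinct method, so its profile is
    // collected separately (heap query vector vs. off-heap document vector).
    public static int dotProductSearch(ByteBuffer query, ByteBuffer doc) {
        int sum = 0;
        for (int i = 0; i < query.limit(); i++) {
            sum += query.get(i) * doc.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] q = {1, 2, 3};
        byte[] d = {4, 5, 6};

        ByteBuffer heapQuery = ByteBuffer.wrap(q);
        ByteBuffer heapDoc = ByteBuffer.wrap(d);

        // Direct buffer stands in for a vector read from a memory-mapped file.
        ByteBuffer offHeapDoc = ByteBuffer.allocateDirect(d.length);
        offHeapDoc.put(d);
        offHeapDoc.flip();

        System.out.println(dotProductIndexing(heapQuery, heapDoc));
        System.out.println(dotProductSearch(heapQuery, offHeapDoc));
    }
}
```

   Both calls compute the same value; whether the split changes the generated 
machine code would still have to be checked against the JIT output.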
   
   There is yet another mystery here: why, after adding this hotspot hackery, 
do we see *even faster* performance on the query path when it is preceded by an 
indexing workflow than when it is not (although the query-only case is still 
faster than the baseline)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


