vigyasharma commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3059155132

   Wow, these are very impressive gains! Nice find @kaivalnp.
   
   So the key change is `Arena.ofAuto().allocateFrom(JAVA_BYTE, queryVector);`, 
which allocates an off-heap `MemorySegment` for the query vector? Do we have an 
intuition for why it creates such a dramatic (4x?!) speedup?
   
   My understanding was that off-heap document vectors helped by avoiding a 
copy back into the heap, plus avoiding the cost of reallocation and copy if 
some of them got garbage collected. But doesn't this change add a _copy_, by 
copying the `byte[] queryVector` from the heap to the allocated off-heap 
segment? Also, since the query vector is only used during the lifetime of the 
query, I would've thought keeping it on heap should be okay?
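   
   To make the question concrete, here is a minimal sketch of the two ways a 
`byte[]` query vector can be exposed as a `MemorySegment`. The class and method 
names are hypothetical and this is not the PR's code; it assumes Java 22+, where 
`SegmentAllocator.allocateFrom` is available. It only shows where the copy 
happens, not why one form scores faster.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration, not Lucene code.
public class QueryVectorSegments {

  // Wraps the existing array: no copy, the segment is a view over heap memory.
  static MemorySegment onHeap(byte[] queryVector) {
    return MemorySegment.ofArray(queryVector);
  }

  // Copies the array into freshly allocated native memory. The automatic
  // arena releases that memory once the segment becomes unreachable.
  static MemorySegment offHeap(byte[] queryVector) {
    return Arena.ofAuto().allocateFrom(ValueLayout.JAVA_BYTE, queryVector);
  }

  public static void main(String[] args) {
    byte[] queryVector = {1, 2, 3, 4};
    MemorySegment heap = onHeap(queryVector);
    MemorySegment nativeCopy = offHeap(queryVector);

    System.out.println("on-heap isNative:  " + heap.isNative());
    System.out.println("off-heap isNative: " + nativeCopy.isNative());

    // Mutating the array is visible through the heap view but not the copy.
    queryVector[0] = 42;
    System.out.println("heap view sees:   " + heap.get(ValueLayout.JAVA_BYTE, 0));
    System.out.println("native copy sees: " + nativeCopy.get(ValueLayout.JAVA_BYTE, 0));
  }
}
```

   The off-heap variant pays for one extra copy per query, which is the cost 
being asked about above.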
   
   ...
   
   > Separately, I tried using the `Arena.ofAuto().allocateFrom()` construct in 
the on-heap case that is used during indexing and this made indexing incredibly 
slow. I guess it is because we force the allocation of many many small memory 
segments that have to be cleaned up by the garbage collector.
   
   @msokolov: so indexing is faster if vectors are "on heap", but search is 
faster if vectors are "off heap"? Or do you think it's mainly because of 
`ofAuto()`, which defers cleanup to the JVM's garbage collector? Can you share 
the indexing code path that you modified?
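   
   For reference, a rough sketch of the lifetime difference in question. The 
names are hypothetical and this is not the indexing code path; it assumes 
Java 22+. One `Arena.ofAuto()` per vector leaves each small segment to be 
reclaimed by the garbage collector, while a single confined arena frees 
everything deterministically on `close()`. Whether that difference explains the 
indexing slowdown is untested here.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration, not Lucene code.
public class ArenaLifetimes {

  // One automatic arena per vector: the native memory is released only
  // after the GC finds the returned segment unreachable.
  static MemorySegment copyWithAutoArena(byte[] vector) {
    return Arena.ofAuto().allocateFrom(ValueLayout.JAVA_BYTE, vector);
  }

  // One confined arena shared by all vectors: every segment allocated from
  // it is freed together when the try-with-resources block exits.
  static long sumWithConfinedArena(byte[][] vectors) {
    long sum = 0;
    try (Arena arena = Arena.ofConfined()) {
      for (byte[] vector : vectors) {
        MemorySegment segment = arena.allocateFrom(ValueLayout.JAVA_BYTE, vector);
        for (long i = 0; i < segment.byteSize(); i++) {
          sum += segment.get(ValueLayout.JAVA_BYTE, i);
        }
      }
    }
    return sum;
  }
}
```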
   
   ...
   
   > Where `dotProductWTF` is just a clone of `dotProduct`
   
   I'm confused: if `dotProductWTF` and `dotProduct` are exactly identical, why 
did `dotProductWTF` fix the _'search after indexing'_ case?
   
   ...
   Separately, this PR holds up in real-world benchmarks and looks good to 
merge. There are some interesting spin-off issues here that can make indexing 
faster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

