mikemccand opened a new issue, #14763:
URL: https://github.com/apache/lucene/issues/14763

   ### Description
   
   Peeking at the latest nightly benchmark data point, I see these top CPU hotspots during indexing:
   
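   That 2nd one (13.12% in `ramBytesUsed`) is concerning, especially since the 3.12%, 2.42% and 2.36% (104836 samples) entries further down are on the same `updateGraphRamBytesUsed` path ... I know we need to account properly for RAM so IndexWriter (IW) can flush when RAM usage exceeds its allowance ... but maybe we can optimize how we do that for HNSW?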
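
   To make the cost trade-off concrete, here is a hypothetical sketch (not Lucene's `OnHeapHnswGraph`; the class, its methods, and the per-node cost model of 32 bytes plus 4 bytes per neighbor are all assumptions for illustration) contrasting two general strategies: re-summing every node's size on demand versus keeping a running total adjusted by per-change deltas.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical illustration only: not Lucene's OnHeapHnswGraph. */
   class RamAccountingSketch {
     // Assumed cost model: fixed per-node overhead plus 4 bytes per neighbor id.
     static final long NODE_OVERHEAD_BYTES = 32;
     static final long BYTES_PER_NEIGHBOR = Integer.BYTES;

     final List<int[]> neighbors = new ArrayList<>();
     long runningTotal = 0; // maintained incrementally on every change

     static long nodeBytes(int[] nbrs) {
       return NODE_OVERHEAD_BYTES + BYTES_PER_NEIGHBOR * nbrs.length;
     }

     /** Adds a node and charges only its own size: O(1) accounting per insert. */
     void addNode(int[] nbrs) {
       neighbors.add(nbrs);
       runningTotal += nodeBytes(nbrs);
     }

     /** Replaces a node's neighbor list, adjusting the total by the size delta. */
     void setNeighbors(int node, int[] nbrs) {
       runningTotal += nodeBytes(nbrs) - nodeBytes(neighbors.get(node));
       neighbors.set(node, nbrs);
     }

     /** Recomputes from scratch: O(number of nodes) on every call. */
     long recomputeTotal() {
       long total = 0;
       for (int[] nbrs : neighbors) {
         total += nodeBytes(nbrs);
       }
       return total;
     }

     public static void main(String[] args) {
       RamAccountingSketch g = new RamAccountingSketch();
       g.addNode(new int[] {1, 2});
       g.addNode(new int[] {0});
       g.setNeighbors(1, new int[] {0, 2, 3});
       // Both strategies agree; only their cost per update differs.
       System.out.println(g.runningTotal + " " + g.recomputeTotal()); // prints "84 84"
     }
   }
   ```

   The point of the sketch is only the cost model: the incremental total is O(1) per update, while recomputing is proportional to graph size. The profile alone does not show which shape Lucene's accounting actually has.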
   
   The 8.63% spent clearing bitsets (`FixedBitSet#clear`) for HNSW searching is also scary -- presumably that hurts search performance too, since building an HNSW graph is done by running a search for each inserted vector?
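
   As a rough illustration of why a full clear can dominate: clearing a dense bitset touches every 64-bit word, no matter how few bits a search actually set. The hypothetical `TrackedBitSet` below (not Lucene's `FixedBitSet`; the name and API are invented for illustration) remembers which words it dirtied, so a reset only touches those.

   ```java
   import java.util.Arrays;

   /** Hypothetical sketch: a bitset that remembers which words it dirtied. */
   class TrackedBitSet {
     private final long[] words;
     private final int[] dirtyWords; // indices of words with at least one bit set
     private int numDirty = 0;

     TrackedBitSet(int numBits) {
       words = new long[(numBits + 63) >>> 6];
       dirtyWords = new int[words.length];
     }

     /** Sets the bit and returns its previous value. */
     boolean getAndSet(int index) {
       int w = index >>> 6;
       long mask = 1L << index; // Java masks the shift count to the low 6 bits
       boolean wasSet = (words[w] & mask) != 0;
       if (words[w] == 0) {
         dirtyWords[numDirty++] = w; // first bit in this word: remember it once
       }
       words[w] |= mask;
       return wasSet;
     }

     boolean get(int index) {
       return (words[index >>> 6] & (1L << index)) != 0;
     }

     /** Cost proportional to the words touched, not to the bitset's capacity. */
     void clear() {
       for (int i = 0; i < numDirty; i++) {
         words[dirtyWords[i]] = 0L;
       }
       numDirty = 0;
     }

     /** For comparison: a full clear touches every word, like Arrays.fill. */
     void clearAll() {
       Arrays.fill(words, 0L);
       numDirty = 0;
     }

     public static void main(String[] args) {
       TrackedBitSet visited = new TrackedBitSet(1_000_000);
       visited.getAndSet(5);
       visited.getAndSet(999_999);
       visited.clear(); // touches 2 words instead of all 15,625
       System.out.println(visited.get(5) + " " + visited.get(999_999)); // prints "false false"
     }
   }
   ```

   Whether something like this would pay off depends on how many words a typical search dirties relative to the bitset's size; the sketch only shows the cost model.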
   
   Also, what exactly is `reduceLanesTemplate`? I find this name very non-intuitive :) Is it essentially a cast (like `long` -> `int`) for a vector?
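
   For context on the name: in the JDK Vector API, `FloatVector#reduceLanes(VectorOperators.ADD)` is a horizontal reduction that combines all lanes of one vector into a single scalar, rather than a lane-type cast; `reduceLanesTemplate` appears to be the JDK's internal helper behind `Float256Vector#reduceLanes`. The plain-Java sketch below is a scalar emulation (not Lucene's `PanamaVectorUtilSupport` code) of a typical SIMD dot-product shape, assuming 8 float lanes as in a 256-bit species: accumulate per-lane partial sums in the loop, then do one horizontal reduction.

   ```java
   /** Scalar emulation of a SIMD-style dot product; not Lucene's actual code. */
   class ReduceLanesSketch {
     // Assumption: 8 float lanes, matching a 256-bit species (256 / 32 = 8).
     static final int LANES = 8;

     /** Emulates reduceLanes(ADD): sums every lane of one "vector" into a scalar. */
     static float reduceLanesAdd(float[] lanes) {
       float sum = 0f;
       for (float lane : lanes) {
         sum += lane;
       }
       return sum;
     }

     static float dotProduct(float[] a, float[] b) {
       float[] acc = new float[LANES]; // one partial sum per lane
       int i = 0;
       int bound = a.length - (a.length % LANES);
       // "Vectorized" body: each lane accumulates its own products independently.
       for (; i < bound; i += LANES) {
         for (int lane = 0; lane < LANES; lane++) {
           acc[lane] += a[i + lane] * b[i + lane];
         }
       }
       // Horizontal reduction: collapse 8 lanes into 1 scalar (a sum, not a cast).
       float res = reduceLanesAdd(acc);
       // Scalar tail for leftover elements.
       for (; i < a.length; i++) {
         res += a[i] * b[i];
       }
       return res;
     }

     public static void main(String[] args) {
       float[] a = new float[10];
       float[] b = new float[10];
       for (int i = 0; i < 10; i++) {
         a[i] = i;
         b[i] = 1f;
       }
       System.out.println(dotProduct(a, b)); // prints "45.0" (0 + 1 + ... + 9)
     }
   }
   ```

   One consequence of this shape is that float additions happen in a different order than in a naive loop, so results can differ from it in the last bits.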
   
   ```
   Profiler for cpu:
   WARNING: Using incubator modules: jdk.incubator.vector
   PROFILE SUMMARY from 4433882 events (total: 4M)
     tests.profile.mode=cpu
     tests.profile.count=50
     tests.profile.stacksize=4
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   23.57%        1M            jdk.incubator.vector.FloatVector#reduceLanesTemplate() [Inlined code]
                                 at jdk.incubator.vector.Float256Vector#reduceLanes() [Inlined code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [JIT compiled code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct() [Inlined code]
   13.12%        581719        org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
                                 at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
   9.51%         421729        org.apache.lucene.index.FloatVectorValues$1#vectorValue() [Inlined code]
                                 at org.apache.lucene.codecs.hnsw.DefaultFlatVectorScorer$FloatScoringSupplier$1#score() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
   8.63%         382600        java.util.Arrays#fill() [Inlined code]
                                 at org.apache.lucene.util.FixedBitSet#clear() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#prepareScratchState() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
   4.55%         201882        org.apache.lucene.util.FixedBitSet#getAndSet() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
   3.12%         138341        org.apache.lucene.util.RamUsageEstimator#sizeOf() [Inlined code]
                                 at org.apache.lucene.internal.hppc.MaxSizedIntArrayList#ramBytesUsed() [Inlined code]
                                 at org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
                                 at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
   2.60%         115254        org.apache.lucene.util.hnsw.HnswConcurrentMergeBuilder$MergeSearcher#graphSeek() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
   2.42%         107338        org.apache.lucene.internal.hppc.MaxSizedIntArrayList#ramBytesUsed() [Inlined code]
                                 at org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
                                 at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
   2.36%         104842        org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [JIT compiled code]
                                 at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct() [Inlined code]
                                 at org.apache.lucene.util.VectorUtil#dotProduct() [Inlined code]
                                 at org.apache.lucene.index.VectorSimilarityFunction$2#compare() [Inlined code]
   2.36%         104836        org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
   2.21%         97814         org.apache.lucene.util.hnsw.OnHeapHnswGraph#getNeighbors() [Inlined code]
                                 at org.apache.lucene.util.hnsw.HnswConcurrentMergeBuilder$MergeSearcher#graphSeek() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                                 at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
   1.55%         68929         java.util.concurrent.locks.AbstractQueuedSynchronizer#apparentlyFirstQueuedIsExclusive() [Inlined code]
                                 at java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync#readerShouldBlock() [Inlined code]
                                 at java.util.concurrent.locks.ReentrantReadWriteLock$Sync#tryAcquireShared() [Inlined code]
                                 at java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireShared() [Inlined code]
   
   ```

