kaivalnp commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3079893943
> I'll try to dig deeper on why this is happening..

@msokolov I tried what you mentioned [above](https://github.com/apache/lucene/pull/14874#issuecomment-3057200869), using the following hack:
- Create a clone of [this function](https://github.com/apache/lucene/blob/d8b52ade0caee2e0505eead83bd4d6be859a6472/lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java#L311)
- Change [this line](https://github.com/apache/lucene/blob/d8b52ade0caee2e0505eead83bd4d6be859a6472/lucene/core/src/java24/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java#L115) to use the cloned function

..and the performance changed drastically!

`main`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.961        1.836   1.835        0.999  100000   100      50       64        250         no     11.65       8582.22           24.70             1           77.47       292.969      292.969       HNSW
```

after the hack:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.961        1.201   1.200        0.999  100000   100      50       64        250         no     11.67       8569.71           50.90             1           77.47       292.969      292.969       HNSW
```

Note that search latency is \~35% lower (1.836 ms to 1.201 ms), while force-merge time is \~106% higher (24.70 s to 50.90 s)!

Looks like the JVM is producing optimized branches of code based on the underlying input type(s) of `int dotProduct(MemorySegment, MemorySegment)` -- and the non-optimized branches suffer a latency regression.. On `main`, it looks like the indexing version was optimized, while after the hack the search version was optimized instead.

Had the following questions:
1. Is this also an issue in long-running applications, or just a benchmark issue with `luceneutil`?
2. How can we refactor these functions so that the optimal case is always used? Perhaps separate out Panama / Vector API usages?
3. Also, does the issue disproportionately affect applications where indexing and search happen on the same node? (vs. applications with separate writers / searchers, where each process internally executes its own optimized branch)
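
For anyone following along, here is a minimal sketch of the "clone the function" idea behind the hack. It is not Lucene code: `ByteBuffer` (heap vs. direct) stands in for heap-backed vs. off-heap `MemorySegment` inputs, the loop is scalar rather than Panama-vectorized, and `ProfileSplitSketch`, `dotProductForSearch`, `heap`, and `direct` are hypothetical names made up for illustration. The assumption, following the hypothesis above, is that giving each caller its own copy of the kernel lets the JIT profile each copy against a single input type.

```java
import java.nio.ByteBuffer;

public class ProfileSplitSketch {

  // Shared kernel. If both heap and direct buffers reach this method,
  // the JIT sees a mixed type profile for `a` and `b`.
  static int dotProduct(ByteBuffer a, ByteBuffer b) {
    int sum = 0;
    for (int i = 0; i < a.limit(); i++) {
      sum += a.get(i) * b.get(i);
    }
    return sum;
  }

  // Hypothetical clone with an identical body, intended for one caller only
  // (here: the search path), so it is profiled only with that caller's types.
  static int dotProductForSearch(ByteBuffer a, ByteBuffer b) {
    int sum = 0;
    for (int i = 0; i < a.limit(); i++) {
      sum += a.get(i) * b.get(i);
    }
    return sum;
  }

  // Heap-backed input (stand-in for vectors held on-heap during indexing).
  static ByteBuffer heap(byte[] v) {
    return ByteBuffer.wrap(v);
  }

  // Off-heap input (stand-in for vectors read from a memory-mapped index).
  static ByteBuffer direct(byte[] v) {
    ByteBuffer buf = ByteBuffer.allocateDirect(v.length);
    buf.put(v);
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    byte[] x = {1, 2, 3, 4};
    byte[] y = {5, 6, 7, 8};
    int indexing = dotProduct(heap(x), heap(y));            // "indexing" call site
    int search = dotProductForSearch(direct(x), direct(y)); // "search" call site
    System.out.println(indexing + " " + search);            // prints "70 70"
  }
}
```

Both copies return the same value by construction. Whether the split actually changes latency depends on the JVM and the workload and would have to be measured (for example with JMH); the sketch makes no claim about that.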