[
https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537013#comment-17537013
]
Uwe Schindler edited comment on LUCENE-10572 at 5/14/22 10:27 AM:
------------------------------------------------------------------
Mike, could you make a test on how much memory increaes by the PR and if
there's a speed improvement at all?
If memory due to the 2-byte length (this can sum up very fast if you know that
most terms in your index are < 128 bytes) is increasing and there's no speed
imporvement, let's throw away my PR. This would be the confirmation that equals
is memory bound (for the reasons I told you before). Arrays.equals() is using
intrinsic "ArraySupport#vectorizedMismatch()" (using SIMD) in JDK 17 -> this
should answer your question:
- Our code:
[BytesRefHash.java#L178|https://github.com/apache/lucene/blob/c1b626c0636821f4d7c085895359489e7dfa330f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L178]
- Arrays#equals calling ArraySupport#mismatch:
[Arrays.java#L2710-L2712|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/java/util/Arrays.java#L2710-L2712]
- ArraySupport#mismatch calling the vectorizedMismatch method:
[ArraysSupport.java#L275-L303|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L275-L303]
- Here is the method called at end, which is {{@IntrinsicCandidate}}:
[ArraysSupport.java#L111-L161|https://github.com/openjdk/jdk/blob/dfacda488bfbe2e11e8d607a6d08527710286982/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L111-L161]
was (Author: thetaphi):
Mike, could you make a test on how much memory increaes by the PR and if
there's a speed improvement at all?
If memory due to the 2-byte length (this can sum up very fast if you know that
most terms in your index are < 128 bytes) is increasing and there's no speed
imporvement, let's throw away my PR. This would be the confirmation that equals
is memory bound (for the reasons I told you before). Arrays.equals() is using
intrinsic "ArraySupport#vectorizedMismatch()" (using SIMD) in JDK 17 -> this
should answer your question:
- Our code:
[BytesRefHash.java#L178|https://github.com/apache/lucene/blob/c1b626c0636821f4d7c085895359489e7dfa330f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L178]
- Arrays#equals calling ArraySupport#mismatch:
[Arrays.java#L2710-L2712|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/java/util/Arrays.java#L2710-L2712]
- ArraySupport#mismatch calling the vectorizedMismatch method:
[ArraysSupport.java#L275-L303|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L275-L303]
- Here is the method called at end, which is {{@IntrinsicCandidate}}:
[ArraysSupport.java#L111-L135|https://github.com/openjdk/jdk/blob/dfacda488bfbe2e11e8d607a6d08527710286982/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L111-L135]
> Can we optimize BytesRefHash?
> -----------------------------
>
> Key: LUCENE-10572
> URL: https://issues.apache.org/jira/browse/LUCENE-10572
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> I was poking around in our nightly benchmarks
> ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR
> profiling that the hottest method is this:
> {noformat}
> PERCENT CPU SAMPLES STACK
> 9.28% 53848 org.apache.lucene.util.BytesRefHash#equals()
> at
> org.apache.lucene.util.BytesRefHash#findHash()
> at org.apache.lucene.util.BytesRefHash#add()
> at
> org.apache.lucene.index.TermsHashPerField#add()
> at
> org.apache.lucene.index.IndexingChain$PerField#invert()
> at
> org.apache.lucene.index.IndexingChain#processField()
> at
> org.apache.lucene.index.IndexingChain#processDocument()
> at
> org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat}
> This is kinda crazy – comparing if the term to be inserted into the inverted
> index hash equals the term already added to {{BytesRefHash}} is the hottest
> method during nightly benchmarks.
> Discussing offline with [~rcmuir] and [~jpountz] they noticed a few
> questionable things about our current implementation:
> * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the
> inserted term into the hash? Let's just use two bytes always, since IW
> limits term length to 32 K (< 64K that an unsigned short can cover)
> * Why are we doing byte swapping in this deep hotspot using {{VarHandles}}
> (BitUtil.VH_BE_SHORT.get)
> * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not
> aggressive enough? Or the initial sizing of the hash is too small?
> * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too
> many {{equals}} calls as a result?) – {{Fnv}} and {{xxhash}} are possible
> "upgrades"?
> * If we stick with {{{}MurmurHash{}}}, why are we using the 32 bit version
> ({{{}murmurhash3_x86_32{}}})?
> * Are we using the JVM's intrinsics to compare multiple bytes in a single
> SIMD instruction ([~rcmuir] is quite sure we are indeed)?
> * [~jpountz] suggested maybe the hash insert is simply memory bound
> * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total
> CPU cost)
> I pulled these observations from a recent (5/6/22) profiler output:
> [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html]
> Maybe we can improve our performance on this crazy hotspot?
> Or maybe this is a "healthy" hotspot and we should leave it be!
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]