[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537009#comment-17537009 ] Uwe Schindler commented on LUCENE-10572: Hi, actually the reason why BE encoding was used in ByteBlockPool and BytesRefHash (same for PagedBytes) was to emulate the vInt encoding. For the data structure itself it makes no difference, so I am fine with any decision. The only thing that needs to be done is to remove the vInt encoding, because it relies on BE to work (see the code that I removed). It could be fixed to also work with LE, but it's not universal. If we do not run into space problems during indexing, the special case with 1-byte encoding for lengths < 128 is a good idea (many short terms is the default for most texts). It just causes problems when you encode/decode it all the time (e.g. when searching the hash table with equals). From looking at the code: I doubt that the problem really comes from the vInt encoding in BE format. Most terms are shorter than 128 bytes in normal indexes. I am quite sure the hotspot at equals is simply that you need to do a full comparison using Arrays.equals(). This happens mostly in the case that the term is already there (hash collisions also happen on the same term). Simply said: if you insert a BytesRef into the hash and the term already exists (a common case during indexing text), it will get a hashcode that already exists in the table. To verify that the term is really the same, it has to compare the bytes. This is why we see the hotspot on equals. A better hashing algorithm does not even help here, as the case "term already in table" always needs a verification no matter how good the hashing algorithm is. > Can we optimize BytesRefHash? 
> -
>
> Key: LUCENE-10572
> URL: https://issues.apache.org/jira/browse/LUCENE-10572
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> I was poking around in our nightly benchmarks ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR profiling that the hottest method is this:
> {noformat}
> PERCENT  CPU SAMPLES  STACK
> 9.28%    53848        org.apache.lucene.util.BytesRefHash#equals()
>                         at org.apache.lucene.util.BytesRefHash#findHash()
>                         at org.apache.lucene.util.BytesRefHash#add()
>                         at org.apache.lucene.index.TermsHashPerField#add()
>                         at org.apache.lucene.index.IndexingChain$PerField#invert()
>                         at org.apache.lucene.index.IndexingChain#processField()
>                         at org.apache.lucene.index.IndexingChain#processDocument()
>                         at org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments()
> {noformat}
> This is kinda crazy – comparing whether the term to be inserted into the inverted index hash equals the term already added to {{BytesRefHash}} is the hottest method during nightly benchmarks.
> Discussing offline with [~rcmuir] and [~jpountz], they noticed a few questionable things about our current implementation:
> * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the inserted term into the hash? Let's just use two bytes always, since IW limits term length to 32 K (< 64 K that an unsigned short can cover)
> * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} (BitUtil.VH_BE_SHORT.get)?
> * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not aggressive enough? Or the initial sizing of the hash is too small?
> * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too many {{equals}} calls as a result)? {{Fnv}} and {{xxhash}} are possible "upgrades"?
> * If we stick with {{MurmurHash}}, why are we using the 32 bit version ({{murmurhash3_x86_32}})?
> * Are we using the JVM's intrinsics to compare multiple bytes in a single SIMD instruction ([~rcmuir] is quite sure we are indeed)?
> * [~jpountz] suggested maybe the hash insert is simply memory bound
> * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total CPU cost)
> I pulled these observations from a recent (5/6/22) profiler output: [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html]
> Maybe we can improve our performance on this crazy hotspot? Or maybe this is a "healthy" hotspot and we should leave it be!
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-
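The 1-or-2-byte {{vInt}} length encoding questioned in the first bullet can be sketched as follows. This is an illustrative stand-alone version under my own assumptions (class and method names are mine, not Lucene's; the exact byte layout Lucene used may differ): one byte for lengths < 128, two bytes otherwise, with the high bit of the first byte marking the 2-byte case.

```java
// Hypothetical sketch of a 1-or-2-byte vInt length encoding, NOT Lucene's
// actual BytesRefHash code. Shows the branch that every decode must take.
public class VIntLengthDemo {
  // Encode a term length into buffer at offset; returns bytes written (1 or 2).
  static int writeLength(byte[] buffer, int offset, int length) {
    if (length < 128) {
      buffer[offset] = (byte) length;              // single byte, high bit clear
      return 1;
    } else {
      // two bytes: high bit of the first byte marks the 2-byte case
      buffer[offset] = (byte) (0x80 | (length & 0x7f));
      buffer[offset + 1] = (byte) (length >>> 7);
      return 2;
    }
  }

  // Decode: this branch runs on every hash-table lookup, which is part of
  // the per-equals cost the comments above discuss.
  static int readLength(byte[] buffer, int offset) {
    byte b = buffer[offset];
    if ((b & 0x80) == 0) {
      return b;
    }
    return (b & 0x7f) | ((buffer[offset + 1] & 0xff) << 7);
  }
}
```

The alternative proposed in the issue (always two bytes, since IndexWriter caps term length below 64 K) trades one byte of pool space per short term for a branch-free fixed-width read.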
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537013#comment-17537013 ] Uwe Schindler commented on LUCENE-10572: Mike, could you test how much memory increases with the PR and whether there is a speed improvement at all? If memory increases due to the 2-byte length (this can add up very fast, given that most terms in your index are < 128 bytes) and there is no speed improvement, let's throw away my PR. This would be the confirmation that equals is memory bound (for the reasons I told you before). Arrays.equals() uses the intrinsic "ArraysSupport#vectorizedMismatch()" (using SIMD) in JDK 17 -> this should answer your question:
- Our code: [BytesRefHash.java#L178|https://github.com/apache/lucene/blob/c1b626c0636821f4d7c085895359489e7dfa330f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L178]
- Arrays#equals calling ArraysSupport#mismatch: [Arrays.java#L2710-L2712|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/java/util/Arrays.java#L2710-L2712]
- ArraysSupport#mismatch calling the vectorizedMismatch method: [ArraysSupport.java#L275-L303|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L275-L303]
- Here is the method called at the end, which is {{@IntrinsicCandidate}}: [ArraysSupport.java#L111-L135|https://github.com/openjdk/jdk/blob/dfacda488bfbe2e11e8d607a6d08527710286982/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L111-L135]
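Uwe's explanation above (a term that already exists still forces a full byte comparison after the hash lookup resolves to an occupied slot) can be illustrated with a minimal open-addressing sketch. This is emphatically not Lucene's BytesRefHash: the class, the linear-probing scheme, the fixed capacity, and the placeholder hash function are all simplifications for illustration.

```java
import java.util.Arrays;

// Minimal sketch of why equals() is hot during indexing: in an open-addressing
// hash of byte[] terms, re-adding an existing term always requires a full byte
// comparison to confirm the occupied slot really holds that term.
public class TermHashSketch {
  private final byte[][] slots = new byte[1024][]; // power-of-two capacity

  public int add(byte[] term) {
    int mask = slots.length - 1;
    int slot = hash(term) & mask;
    while (slots[slot] != null) {
      if (Arrays.equals(slots[slot], term)) {
        return slot;              // term already present: the common indexing case
      }
      slot = (slot + 1) & mask;   // linear probe on a genuine collision
    }
    slots[slot] = term.clone();   // first sighting: store a copy
    return slot;
  }

  private static int hash(byte[] term) {
    int h = 0;
    for (byte b : term) h = 31 * h + b; // placeholder hash, not MurmurHash3
    return h;
  }
}
```

Note that the Arrays.equals call in the duplicate path runs no matter how good the hash function is, which is Uwe's point: a "better" hash only reduces collisions between *different* terms, not the verification cost for repeated ones.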
[jira] [Comment Edited] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537013#comment-17537013 ] Uwe Schindler edited comment on LUCENE-10572 at 5/14/22 10:29 AM: -- Mike, could you test how much memory increases with the PR and whether there is a speed improvement at all? If memory increases due to the 2-byte length (this can add up very fast, given that most terms in your index are < 128 bytes) and there is no speed improvement, let's throw away my PR. This would be the confirmation that equals is memory bound (for the reasons I told you before). Arrays.equals() uses the intrinsic "ArraysSupport#vectorizedMismatch()" (using SIMD) in JDK 17 when the array is long enough (>= 8 bytes; if smaller, it does not do any vectorization) -> this should answer your question:
- Our code: [BytesRefHash.java#L178|https://github.com/apache/lucene/blob/c1b626c0636821f4d7c085895359489e7dfa330f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L178]
- Arrays#equals calling ArraysSupport#mismatch: [Arrays.java#L2710-L2712|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/java/util/Arrays.java#L2710-L2712]
- ArraysSupport#mismatch calling the vectorizedMismatch method: [ArraysSupport.java#L275-L303|https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L275-L303]
- Here is the method called at the end, which is {{@IntrinsicCandidate}}: [ArraysSupport.java#L111-L161|https://github.com/openjdk/jdk/blob/dfacda488bfbe2e11e8d607a6d08527710286982/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L111-L161]
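The comparison referenced at BytesRefHash.java#L178 boils down to a ranged Arrays.equals against a term stored in a pooled byte[] behind a length prefix; the ranged overloads delegate to the same ArraysSupport.mismatch path linked above. A hedged sketch of that shape (the single-byte prefix, class, and method names here are illustrative, not Lucene's actual pool layout):

```java
import java.util.Arrays;

// Sketch of a pooled-term equality check: terms live in one big byte[] with a
// length prefix, and equality against a candidate uses the ranged
// Arrays.equals, which (per the comment above) only benefits from
// vectorizedMismatch when the compared region is long enough.
public class PooledEqualsSketch {
  // hypothetical pool layout: [len (1 byte)][term bytes...]
  public static boolean termEquals(byte[] pool, int start, byte[] candidate) {
    int len = pool[start] & 0xff;
    return len == candidate.length
        && Arrays.equals(pool, start + 1, start + 1 + len,
                         candidate, 0, candidate.length);
  }
}
```

The length check up front is the cheap early-out; only terms of equal length reach the byte-by-byte (or SIMD) comparison.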
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537018#comment-17537018 ] Robert Muir commented on LUCENE-10572: -- {quote}Most terms are shorter than 128 bytes in normal indexes.{quote} Actually, most analyzers have a default limit of 255, so practically nothing prevents hitting the 2-byte vInt case today. I think having to use 2 bytes is ridiculous, and even more so a term of 30,000 bytes. Ever tried to type a single word that is 128 chars long? Anyway, I just don't think we need the code complexity, byte-swapping, branching, etc. for all this. It seems like the wrong tradeoff for IndexWriter, where we could just use another byte and maybe be faster. Maybe we can figure out a benchmark to see if that's the case. Thanks for making it 2 bytes instead. To me the only question is whether we should add these NATIVE constants to BitUtil, or simply do everything LE. I honestly think the only BE stuff out there is old IBM iron (AIX and s390x). It seems risky to optimize for this; I don't think many people are running Lucene on it...
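The BitUtil.VH_BE_SHORT handle mentioned in the issue description is a byte-array view VarHandle; the LE/NATIVE alternatives Robert discusses differ only in the ByteOrder argument. A small stand-alone demo of both orders (Lucene's BitUtil itself is not used here):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Byte-array view VarHandles reading a short in big-endian vs little-endian
// order. On an LE CPU the BE read implies a byte swap; the LE read does not.
public class ShortViewDemo {
  public static final VarHandle BE =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.BIG_ENDIAN);
  public static final VarHandle LE =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  public static void main(String[] args) {
    byte[] buf = {0x01, 0x02};
    // same two bytes, opposite interpretations
    System.out.println((short) BE.get(buf, 0)); // 0x0102 = 258
    System.out.println((short) LE.get(buf, 0)); // 0x0201 = 513
  }
}
```

ByteOrder.nativeOrder() would pick whichever matches the host CPU, which is the "NATIVE constants" option Robert weighs against simply standardizing on LE.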
[jira] [Commented] (LUCENE-10572) Can we optimize BytesRefHash?
[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537023#comment-17537023 ] Michael McCandless commented on LUCENE-10572: - {quote}Mike, could you make a test on how much memory increases by the PR and if there's a speed improvement at all?{quote} +1, I will try to benchmark the PR! Thank you for the fast iterations here! Exciting :)
[GitHub] [lucene] uschindler opened a new pull request, #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler opened a new pull request, #890: URL: https://github.com/apache/lucene/pull/890 This detects CI builds (GitHub and Jenkins) by the existence of some environment variables. There is a new ext property `project.ext.isCIBuild`, which is used to enable errorprone on those CI builds. You can still enable errorprone on local builds with `-Perrorprone.enabled=true`. We can use the new project property to enable other expensive checks by default for CI builds. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r872968994

## gradle/globals.gradle:
@@ -147,5 +147,10 @@ allprojects {
     }
     return taskList
   }
+
+  // detect if we run in a CI environment by looking at the existence of env vars:
+  // "CI": GitHub (https://docs.github.com/en/actions/learn-github-actions/environment-variables)
+  // anything starting with "JENKINS_" or "HUDSON_": Jenkins/Hudson (https://jenkins.thetaphi.de/env-vars.html/)
+  isCIBuild = System.getenv().keySet().find { it ==~ /(?i)(JENKINS_?.*|HUDSON_?.*|CI)/ } != null

Review Comment: The regex is a bit stupid. The `?` makes no sense; it should be a bit different. Will fix in the evening.
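For illustration, here is one possible tightening of the regex flagged in the review comment: dropping the stray `_?` so that only `CI` itself or variables actually starting with `JENKINS_`/`HUDSON_` match. This is a hypothetical fix sketched with java.util.regex, not necessarily the change Uwe later committed to globals.gradle.

```java
import java.util.regex.Pattern;

// Hypothetical tightened CI-detection regex: the original "JENKINS_?.*" made
// the underscore optional, so an unrelated var like "JENKINSFOO" would also
// count as a CI marker. This variant requires the underscore.
public class CiDetectDemo {
  static final Pattern CI_VAR = Pattern.compile("(?i)(JENKINS_.*|HUDSON_.*|CI)");

  public static boolean looksLikeCiVar(String name) {
    return CI_VAR.matcher(name).matches(); // matches() anchors the whole name
  }
}
```

Since `matches()` anchors the whole string, the bare `CI` alternative only matches a variable named exactly `CI` (case-insensitively), not names that merely start with it.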
[GitHub] [lucene] uschindler commented on pull request #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler commented on PR #890: URL: https://github.com/apache/lucene/pull/890#issuecomment-1126790330 I renamed the project property to just "errorprone".
[GitHub] [lucene] uschindler commented on pull request #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler commented on PR #890: URL: https://github.com/apache/lucene/pull/890#issuecomment-1126793622 The owasp check currently fails because of vulns in Jetty. We should upgrade (that's a separate task).
[GitHub] [lucene] uschindler merged pull request #890: Detect CI builds and enable errorprone by default for those CI builds
uschindler merged PR #890: URL: https://github.com/apache/lucene/pull/890
[GitHub] [lucene] mocobeta commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
mocobeta commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873099330 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: @uschindler Can you please add `GITHUB#890` as the prefix of this entry? https://github.com/apache/lucene/pull/854
[GitHub] [lucene] mocobeta commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
mocobeta commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873099427 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: Also, I think "Upgrade forbiddenapis to version 3.3." can refer to https://github.com/apache/lucene/pull/768.
[GitHub] [lucene] mocobeta commented on a diff in pull request #890: Detect CI builds and enable errorprone by default for those CI builds
mocobeta commented on code in PR #890: URL: https://github.com/apache/lucene/pull/890#discussion_r873102060 ## lucene/CHANGES.txt: ## @@ -211,7 +211,9 @@ Build * Upgrade forbiddenapis to version 3.3. (Uwe Schindler) -* LUCENE-10532: Remove LuceneTestCase.Slow annotation. ALl tests can be fast. (Robert Muir) +* Detect CI builds on Github or Jenkins and enable errorprone. (Uwe Schindler, Dawid Weiss) Review Comment: FYI, we have guidance for adding CHANGES entries, but having more concrete examples that refer to GitHub PRs instead of Jira issues would be helpful to let others know that extra Jira accounts/issues are no longer required when contributing. https://github.com/apache/lucene/blob/main/CONTRIBUTING.md#add-a-changes-entry
[GitHub] [lucene] LuXugang closed pull request #511: LUCENE-10281: Error condition used to judge whether hits are sparse in StringValueFacetCounts
LuXugang closed pull request #511: LUCENE-10281: Error condition used to judge whether hits are sparse in StringValueFacetCounts URL: https://github.com/apache/lucene/pull/511