[ https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537405#comment-17537405 ]
Uwe Schindler edited comment on LUCENE-10572 at 5/16/22 8:56 AM:
-----------------------------------------------------------------

Hi Dawid, that's a nice issue. When looking at the BytesRefHash/ByteBlockPool implementation I already thought: why the hell do we need the redundant length at all? We have a lookup from the index to the offset inside the block pool anyway. Instead of storing the length, we can compute "offset of next entry - offset of the entry being looked up". The only special case is the very last item, but we just need to keep a slot for the next entry anyway. So I'd also look into improving this.

Nevertheless, the main limiting factor of the BytesRefHash is the equals check (although vectorized), because it always needs to be verified. This is costly for long terms (long comparison) and it is very CPU-cache unfriendly.

> Can we optimize BytesRefHash?
> -----------------------------
>
>                 Key: LUCENE-10572
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10572
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I was poking around in our nightly benchmarks
> ([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR
> profiling that the hottest method is this:
> {noformat}
> PERCENT       CPU SAMPLES   STACK
> 9.28%         53848         org.apache.lucene.util.BytesRefHash#equals()
>                               at org.apache.lucene.util.BytesRefHash#findHash()
>                               at org.apache.lucene.util.BytesRefHash#add()
>                               at org.apache.lucene.index.TermsHashPerField#add()
>                               at org.apache.lucene.index.IndexingChain$PerField#invert()
>                               at org.apache.lucene.index.IndexingChain#processField()
>                               at org.apache.lucene.index.IndexingChain#processDocument()
>                               at org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments() {noformat}
> This is kinda crazy: comparing whether the term to be inserted into the inverted
> index hash equals the term already added to {{BytesRefHash}} is the hottest
> method during nightly benchmarks.
> Discussing offline with [~rcmuir] and [~jpountz], they noticed a few
> questionable things about our current implementation:
> * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the
> inserted term into the hash? Let's just use two bytes always, since IW
> limits term length to 32 K (< 64 K, which an unsigned short can cover)
> * Why are we doing byte swapping in this deep hotspot using {{VarHandles}}
> (BitUtil.VH_BE_SHORT.get)?
> * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not
> aggressive enough? Or the initial sizing of the hash is too small?
> * Maybe {{MurmurHash}} is not great (causing too many conflicts, and too
> many {{equals}} calls as a result)? {{Fnv}} and {{xxhash}} are possible
> "upgrades"?
> * If we stick with {{MurmurHash}}, why are we using the 32 bit version
> ({{murmurhash3_x86_32}})?
> * Are we using the JVM's intrinsics to compare multiple bytes in a single
> SIMD instruction ([~rcmuir] is quite sure we are indeed)?
> * [~jpountz] suggested maybe the hash insert is simply memory bound
> * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total
> CPU cost)
> I pulled these observations from a recent (5/6/22) profiler output:
> [https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html]
> Maybe we can improve our performance on this crazy hotspot?
> Or maybe this is a "healthy" hotspot and we should leave it be!
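A minimal sketch of the offset-derived-length idea from the comment above: keep only the per-term start offsets and recover each term's length as the difference to the next offset, so no length prefix has to be written into the pool at all. The class and field names (pool, bytesStart, and so on) are illustrative assumptions, and the flat byte[] pool is a simplification of Lucene's chunked ByteBlockPool.

{code:java}
import java.util.Arrays;

// Illustrative only: a flat byte[] pool instead of Lucene's chunked ByteBlockPool,
// with each term's length recovered from consecutive start offsets.
final class OffsetOnlyTermStore {
  private byte[] pool = new byte[1 << 15]; // flat byte pool (simplification)
  private int[] bytesStart = new int[16];  // start offset per term id, plus one trailing slot
  private int count = 0;                   // number of terms added so far
  private int upto = 0;                    // next free position in the pool

  /** Appends a term and returns its id. No length prefix is written. */
  int add(byte[] term, int off, int len) {
    if (count + 1 >= bytesStart.length) {
      bytesStart = Arrays.copyOf(bytesStart, bytesStart.length << 1);
    }
    if (upto + len > pool.length) {
      pool = Arrays.copyOf(pool, Math.max(pool.length << 1, upto + len));
    }
    bytesStart[count] = upto;
    System.arraycopy(term, off, pool, upto, len);
    upto += len;
    // The extra trailing slot handles the "very last item" special case:
    // the end of term i is always available as the start of term i + 1.
    bytesStart[count + 1] = upto;
    return count++;
  }

  /** Length is derived from the offset table instead of a stored prefix. */
  int length(int id) {
    return bytesStart[id + 1] - bytesStart[id];
  }
}
{code}

In this scheme an equals()/findHash() would first compare length(id) against the candidate's length as a cheap reject, and only then compare the bytes starting at bytesStart[id].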
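And a hedged sketch of two of the questions from the issue description: writing the length as a fixed two-byte value (terms are capped well below 64 K), read through a native-byte-order VarHandle so no byte swapping is needed on little-endian hardware, and comparing the stored bytes with the range-based Arrays.equals, which HotSpot can intrinsify into wide vector compares. The layout and names are assumptions for illustration, not BytesRefHash's actual encoding.

{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative layout: [2-byte length][term bytes] at a given pool position.
final class FixedLengthPrefix {
  // Short view over a byte[] in the platform's native byte order,
  // avoiding an explicit big-endian swap in the hot path.
  private static final VarHandle NATIVE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.nativeOrder());

  /** Writes [length][bytes] at 'pos' and returns the number of bytes written. */
  static int write(byte[] pool, int pos, byte[] term, int off, int len) {
    NATIVE_SHORT.set(pool, pos, (short) len); // always 2 bytes, no vInt branch
    System.arraycopy(term, off, pool, pos + 2, len);
    return 2 + len;
  }

  /** Compares the term stored at 'pos' against a candidate term. */
  static boolean equals(byte[] pool, int pos, byte[] term, int off, int len) {
    int storedLen = Short.toUnsignedInt((short) NATIVE_SHORT.get(pool, pos));
    if (storedLen != len) {
      return false; // cheap reject before touching the term bytes
    }
    int start = pos + 2;
    // Range-based Arrays.equals is an intrinsic candidate (vectorized compare).
    return Arrays.equals(pool, start, start + len, term, off, off + len);
  }
}
{code}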