[jira] [Updated] (LUCENE-9932) Performance improvement for BKD index building

neoremind (Jira) Tue, 20 Apr 2021 03:47:05 -0700


     [ 
https://issues.apache.org/jira/browse/LUCENE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


neoremind updated LUCENE-9932:
------------------------------
    Description: 
In BKD index building, the input bytes must be sorted before calling BKD writer 
related API. The sorting method leverages MSB Radix Sort algorithm, and the 
comparing method takes both the bytes itself and the DocId, but in real cases, 
DocIds are usually monotonically increasing. This could yield one possible 
performance enhancer. I found this enhancement when I dig into one performance 
issue in our system. Then I research on the possible solution.

DocId is usually increased by one when building index in a thread-safe way, by 
assuming such condition, the comparing method can eliminate the unnecessary 
comparing input - DocId, only leave the bytes itself to compare. In order to do 
so, MSB radix sorting and its fallback sorting method must be *stable*, so that 
when elements are the same, the sorting method maintains its original order 
when added, which makes DocId still monotonically increasing. To make MSB Radix 
Sort stable, it needs a trivial update; to make fallback sort table, use merge 
sort instead of quick sort. Meanwhile, there should introduce a switch which is 
able to turn the stable option on or off.

To validate how much performance could be gained. I make a benchmark taking 
down only the time elapsed in _MutablePointsReaderUtils.sort_ stage.

*Test environment:* 
 MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz 
DDR3

*Java version:*
 java version "1.8.0_161"
 Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
 Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

*Testcase:*
 bytesPerDim = [1, 2, 3, 4, 8, 16, 32]
 dim = 1
 doc num = 2,000,000
 warm up 5 time, run 10 times to calculate average time used.

*Result:*

 
||bytesPerDim\scenario||disable sort doc id (PR branch)||enable sort doc id 
(master branch)||
|1|30989.594 us|1151149.9 us|
|2|313469.47 us|1115595.1 us|
|3|844617.8 us|1465465.1 us|
|4|1350946.8 us|1465465.1 us|
|8|1344814.6 us|1458115.5 us|
|16|1344516.6 us|1459849.6 us|
|32|1386847.8 us|1583097.5 us|

!benchmark_data.png|width=580,height=283!

Result shows that, by disabling sort DocId, sorting runs 1.73x to 37x faster 
when there are many duplicate bytes (bytesPerDim = 1 or 2 or 3). When data 
cardinality is high (bytesPerDim >= 4, test cases will generate random bytes 
which are more scatter, not likely to be duplicate), the performance does not 
go backward, still a little better.

In conclusion, in the end to end process for building BKD index, which relies 
on BKDWriter for some data types, performance could be better by ignoring DocId 
if they are already monotonically increasing.

  was:
In BKD index building, the input bytes must be sorted before calling BKD writer 
related API. The sorting method leverages MSB Radix Sort algorithm, and the 
comparing method takes both the bytes itself and the DocId, but in real cases, 
DocIds are usually monotonically increasing. This could yield one possible 
performance enhancer. I found this enhancement when I dig into one performance 
issue in our system. Then I research on the possible solution.

DocId is usually increased by one when building index in a thread-safe way, by 
assuming such condition, the comparing method can eliminate the unnecessary 
comparing input - DocId, only leave the bytes itself to compare. In order to do 
so, MSB radix sorting and its fallback sorting method must be *stable*, so that 
when elements are the same, the sorting method maintains its original order 
when added, which makes DocId still monotonically increasing. To make MSB Radix 
Sort stable, it needs a trivial update; to make fallback sort table, use merge 
sort instead of quick sort. Meanwhile, there should introduce a switch which is 
able to turn the stable option on or off.

To validate how much performance could be gained. I make a benchmark taking 
down only the time elapsed in _MutablePointsReaderUtils.sort_ stage.

*Test environment:* 
 MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz 
DDR3

*Java version:*
 java version "1.8.0_161"
 Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
 Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

*Testcase:*
 bytesPerDim = [1, 2, 3, 4, 8, 16, 32]
 dim = 1
 doc num = 2,000,000
 warm up 5 time, run 10 times to calculate average time used.

*Result:*

 
||bytesPerDim\scenario||disable sort doc id (PR branch)||enable sort doc id 
(master branch)||
|1|30989.594 us|1151149.9 us|
|2|313469.47 us|1115595.1 us|
|3|844617.8 us|1465465.1 us|
|4|1350946.8 us|1465465.1 us|
|8|1344814.6 us|1458115.5 us|
|16|1344516.6 us|1459849.6 us|
|32|1386847.8 us|1583097.5 us|

!benchmark_data.png|width=580,height=283!

 When there are many duplicate bytes (bytesPerDim = 1 or 2 or 3) which means 
data cardinality is low. By disable sorting DocId, it runs 1.73x to 37x faster. 
 When data cardinality is high, the performance does not go backward, also runs 
after.

PR is submitted, I only study how to make BKDWriter runs fast, to make it work 
for Lucene index building, I am not sure where to integrate this change into. 
If any committers have time, could you help me review if the issue makes sense? 
Thanks very much.


> Performance improvement for BKD index building
> ----------------------------------------------
>
>                 Key: LUCENE-9932
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9932
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 8.8.2
>            Reporter: neoremind
>            Priority: Critical
>         Attachments: benchmark_data.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In BKD index building, the input bytes must be sorted before calling BKD 
> writer related API. The sorting method leverages MSB Radix Sort algorithm, 
> and the comparing method takes both the bytes itself and the DocId, but in 
> real cases, DocIds are usually monotonically increasing. This could yield one 
> possible performance enhancer. I found this enhancement when I dig into one 
> performance issue in our system. Then I research on the possible solution.
> DocId is usually increased by one when building index in a thread-safe way, 
> by assuming such condition, the comparing method can eliminate the 
> unnecessary comparing input - DocId, only leave the bytes itself to compare. 
> In order to do so, MSB radix sorting and its fallback sorting method must be 
> *stable*, so that when elements are the same, the sorting method maintains 
> its original order when added, which makes DocId still monotonically 
> increasing. To make MSB Radix Sort stable, it needs a trivial update; to make 
> fallback sort table, use merge sort instead of quick sort. Meanwhile, there 
> should introduce a switch which is able to turn the stable option on or off.
> To validate how much performance could be gained. I make a benchmark taking 
> down only the time elapsed in _MutablePointsReaderUtils.sort_ stage.
> *Test environment:* 
>  MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 
> MHz DDR3
> *Java version:*
>  java version "1.8.0_161"
>  Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
>  Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
> *Testcase:*
>  bytesPerDim = [1, 2, 3, 4, 8, 16, 32]
>  dim = 1
>  doc num = 2,000,000
>  warm up 5 time, run 10 times to calculate average time used.
> *Result:*
>  
> ||bytesPerDim\scenario||disable sort doc id (PR branch)||enable sort doc id 
> (master branch)||
> |1|30989.594 us|1151149.9 us|
> |2|313469.47 us|1115595.1 us|
> |3|844617.8 us|1465465.1 us|
> |4|1350946.8 us|1465465.1 us|
> |8|1344814.6 us|1458115.5 us|
> |16|1344516.6 us|1459849.6 us|
> |32|1386847.8 us|1583097.5 us|
> !benchmark_data.png|width=580,height=283!
> Result shows that, by disabling sort DocId, sorting runs 1.73x to 37x faster 
> when there are many duplicate bytes (bytesPerDim = 1 or 2 or 3). When data 
> cardinality is high (bytesPerDim >= 4, test cases will generate random bytes 
> which are more scatter, not likely to be duplicate), the performance does not 
> go backward, still a little better.
> In conclusion, in the end to end process for building BKD index, which relies 
> on BKDWriter for some data types, performance could be better by ignoring 
> DocId if they are already monotonically increasing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (LUCENE-9932) Performance improvement for BKD index building

Reply via email to