epotyom opened a new pull request, #13559:
URL: https://github.com/apache/lucene/pull/13559
In `SparseFixedBitSet.firstDoc`, instead of iterating through the entire
`indices` array until a non-zero value is found, keep track of the max updated
index and stop scanning there.
Use case where it improves performance:
1. `SparseFixedBitSet` is created with a large enough length, e.g. max doc in a
segment.
2. `#nextSetBit` is called (in a loop) on a bit set that is still being
built, i.e. some of the following bits have been `#set`, but the rest of the
bit set is still empty.
3. Once there are no further set bits, the `#nextSetBit` call to
`#firstDoc` iterates through the rest of the `indices` array.
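The scenario above can be sketched with a simplified stand-in for the bit set. This is not the actual Lucene code or the patch itself: the class `TrackedSparseBitSet`, the field `maxUpdatedBlock`, and the `blocksScanned` counter are hypothetical names used only for illustration, and each 4096-bit block is stored densely here to keep the example short. It only shows how bounding the scan in `firstDoc` by the highest updated block avoids walking the empty tail of `indices`.

```java
// Simplified sketch, NOT Lucene's SparseFixedBitSet. All names are hypothetical.
final class TrackedSparseBitSet {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Bit w of indices[b] is set when word w of block b holds at least one set bit.
  private final long[] indices;
  // 64 words per 4096-bit block, allocated lazily (dense, for brevity).
  private final long[][] bits;
  // Highest block index that has ever been written to; -1 while empty.
  private int maxUpdatedBlock = -1;
  // Instrumentation only: how many indices[] entries firstDoc has inspected.
  int blocksScanned = 0;

  TrackedSparseBitSet(int length) {
    int blockCount = (length + 4095) >>> 12;
    indices = new long[blockCount];
    bits = new long[blockCount][];
  }

  void set(int i) {
    int block = i >>> 12;
    int word = (i >>> 6) & 63;
    if (bits[block] == null) {
      bits[block] = new long[64];
    }
    bits[block][word] |= 1L << i; // long shifts use only the low 6 bits of i
    indices[block] |= 1L << word;
    maxUpdatedBlock = Math.max(maxUpdatedBlock, block);
  }

  // First set bit at or after the start of the given block.
  private int firstDoc(int block) {
    // Bounded by maxUpdatedBlock instead of indices.length.
    while (block <= maxUpdatedBlock) {
      blocksScanned++;
      long index = indices[block];
      if (index != 0) {
        int word = Long.numberOfTrailingZeros(index);
        return (block << 12) | (word << 6) | Long.numberOfTrailingZeros(bits[block][word]);
      }
      block++;
    }
    return NO_MORE_DOCS;
  }

  // First set bit at or after i (i must be non-negative).
  int nextSetBit(int i) {
    int block = i >>> 12;
    if (block > maxUpdatedBlock) {
      return NO_MORE_DOCS;
    }
    int word = (i >>> 6) & 63;
    long[] blockBits = bits[block];
    if (blockBits != null) {
      long rest = blockBits[word] >>> i; // drops the bits below i within this word
      if (rest != 0) {
        return i + Long.numberOfTrailingZeros(rest);
      }
      // Later non-empty words in the same block (shifting by 64 would wrap, so guard it).
      long higher = word == 63 ? 0L : indices[block] & (-1L << (word + 1));
      if (higher != 0) {
        int next = Long.numberOfTrailingZeros(higher);
        return (block << 12) | (next << 6) | Long.numberOfTrailingZeros(blockBits[next]);
      }
    }
    return firstDoc(block + 1);
  }
}
```

With a length of `1 << 24` (4096 blocks) and only bits 5 and 5000 set, calling `nextSetBit(5001)` returns `NO_MORE_DOCS` without inspecting any of the remaining empty blocks, whereas a scan bounded by `indices.length` would visit all of them on every such call.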
In my case, we use `SparseFixedBitSet` to track and iterate over child hits
found in `ToParentBlockJoinQuery`. Iterating through empty `indices` elements
becomes expensive when we do it for each parent docID.
The luceneutil performance test results might not be great though, so maybe
there is a better way to achieve a similar effect?
```
python3 src/python/localrun.py -source wikimediumall
...
Task                          QPS baseline    StdDev  QPS my_modified_version    StdDev              Pct diff  p-value
BrowseDateTaxoFacets                  1.65    (9.1%)                     1.59    (0.5%)    -3.6% ( -12% - 6%)    0.081
BrowseDayOfYearTaxoFacets             1.67    (9.2%)                     1.61    (0.6%)    -3.5% ( -12% - 6%)    0.088
MedTermDayTaxoFacets                  9.40    (6.2%)                     9.19    (5.0%)    -2.2% ( -12% - 9%)    0.206
BrowseRandomLabelTaxoFacets           1.29    (4.6%)                     1.27    (1.0%)     -1.9% ( -7% - 3%)    0.070
Prefix3                             543.93    (5.7%)                   535.20    (4.6%)    -1.6% ( -11% - 9%)    0.326
AndHighLow                          780.30    (3.9%)                   771.77    (4.1%)     -1.1% ( -8% - 7%)    0.383
AndHighMed                          199.79    (2.3%)                   197.77    (3.0%)     -1.0% ( -6% - 4%)    0.233
MedSloppyPhrase                      61.79    (4.1%)                    61.24    (4.1%)     -0.9% ( -8% - 7%)    0.488
AndHighHigh                          84.66    (6.6%)                    83.92    (7.6%)   -0.9% ( -14% - 14%)    0.699
PKLookup                            143.72    (1.9%)                   142.48    (2.1%)     -0.9% ( -4% - 3%)    0.171
Fuzzy1                               56.85    (1.5%)                    56.36    (2.0%)     -0.8% ( -4% - 2%)    0.122
BrowseDateSSDVFacets                  0.43   (16.7%)                     0.43   (16.1%)   -0.8% ( -28% - 38%)    0.873
Wildcard                            159.45    (2.7%)                   158.30    (4.0%)     -0.7% ( -7% - 6%)    0.505
Fuzzy2                               56.79    (1.2%)                    56.38    (1.8%)     -0.7% ( -3% - 2%)    0.139
HighPhrase                           20.07    (4.4%)                    19.94    (5.8%)    -0.6% ( -10% - 9%)    0.701
MedSpanNear                          15.66    (1.8%)                    15.60    (2.2%)     -0.4% ( -4% - 3%)    0.537
OrNotHighMed                        211.86    (3.1%)                   211.03    (2.7%)     -0.4% ( -5% - 5%)    0.670
HighTermTitleBDVSort                 16.31    (2.8%)                    16.25    (2.7%)     -0.4% ( -5% - 5%)    0.661
MedPhrase                           154.39    (2.7%)                   154.01    (3.3%)     -0.2% ( -6% - 5%)    0.800
OrHighMed                           184.54    (2.5%)                   184.21    (2.0%)     -0.2% ( -4% - 4%)    0.797
LowPhrase                            72.18    (3.6%)                    72.06    (4.3%)     -0.2% ( -7% - 8%)    0.893
OrHighNotHigh                       229.39    (4.4%)                   229.05    (4.5%)     -0.1% ( -8% - 9%)    0.915
LowSloppyPhrase                      98.92    (1.5%)                    98.84    (2.1%)     -0.1% ( -3% - 3%)    0.897
LowSpanNear                          53.22    (1.0%)                    53.21    (0.8%)     -0.0% ( -1% - 1%)    0.932
Respell                              34.18    (1.9%)                    34.18    (2.4%)     -0.0% ( -4% - 4%)    0.986
HighSpanNear                          5.05    (3.1%)                     5.06    (3.0%)      0.1% ( -5% - 6%)    0.929
AndHighMedDayTaxoFacets              16.78    (1.6%)                    16.79    (1.7%)      0.1% ( -3% - 3%)    0.850
OrHighLow                           381.10    (3.4%)                   381.52    (3.0%)      0.1% ( -6% - 6%)    0.914
HighSloppyPhrase                     12.20    (3.4%)                    12.22    (4.1%)      0.1% ( -7% - 7%)    0.902
HighTermMonthSort                  1059.29    (4.7%)                  1061.27    (4.8%)     0.2% ( -8% - 10%)    0.901
AndHighHighDayTaxoFacets             13.53    (1.6%)                    13.56    (1.7%)      0.2% ( -3% - 3%)    0.703
OrNotHighLow                        664.93    (3.1%)                   666.40    (3.6%)      0.2% ( -6% - 7%)    0.835
MedTerm                             330.39    (8.8%)                   331.13    (6.3%)    0.2% ( -13% - 16%)    0.927
OrHighNotLow                        305.23    (5.3%)                   306.22    (5.1%)     0.3% ( -9% - 11%)    0.844
BrowseRandomLabelSSDVFacets           1.64    (4.1%)                     1.65    (4.5%)      0.4% ( -7% - 9%)    0.754
OrNotHighHigh                       170.52    (6.8%)                   171.39    (6.2%)    0.5% ( -11% - 14%)    0.804
OrHighMedDayTaxoFacets                3.03    (4.3%)                     3.04    (3.1%)      0.6% ( -6% - 8%)    0.632
LowTerm                             367.63    (5.1%)                   370.21    (4.0%)     0.7% ( -7% - 10%)    0.629
TermDTSort                           40.64    (4.8%)                    41.01    (2.9%)      0.9% ( -6% - 9%)    0.467
HighTerm                            321.26    (8.6%)                   324.29    (6.8%)    0.9% ( -13% - 17%)    0.702
OrHighHigh                          103.68    (8.6%)                   104.72    (7.8%)    1.0% ( -14% - 19%)    0.699
LowIntervalsOrdered                  89.24    (4.8%)                    90.40    (5.1%)     1.3% ( -8% - 11%)    0.409
MedIntervalsOrdered                  14.36    (5.8%)                    14.59    (5.7%)     1.6% ( -9% - 13%)    0.386
HighIntervalsOrdered                 20.94    (4.8%)                    21.28    (5.0%)     1.6% ( -7% - 11%)    0.299
HighTermTitleSort                    65.54    (2.8%)                    66.62    (3.0%)      1.6% ( -4% - 7%)    0.073
OrHighNotMed                        326.83    (3.5%)                   332.80    (4.0%)      1.8% ( -5% - 9%)    0.127
HighTermDayOfYearSort               313.09    (3.0%)                   318.88    (4.2%)      1.8% ( -5% - 9%)    0.108
BrowseDayOfYearSSDVFacets             2.58    (6.4%)                     2.63    (6.3%)    1.9% ( -10% - 15%)    0.337
BrowseMonthSSDVFacets                 2.64    (6.8%)                     2.71    (6.5%)    2.3% ( -10% - 16%)    0.268
BrowseMonthTaxoFacets                 1.81   (10.6%)                     1.88   (10.3%)    3.6% ( -15% - 27%)    0.278
IntNRQ                              134.47   (20.5%)                   139.69   (16.0%)    3.9% ( -27% - 50%)    0.505
```