epotyom opened a new pull request, #13559: URL: https://github.com/apache/lucene/pull/13559
In `SparseFixedBitSet.firstDoc`, instead of iterating through the entire `indices` array until a non-zero value is found, keep track of the max updated index.

Use case where it improves performance:

1. `SparseFixedBitSet` is created with a high enough length, e.g. max doc in a segment.
2. `#nextSetBit` is called (in a loop) on a bit set that is still being built, i.e. some of the next bits are `#set`, but the rest of the bit set is still empty.
3. The moment there are no further set bits, the `#nextSetBit` call to `#firstDoc` iterates through the rest of the `indices` array.

In my case, we use `SparseFixedBitSet` to track and iterate over child hits found in `ToParentBlockJoinQuery`. Iterating through empty `indices` elements becomes expensive when we do it for each parent docID.

Lucene util performance test results might not be great though, so maybe there is a better way to achieve a similar effect?

```
python3 src/python/localrun.py -source wikimediumall
...
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff  p-value
BrowseDateTaxoFacets                 1.65   (9.1%)                     1.59   (0.5%)   -3.6% ( -12% -   6%)    0.081
BrowseDayOfYearTaxoFacets            1.67   (9.2%)                     1.61   (0.6%)   -3.5% ( -12% -   6%)    0.088
MedTermDayTaxoFacets                 9.40   (6.2%)                     9.19   (5.0%)   -2.2% ( -12% -   9%)    0.206
BrowseRandomLabelTaxoFacets          1.29   (4.6%)                     1.27   (1.0%)   -1.9% (  -7% -   3%)    0.070
Prefix3                            543.93   (5.7%)                   535.20   (4.6%)   -1.6% ( -11% -   9%)    0.326
AndHighLow                         780.30   (3.9%)                   771.77   (4.1%)   -1.1% (  -8% -   7%)    0.383
AndHighMed                         199.79   (2.3%)                   197.77   (3.0%)   -1.0% (  -6% -   4%)    0.233
MedSloppyPhrase                     61.79   (4.1%)                    61.24   (4.1%)   -0.9% (  -8% -   7%)    0.488
AndHighHigh                         84.66   (6.6%)                    83.92   (7.6%)   -0.9% ( -14% -  14%)    0.699
PKLookup                           143.72   (1.9%)                   142.48   (2.1%)   -0.9% (  -4% -   3%)    0.171
Fuzzy1                              56.85   (1.5%)                    56.36   (2.0%)   -0.8% (  -4% -   2%)    0.122
BrowseDateSSDVFacets                 0.43  (16.7%)                     0.43  (16.1%)   -0.8% ( -28% -  38%)    0.873
Wildcard                           159.45   (2.7%)                   158.30   (4.0%)   -0.7% (  -7% -   6%)    0.505
Fuzzy2                              56.79   (1.2%)                    56.38   (1.8%)   -0.7% (  -3% -   2%)    0.139
HighPhrase                          20.07   (4.4%)                    19.94   (5.8%)   -0.6% ( -10% -   9%)    0.701
MedSpanNear                         15.66   (1.8%)                    15.60   (2.2%)   -0.4% (  -4% -   3%)    0.537
OrNotHighMed                       211.86   (3.1%)                   211.03   (2.7%)   -0.4% (  -5% -   5%)    0.670
HighTermTitleBDVSort                16.31   (2.8%)                    16.25   (2.7%)   -0.4% (  -5% -   5%)    0.661
MedPhrase                          154.39   (2.7%)                   154.01   (3.3%)   -0.2% (  -6% -   5%)    0.800
OrHighMed                          184.54   (2.5%)                   184.21   (2.0%)   -0.2% (  -4% -   4%)    0.797
LowPhrase                           72.18   (3.6%)                    72.06   (4.3%)   -0.2% (  -7% -   8%)    0.893
OrHighNotHigh                      229.39   (4.4%)                   229.05   (4.5%)   -0.1% (  -8% -   9%)    0.915
LowSloppyPhrase                     98.92   (1.5%)                    98.84   (2.1%)   -0.1% (  -3% -   3%)    0.897
LowSpanNear                         53.22   (1.0%)                    53.21   (0.8%)   -0.0% (  -1% -   1%)    0.932
Respell                             34.18   (1.9%)                    34.18   (2.4%)   -0.0% (  -4% -   4%)    0.986
HighSpanNear                         5.05   (3.1%)                     5.06   (3.0%)    0.1% (  -5% -   6%)    0.929
AndHighMedDayTaxoFacets             16.78   (1.6%)                    16.79   (1.7%)    0.1% (  -3% -   3%)    0.850
OrHighLow                          381.10   (3.4%)                   381.52   (3.0%)    0.1% (  -6% -   6%)    0.914
HighSloppyPhrase                    12.20   (3.4%)                    12.22   (4.1%)    0.1% (  -7% -   7%)    0.902
HighTermMonthSort                 1059.29   (4.7%)                  1061.27   (4.8%)    0.2% (  -8% -  10%)    0.901
AndHighHighDayTaxoFacets            13.53   (1.6%)                    13.56   (1.7%)    0.2% (  -3% -   3%)    0.703
OrNotHighLow                       664.93   (3.1%)                   666.40   (3.6%)    0.2% (  -6% -   7%)    0.835
MedTerm                            330.39   (8.8%)                   331.13   (6.3%)    0.2% ( -13% -  16%)    0.927
OrHighNotLow                       305.23   (5.3%)                   306.22   (5.1%)    0.3% (  -9% -  11%)    0.844
BrowseRandomLabelSSDVFacets          1.64   (4.1%)                     1.65   (4.5%)    0.4% (  -7% -   9%)    0.754
OrNotHighHigh                      170.52   (6.8%)                   171.39   (6.2%)    0.5% ( -11% -  14%)    0.804
OrHighMedDayTaxoFacets               3.03   (4.3%)                     3.04   (3.1%)    0.6% (  -6% -   8%)    0.632
LowTerm                            367.63   (5.1%)                   370.21   (4.0%)    0.7% (  -7% -  10%)    0.629
TermDTSort                          40.64   (4.8%)                    41.01   (2.9%)    0.9% (  -6% -   9%)    0.467
HighTerm                           321.26   (8.6%)                   324.29   (6.8%)    0.9% ( -13% -  17%)    0.702
OrHighHigh                         103.68   (8.6%)                   104.72   (7.8%)    1.0% ( -14% -  19%)    0.699
LowIntervalsOrdered                 89.24   (4.8%)                    90.40   (5.1%)    1.3% (  -8% -  11%)    0.409
MedIntervalsOrdered                 14.36   (5.8%)                    14.59   (5.7%)    1.6% (  -9% -  13%)    0.386
HighIntervalsOrdered                20.94   (4.8%)                    21.28   (5.0%)    1.6% (  -7% -  11%)    0.299
HighTermTitleSort                   65.54   (2.8%)                    66.62   (3.0%)    1.6% (  -4% -   7%)    0.073
OrHighNotMed                       326.83   (3.5%)                   332.80   (4.0%)    1.8% (  -5% -   9%)    0.127
HighTermDayOfYearSort              313.09   (3.0%)                   318.88   (4.2%)    1.8% (  -5% -   9%)    0.108
BrowseDayOfYearSSDVFacets            2.58   (6.4%)                     2.63   (6.3%)    1.9% ( -10% -  15%)    0.337
BrowseMonthSSDVFacets                2.64   (6.8%)                     2.71   (6.5%)    2.3% ( -10% -  16%)    0.268
BrowseMonthTaxoFacets                1.81  (10.6%)                     1.88  (10.3%)    3.6% ( -15% -  27%)    0.278
IntNRQ                             134.47  (20.5%)                   139.69  (16.0%)    3.9% ( -27% -  50%)    0.505
```
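
For readers unfamiliar with the data structure, the idea described in this pull request can be illustrated with a toy model. The sketch below is not Lucene's `SparseFixedBitSet` and not the code in this PR: the class name `TrackedSparseBitSet` and the fields `maxUpdatedBlock` and `blocksScanned` are hypothetical, and each 4096-bit block stores a dense `long[64]` purely to keep the example short. It only shows how remembering the highest block ever written lets the forward scan over `indices` stop early instead of walking the empty tail.

```java
/**
 * Toy model of a block-based sparse bit set (hypothetical, not Lucene's code).
 * indices[b] is a 64-bit mask telling which of the 64 words in block b are non-zero.
 */
class TrackedSparseBitSet {
    static final int NO_MORE = Integer.MAX_VALUE;

    private final long[] indices;     // one occupancy mask per 4096-bit block
    private final long[][] bits;      // simplification: dense 64 words per block, lazily allocated
    private int maxUpdatedBlock = -1; // highest block index that set() has ever touched
    int blocksScanned = 0;            // instrumentation for this example only

    TrackedSparseBitSet(int length) {
        int blockCount = (length + 4095) >>> 12;
        indices = new long[blockCount];
        bits = new long[blockCount][];
    }

    void set(int i) {
        int block = i >>> 12;
        int word = (i >>> 6) & 63;
        if (bits[block] == null) {
            bits[block] = new long[64];
        }
        bits[block][word] |= 1L << i;   // Java uses only the low 6 bits of the shift count
        indices[block] |= 1L << word;
        maxUpdatedBlock = Math.max(maxUpdatedBlock, block);
    }

    /** Returns the first set bit at or after i (i must be below the length), or NO_MORE. */
    int nextSetBit(int i) {
        int block = i >>> 12;
        long[] blockBits = bits[block];
        if (blockBits != null) {
            int word = (i >>> 6) & 63;
            long rest = blockBits[word] >>> i; // drop the bits below i in the current word
            if (rest != 0) {
                return i + Long.numberOfTrailingZeros(rest);
            }
            for (word++; word < 64; word++) {  // remaining words of the same block
                if (blockBits[word] != 0) {
                    return (block << 12) | (word << 6)
                        | Long.numberOfTrailingZeros(blockBits[word]);
                }
            }
        }
        return firstSetBitFrom(block + 1);
    }

    /** Scans later blocks, stopping at maxUpdatedBlock instead of indices.length. */
    private int firstSetBitFrom(int block) {
        for (; block <= maxUpdatedBlock; block++) {
            blocksScanned++;
            long index = indices[block];
            if (index != 0) {
                int word = Long.numberOfTrailingZeros(index);
                return (block << 12) | (word << 6)
                    | Long.numberOfTrailingZeros(bits[block][word]);
            }
        }
        return NO_MORE;
    }

    public static void main(String[] args) {
        TrackedSparseBitSet set = new TrackedSparseBitSet(1 << 20); // 256 blocks
        set.set(5);
        set.set(70);
        set.set(5000);
        System.out.println(set.nextSetBit(71));   // 5000
        System.out.println(set.nextSetBit(5001) == NO_MORE); // true
        System.out.println(set.blocksScanned);    // 1
    }
}
```

In this toy run the set has 256 blocks but only blocks 0 and 1 were ever written, so the final `nextSetBit(5001)` call returns without visiting any of the remaining 254 empty entries of `indices`. The cost is one extra `Math.max` per `set` call, which is one plausible reason the general-purpose benchmark above shows no clear gain.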