original-brownbear commented on PR #13472:
URL: https://github.com/apache/lucene/pull/13472#issuecomment-2170618011

luceneutil benchmark results for this, running this branch with one fewer thread than main (credit to @jpountz and @javanna for the idea) to get an idea of the impact:

```
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
Fuzzy1                       105.06  (3.1%)  103.22  (3.6%)   -1.7% ( -8% -   5%) 0.103
BrowseDayOfYearTaxoFacets     14.80  (1.0%)   14.55  (4.5%)   -1.7% ( -7% -   3%) 0.096
OrHighMedDayTaxoFacets         6.60  (3.3%)    6.49  (2.1%)   -1.6% ( -6% -   3%) 0.062
Respell                       52.96  (2.2%)   52.56  (1.9%)   -0.8% ( -4% -   3%) 0.243
BrowseDateTaxoFacets          14.91  (1.2%)   14.86  (3.9%)   -0.4% ( -5% -   4%) 0.695
BrowseRandomLabelSSDVFacets    3.73  (0.5%)    3.73  (0.5%)    0.1% (  0% -   1%) 0.714
BrowseMonthSSDVFacets          5.58  (2.0%)    5.59  (2.0%)    0.2% ( -3% -   4%) 0.763
BrowseDayOfYearSSDVFacets      7.61  (0.6%)    7.62  (0.6%)    0.2% (  0% -   1%) 0.276
MedTermDayTaxoFacets          25.46  (0.7%)   25.52  (0.9%)    0.3% ( -1% -   1%) 0.328
AndHighHighDayTaxoFacets      15.24  (0.7%)   15.28  (0.5%)    0.3% ( -1% -   1%) 0.183
AndHighMedDayTaxoFacets       17.92  (0.7%)   17.99  (0.5%)    0.4% (  0% -   1%) 0.023
BrowseRandomLabelTaxoFacets   11.95  (1.7%)   12.00  (1.2%)    0.4% ( -2% -   3%) 0.331
BrowseMonthTaxoFacets         12.37  (3.0%)   12.46  (1.7%)    0.7% ( -3% -   5%) 0.358
HighTermMonthSort            306.96 (16.4%)  309.25 (14.6%)    0.7% (-26% -  38%) 0.879
BrowseDateSSDVFacets           1.45  (1.0%)    1.48  (2.4%)    1.7% ( -1% -   5%) 0.004
Prefix3                      223.49 (31.2%)  228.83 (13.7%)    2.4% (-32% -  68%) 0.754
Fuzzy2                        55.36 (20.9%)   58.92 (14.4%)    6.4% (-23% -  52%) 0.256
PKLookup                     176.48 (18.1%)  194.13 (13.2%)   10.0% (-17% -  50%) 0.045
OrNotHighLow                 472.02  (2.4%)  567.48 (26.2%)   20.2% ( -8% -  50%) 0.001
HighSloppyPhrase               3.06  (3.6%)    3.69  (7.1%)   20.4% (  9% -  32%) 0.000
AndHighLow                   784.51 (24.4%)  959.85 (12.6%)   22.4% (-11% -  78%) 0.000
Wildcard                     124.97  (1.4%)  154.50  (2.5%)   23.6% ( 19% -  27%) 0.000
IntNRQ                        70.70  (1.2%)   87.67  (4.0%)   24.0% ( 18% -  29%) 0.000
HighPhrase                    94.06  (2.9%)  118.04  (5.3%)   25.5% ( 16% -  34%) 0.000
AndHighHigh                   53.83  (1.5%)   67.85  (2.0%)   26.1% ( 22% -  30%) 0.000
LowSloppyPhrase               60.97  (2.4%)   77.49  (5.6%)   27.1% ( 18% -  35%) 0.000
LowPhrase                     20.56  (1.2%)   26.27  (2.9%)   27.7% ( 23% -  32%) 0.000
MedPhrase                     29.76  (1.7%)   39.75  (5.1%)   33.6% ( 26% -  40%) 0.000
LowIntervalsOrdered           15.55  (2.5%)   20.83  (4.1%)   33.9% ( 26% -  41%) 0.000
AndHighMed                    99.55  (2.7%)  135.12  (2.1%)   35.7% ( 30% -  41%) 0.000
LowSpanNear                    3.16  (1.8%)    4.30  (1.6%)   36.3% ( 32% -  40%) 0.000
OrHighMed                    117.00  (3.8%)  164.78  (4.2%)   40.8% ( 31% -  50%) 0.000
OrHighNotHigh                 89.87  (6.3%)  128.16 (36.4%)   42.6% (  0% -  91%) 0.000
OrHighHigh                    38.70  (1.8%)   55.41  (8.0%)   43.2% ( 32% -  53%) 0.000
MedSloppyPhrase                7.29  (3.5%)   10.68  (4.6%)   46.5% ( 37% -  56%) 0.000
HighSpanNear                   2.54  (2.1%)    3.77  (3.2%)   48.6% ( 42% -  55%) 0.000
MedTerm                      216.76 (15.6%)  324.89 (29.6%)   49.9% (  4% - 112%) 0.000
HighTermTitleSort             13.92  (9.3%)   23.43  (8.9%)   68.3% ( 45% -  95%) 0.000
TermDTSort                    68.68  (3.3%)  117.77 (12.2%)   71.5% ( 54% -  90%) 0.000
HighTerm                     220.46  (5.7%)  396.67 (14.8%)   79.9% ( 56% - 106%) 0.000
OrHighLow                    218.43 (26.1%)  400.99 (82.8%)   83.6% (-20% - 260%) 0.000
HighTermTitleBDVSort           4.45  (2.1%)    8.32  (2.1%)   86.8% ( 80% -  92%) 0.000
MedSpanNear                   22.62  (2.7%)   42.88  (5.8%)   89.6% ( 78% - 100%) 0.000
OrHighNotLow                 329.64 (22.4%)  672.19 (30.0%)  103.9% ( 42% - 201%) 0.000
HighTermDayOfYearSort         57.50  (3.8%)  125.18  (9.8%)  117.7% (100% - 136%) 0.000
MedIntervalsOrdered           10.22  (4.1%)   22.48  (9.4%)  119.9% (102% - 139%) 0.000
HighIntervalsOrdered           2.41  (6.1%)    5.39 (10.2%)  123.3% (100% - 148%) 0.000
LowTerm                      251.06 (10.8%)  634.45  (7.9%)  152.7% (120% - 192%) 0.000
OrNotHighMed                  74.81  (5.4%)  221.54 (14.8%)  196.1% (166% - 228%) 0.000
OrNotHighHigh                 95.65  (7.1%)  314.65 (21.1%)  228.9% (187% - 276%) 0.000
OrHighNotMed                  59.11  (6.5%)  206.56 (15.0%)  249.4% (214% - 289%) 0.000
```

This is wikimediumall, 3 threads for main and 2 threads for this branch. Effectively no regressions, but some considerable speedups. The reason for this is the obvious reduction in context switching.

We go from this perf output for `main`:

```
 Performance counter stats for process id '157418':

   574,008,686,445      cycles
 1,130,739,465,717      instructions              #    1.97  insn per cycle
     2,599,704,747      cache-misses
           429,542      context-switches

      49.053969801 seconds time elapsed
```

to this for this branch:

```
 Performance counter stats for process id '157292':

   526,556,069,563      cycles
 1,122,410,787,297      instructions              #    2.13  insn per cycle
     2,420,210,310      cache-misses
           385,991      context-switches

      41.044785986 seconds time elapsed
```

-> pretty much the same number of instructions needs to be executed (about 0.7% fewer), but they run in about 8% fewer cycles and encounter about 7% fewer cache misses, with about 10% fewer context switches.

This is also seen in the profile of where the CPU time goes. main looks like this:

```
17.21%  328981  org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
 5.75%  109925  org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
 5.24%  100195  org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
 5.17%   98733  org.apache.lucene.util.packed.DirectMonotonicReader#get()
 4.11%   78637  org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
 3.98%   76164  org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
 2.57%   49115  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#nextPosition()
 1.82%   34823  org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
 1.73%   33136  jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
 1.63%   31172  java.util.concurrent.atomic.AtomicLong#incrementAndGet()
```

while this branch looks as follows:

```
10.79%  183254  org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
 5.89%  100099  org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
 5.62%   95387  org.apache.lucene.util.packed.DirectMonotonicReader#get()
 4.59%   77917  org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
 4.48%   76145  org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
 3.20%   54407  org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
 2.77%   47088  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#nextPosition()
 2.06%   34965  org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
 1.91%   32484  jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
 1.81%   30763  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$BlockImpactsPostingsEnum#advance()
 1.71%   28966  org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
 1.66%   28206  org.apache.lucene.codecs.lucene99.Lucene99PostingsReader$EverythingEnum#advance()
```

-> a lot less time goes into `collect` (17.21% of samples on main vs. 10.79% on this branch), which goes through contended counter increments. `AtomicLong#incrementAndGet()` is also no longer among the entries listed for this branch.
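
For readers less familiar with why a contended counter increment is expensive, here is a minimal, self-contained sketch. It is not Lucene's collector code and not what this PR changes; the class and method names (`ContendedCounterSketch`, `countShared`, `countLocalThenMerge`) are made up for illustration. It only contrasts several threads bumping one shared `AtomicLong` on every hit with each thread counting locally and publishing once. Both produce the same total, but in the first variant every increment touches the same shared memory location from all threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: hypothetical names, not part of Lucene.
public class ContendedCounterSketch {

  /** Every worker increments one shared AtomicLong per hit (contended). */
  public static long countShared(int threads, int hitsPerThread) {
    AtomicLong shared = new AtomicLong();
    runAll(threads, () -> {
      for (int i = 0; i < hitsPerThread; i++) {
        shared.incrementAndGet(); // all threads update the same counter
      }
    });
    return shared.get();
  }

  /** Every worker counts in a local variable and publishes once at the end. */
  public static long countLocalThenMerge(int threads, int hitsPerThread) {
    AtomicLong total = new AtomicLong();
    runAll(threads, () -> {
      long local = 0;
      for (int i = 0; i < hitsPerThread; i++) {
        local++; // thread-private, no shared-memory traffic
      }
      total.addAndGet(local); // one shared update per thread
    });
    return total.get();
  }

  /** Starts the given number of threads on the same task and waits for all of them. */
  private static void runAll(int threads, Runnable task) {
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(task);
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
  }

  public static void main(String[] args) {
    // Both calls print 3000000; only the amount of shared-counter traffic differs.
    System.out.println(countShared(3, 1_000_000));
    System.out.println(countLocalThenMerge(3, 1_000_000));
  }
}
```

This sketch makes no timing claims; measuring the difference properly would need a harness such as JMH. The point is just that the totals agree while the number of writes to the shared counter differs by a factor of `hitsPerThread`.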