HUSTERGS commented on PR #14968:
URL: https://github.com/apache/lucene/pull/14968#issuecomment-3178922979
> Another idea: in order to not add new APIs, an alternative would be to implement specialized bulk scorers for the case when all scorers are term scorers, on the same field (a common case, and arguably the case we're most interested in optimizing) and work directly on `ImpactsEnum`, norms, and `SimScorer`. This should allow us to do interesting things without introducing new APIs, such as reading norms only once per doc ID or vectorizing score computations of required/non-essential clauses.

I'm waiting for #15039 to merge, and I'm looking forward to digging into this a bit more.

> I suspect there are some connections between #15004 and this PR (the affected tasks overlap), so maybe we should wait for #15004 to be merged into the main branch and then compare the performance diff of this PR?

Now that #15004 is merged, I ran the benchmark; the results are below:

```
Task                        QPS baseline  StdDev QPS my_modified_version  StdDev Pct diff               p-value
CombinedTerm                       11.25  (3.8%)                   11.03  (4.6%)    -1.9%   ( -9% - 6%)   0.150
OrHighHigh                         21.77  (2.1%)                   21.50  (2.0%)    -1.3%   ( -5% - 2%)   0.050
Or2Terms2StopWords                 59.40  (5.5%)                   58.70  (5.1%)    -1.2% ( -11% - 10%)   0.485
TermB1M                           445.35  (2.8%)                  440.99  (2.9%)    -1.0%   ( -6% - 4%)   0.284
AndHighHigh                        22.79  (2.7%)                   22.57  (2.2%)    -1.0%   ( -5% - 4%)   0.214
Term100                           445.94  (2.9%)                  441.68  (2.8%)    -1.0%   ( -6% - 4%)   0.291
Term                              445.68  (2.9%)                  441.44  (3.0%)    -1.0%   ( -6% - 5%)   0.303
Term10K                           445.19  (2.9%)                  441.01  (2.8%)    -0.9%   ( -6% - 4%)   0.298
TermB1M1P                         445.76  (2.9%)                  441.60  (2.9%)    -0.9%   ( -6% - 5%)   0.311
And3Terms                          72.47  (3.7%)                   71.80  (3.5%)    -0.9%   ( -7% - 6%)   0.420
Term1M                            445.53  (2.8%)                  441.52  (2.9%)    -0.9%   ( -6% - 4%)   0.320
FilteredPrefix3                    71.42  (3.0%)                   70.80  (3.6%)    -0.9%   ( -7% - 5%)   0.410
OrHighRare                         95.32  (5.4%)                   94.53  (5.0%)    -0.8% ( -10% - 10%)   0.615
OrHighMed                          66.68  (4.1%)                   66.21  (3.6%)    -0.7%   ( -7% - 7%)   0.561
AndHighMed                         54.25  (3.0%)                   53.88  (2.8%)    -0.7%   ( -6% - 5%)   0.454
DismaxTerm                        480.63  (3.5%)                  477.44  (3.7%)    -0.7%   ( -7% - 6%)   0.556
FilteredTerm                       63.21  (2.6%)                   62.84  (2.4%)    -0.6%   ( -5% - 4%)   0.460
CountAndHighMed                    75.83  (1.8%)                   75.42  (2.4%)    -0.5%   ( -4% - 3%)   0.426
Or3Terms                           64.39  (3.7%)                   64.07  (3.4%)    -0.5%   ( -7% - 6%)   0.648
DismaxOrHighHigh                   35.26  (2.4%)                   35.09  (2.6%)    -0.5%   ( -5% - 4%)   0.528
CountOrHighMed                     78.98  (1.7%)                   78.64  (1.8%)    -0.4%   ( -3% - 3%)   0.447
Prefix3                            76.18  (3.1%)                   75.90  (4.0%)    -0.4%   ( -7% - 6%)   0.742
FilteredPhrase                      9.72  (2.7%)                    9.68  (2.2%)    -0.3%   ( -5% - 4%)   0.675
DismaxOrHighMed                    49.38  (3.3%)                   49.23  (3.1%)    -0.3%   ( -6% - 6%)   0.774
And2Terms2StopWords                57.85  (5.9%)                   57.68  (5.6%)    -0.3% ( -11% - 11%)   0.874
Wildcard                           47.49  (3.0%)                   47.36  (3.4%)    -0.3%   ( -6% - 6%)   0.795
Phrase                              7.53  (3.0%)                    7.51  (2.4%)    -0.2%   ( -5% - 5%)   0.801
AndHighOrMedMed                    14.10  (3.4%)                   14.08  (3.3%)    -0.2%   ( -6% - 6%)   0.864
FilteredAnd3Terms                 104.14  (2.9%)                  103.97  (2.3%)    -0.2%   ( -5% - 5%)   0.840
IntSet                            287.42  (4.0%)                  286.99  (3.8%)    -0.2%   ( -7% - 7%)   0.903
FilteredOrStopWords                 8.14  (2.1%)                    8.13  (2.4%)    -0.1%   ( -4% - 4%)   0.844
FilteredOrHighHigh                 12.87  (2.7%)                   12.86  (2.5%)    -0.1%   ( -5% - 5%)   0.875
Fuzzy1                             39.10  (3.8%)                   39.05  (3.2%)    -0.1%   ( -6% - 7%)   0.911
FilteredOrHighMed                  38.08  (3.7%)                   38.04  (3.2%)    -0.1%   ( -6% - 7%)   0.931
FilteredIntNRQ                     42.38  (2.5%)                   42.35  (2.3%)    -0.1%   ( -4% - 4%)   0.919
FilteredOr3Terms                   42.96  (3.7%)                   42.95  (3.1%)    -0.0%   ( -6% - 6%)   0.983
IntervalsOrdered                    2.43  (3.9%)                    2.42  (3.3%)    -0.0%   ( -6% - 7%)   0.985
FilteredOr2Terms2StopWords         48.16  (4.5%)                   48.17  (4.0%)     0.0%   ( -8% - 8%)   0.985
FilteredOrMany                      3.98  (3.4%)                    3.98  (2.7%)     0.1%   ( -5% - 6%)   0.949
Fuzzy2                             35.37  (3.5%)                   35.42  (3.1%)     0.1%   ( -6% - 7%)   0.905
CountFilteredIntNRQ                16.31  (1.1%)                   16.33  (0.9%)     0.1%   ( -1% - 2%)   0.673
CountPhrase                         2.67  (3.8%)                    2.67  (3.4%)     0.2%   ( -6% - 7%)   0.870
CountFilteredPhrase                 8.89  (3.3%)                    8.91  (3.0%)     0.2%   ( -5% - 6%)   0.839
CountFilteredOrHighMed             17.86  (0.6%)                   17.89  (0.5%)     0.2%    ( 0% - 1%)   0.234
CountFilteredOrHighHigh            15.78  (0.8%)                   15.81  (0.7%)     0.2%   ( -1% - 1%)   0.334
IntNRQ                             42.71  (2.5%)                   42.80  (2.2%)     0.2%   ( -4% - 5%)   0.766
FilteredAnd2Terms2StopWords        59.46  (4.6%)                   59.66  (4.3%)     0.3%   ( -8% - 9%)   0.810
CountOrHighHigh                    50.23  (2.1%)                   50.41  (2.0%)     0.4%   ( -3% - 4%)   0.558
CombinedOrHighMed                  20.51  (4.4%)                   20.59  (5.0%)     0.4%  ( -8% - 10%)   0.799
CountOrMany                         4.93  (3.3%)                    4.95  (3.2%)     0.5%   ( -5% - 7%)   0.653
OrMany                              4.55  (5.4%)                    4.57  (4.8%)     0.5%  ( -9% - 11%)   0.770
CombinedAndHighMed                 20.75  (4.2%)                   20.86  (4.2%)     0.5%   ( -7% - 9%)   0.690
CountAndHighHigh                   48.78  (1.9%)                   49.08  (1.8%)     0.6%   ( -2% - 4%)   0.295
Respell                            35.79  (4.3%)                   36.05  (2.5%)     0.7%   ( -5% - 7%)   0.519
CombinedOrHighHigh                  5.65  (3.3%)                    5.69  (3.7%)     0.8%   ( -6% - 8%)   0.492
CountFilteredOrMany                 4.35  (2.6%)                    4.39  (2.6%)     0.8%   ( -4% - 6%)   0.332
CountTerm                        5812.39  (2.7%)                 5862.14  (2.9%)     0.9%   ( -4% - 6%)   0.335
SloppyPhrase                        1.14  (4.5%)                    1.15  (4.8%)     0.9%  ( -8% - 10%)   0.538
CombinedAndHighHigh                 5.71  (1.7%)                    5.76  (1.8%)     1.0%   ( -2% - 4%)   0.075
AndMedOrHighHigh                   16.62  (3.2%)                   16.78  (3.2%)     1.0%   ( -5% - 7%)   0.316
SpanNear                            2.48  (5.2%)                    2.51  (5.3%)     1.0%  ( -8% - 12%)   0.538
AndStopWords                        9.11  (3.0%)                    9.31  (1.9%)     2.2%   ( -2% - 7%)   0.006
FilteredAndHighMed                 31.76  (2.6%)                   32.53  (1.6%)     2.4%   ( -1% - 6%)   0.000
OrStopWords                         9.17  (1.9%)                    9.39  (3.1%)     2.5%   ( -2% - 7%)   0.002
FilteredAndStopWords                8.57  (2.8%)                    8.80  (1.3%)     2.7%   ( -1% - 6%)   0.000
FilteredAndHighHigh                10.61  (2.6%)                   10.92  (1.0%)     2.9%    ( 0% - 6%)   0.000
```

I'm planning to run another round of benchmarks after https://github.com/mikemccand/luceneutil/pull/436 is merged; maybe the speedup is not real?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org