HUSTERGS commented on PR #14968: URL: https://github.com/apache/lucene/pull/14968#issuecomment-3094415292
> I'd suggest to focus this first PR on the `Scorer#applyAsRequiredClause` API and later see if there's more room for speedups by adding new APIs to `PostingsEnum` in a follow-up PR?

Yeah, I think that's a good idea. I've been experimenting with some details of the current version of the code over the last few days. I moved the `PostingsEnum`-related code directly into `applyAsRequiredClause` and removed the dependency on the newly introduced `NormAndFreqBuffer`. With that change, the luceneutil benchmark no longer seems to show a good performance gain (at least not as good as before), especially for the `OrStopWords` query. Here is the result:

```
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev              Pct diff  p-value
OrHighMed                          69.09   (3.6%)                    67.53  (10.6%)   -2.3% (-15% - 12%)  0.365
Or2Terms2StopWords                 63.02   (5.4%)                    61.66   (8.7%)   -2.2% (-15% - 12%)  0.344
CombinedTerm                       11.07   (5.1%)                    10.85   (4.6%)   -2.1% (-11% -  8%)  0.180
AndHighHigh                        22.49   (2.8%)                    22.03   (9.5%)   -2.1% (-13% - 10%)  0.354
OrHighHigh                         21.41   (2.3%)                    20.98   (9.9%)   -2.0% (-13% - 10%)  0.384
And3Terms                          73.13   (3.5%)                    71.87   (8.7%)   -1.7% (-13% - 10%)  0.409
TermB1M                           473.09   (4.4%)                   465.04   (8.0%)   -1.7% (-13% - 11%)  0.404
TermB1M1P                         473.02   (4.4%)                   465.15   (8.1%)   -1.7% (-13% - 11%)  0.419
AndHighMed                         54.53   (3.2%)                    53.64  (10.0%)   -1.6% (-14% - 11%)  0.485
Or3Terms                           65.73   (3.6%)                    64.70   (9.4%)   -1.6% (-13% - 11%)  0.484
Term10K                           472.65   (4.4%)                   465.27   (8.1%)   -1.6% (-13% - 11%)  0.450
Term100                           472.96   (4.2%)                   465.67   (8.1%)   -1.5% (-13% - 11%)  0.452
Term1M                            472.95   (4.4%)                   465.91   (8.1%)   -1.5% (-13% - 11%)  0.469
Term                              472.92   (4.5%)                   465.98   (8.2%)   -1.5% (-13% - 11%)  0.481
And2Terms2StopWords                61.13   (5.8%)                    60.43   (8.9%)   -1.2% (-14% - 14%)  0.627
TermMonthSort                    2219.47   (3.2%)                  2201.87   (3.7%)   -0.8% ( -7% -  6%)  0.467
DismaxOrHighMed                    50.75   (3.5%)                    50.42   (7.9%)   -0.7% (-11% - 11%)  0.733
AndMedOrHighHigh                   16.90   (2.9%)                    16.80   (4.8%)   -0.6% ( -8% -  7%)  0.621
IntSet                            298.20   (4.4%)                   296.49   (4.0%)   -0.6% ( -8% -  8%)  0.666
DismaxTerm                        513.00   (4.7%)                   510.11   (6.5%)   -0.6% (-11% - 11%)  0.753
DismaxOrHighHigh                   35.47   (3.2%)                    35.31   (6.3%)   -0.4% ( -9% -  9%)  0.776
FilteredOr3Terms                   44.20   (3.2%)                    44.11   (3.1%)   -0.2% ( -6% -  6%)  0.831
FilteredOr2Terms2StopWords         50.82   (4.1%)                    50.72   (4.3%)   -0.2% ( -8% -  8%)  0.886
Fuzzy1                             40.73   (3.7%)                    40.65   (4.9%)   -0.2% ( -8% -  8%)  0.891
OrMany                              4.69   (3.7%)                     4.69   (6.1%)   -0.1% ( -9% - 10%)  0.950
CombinedAndHighMed                 21.52   (4.6%)                    21.51   (4.3%)   -0.1% ( -8% -  9%)  0.967
CountOrHighMed                     78.12   (1.7%)                    78.08   (2.6%)   -0.0% ( -4% -  4%)  0.944
FilteredOrMany                      4.06   (2.4%)                     4.06   (2.2%)   -0.0% ( -4% -  4%)  0.964
FilteredAnd2Terms2StopWords        61.06   (4.3%)                    61.04   (6.2%)   -0.0% (-10% - 10%)  0.986
FilteredOrHighMed                  39.18   (3.3%)                    39.17   (3.2%)   -0.0% ( -6% -  6%)  0.986
CountTerm                        6298.40   (4.3%)                  6297.77   (4.5%)   -0.0% ( -8% -  9%)  0.994
Fuzzy2                             36.89   (3.5%)                    36.90   (4.7%)    0.0% ( -7% -  8%)  0.980
CountAndHighMed                    75.47   (1.7%)                    75.50   (2.1%)    0.0% ( -3% -  3%)  0.941
IntNRQ                             42.55   (2.2%)                    42.57   (3.0%)    0.1% ( -4% -  5%)  0.946
FilteredAnd3Terms                 101.94   (2.3%)                   102.05   (2.9%)    0.1% ( -5% -  5%)  0.900
CountFilteredOrHighMed             17.95   (0.7%)                    17.98   (0.6%)    0.2% ( -1% -  1%)  0.460
FilteredOrHighHigh                 13.02   (2.5%)                    13.05   (2.3%)    0.2% ( -4% -  5%)  0.813
FilteredIntNRQ                     42.16   (2.3%)                    42.24   (3.0%)    0.2% ( -4% -  5%)  0.820
CountFilteredIntNRQ                16.31   (0.8%)                    16.35   (1.2%)    0.2% ( -1% -  2%)  0.468
CountFilteredOrHighHigh            15.86   (0.8%)                    15.90   (0.8%)    0.3% ( -1% -  1%)  0.331
CountOrHighHigh                    50.16   (2.4%)                    50.29   (2.5%)    0.3% ( -4% -  5%)  0.724
CountFilteredPhrase                 9.18   (2.5%)                     9.21   (3.4%)    0.3% ( -5% -  6%)  0.771
Wildcard                           47.34   (3.3%)                    47.48   (3.7%)    0.3% ( -6% -  7%)  0.790
AndHighOrMedMed                    14.04   (2.2%)                    14.08   (2.6%)    0.3% ( -4% -  5%)  0.688
IntervalsOrdered                    2.43   (3.4%)                     2.44   (3.3%)    0.3% ( -6% -  7%)  0.760
CountOrMany                         5.04   (2.9%)                     5.06   (2.8%)    0.4% ( -5% -  6%)  0.696
CountFilteredOrMany                 4.46   (2.5%)                     4.48   (2.6%)    0.4% ( -4% -  5%)  0.635
TermTitleSort                      51.93   (4.8%)                    52.13   (5.1%)    0.4% ( -9% - 10%)  0.809
CountAndHighHigh                   48.66   (2.2%)                    48.85   (2.2%)    0.4% ( -3% -  4%)  0.560
CombinedAndHighHigh                 5.67   (2.8%)                     5.69   (2.3%)    0.4% ( -4% -  5%)  0.597
Prefix3                            75.57   (3.8%)                    75.90   (3.3%)    0.4% ( -6% -  7%)  0.699
FilteredPrefix3                    70.64   (3.3%)                    70.98   (3.1%)    0.5% ( -5% -  7%)  0.637
Respell                            36.72   (3.5%)                    36.93   (3.6%)    0.6% ( -6% -  7%)  0.603
SpanNear                            2.45   (5.5%)                     2.46   (5.4%)    0.6% ( -9% - 12%)  0.733
FilteredOrStopWords                 8.13   (2.2%)                     8.18   (2.0%)    0.7% ( -3% -  5%)  0.329
FilteredTerm                       64.92   (3.0%)                    65.36   (3.6%)    0.7% ( -5% -  7%)  0.522
TermDTSort                        144.97   (3.3%)                   146.07   (4.8%)    0.8% ( -7% -  9%)  0.561
FilteredPhrase                      9.83   (2.2%)                     9.91   (2.6%)    0.8% ( -3% -  5%)  0.297
SloppyPhrase                        1.12   (5.3%)                     1.13   (4.9%)    0.8% ( -8% - 11%)  0.616
Phrase                              7.57   (4.3%)                     7.64   (4.3%)    0.9% ( -7% -  9%)  0.490
TermDayOfYearSort                 264.98   (2.6%)                   267.70   (2.9%)    1.0% ( -4% -  6%)  0.241
OrHighRare                         94.68   (6.8%)                    95.91   (5.4%)    1.3% (-10% - 14%)  0.501
CombinedOrHighMed                  21.05   (5.6%)                    21.35   (4.6%)    1.4% ( -8% - 12%)  0.396
AndStopWords                        8.87   (3.6%)                     9.01   (7.4%)    1.6% ( -9% - 12%)  0.399
CountPhrase                         2.65   (4.9%)                     2.69   (3.2%)    1.9% ( -5% - 10%)  0.154
CombinedOrHighHigh                  5.54   (5.1%)                     5.66   (3.0%)    2.2% ( -5% - 10%)  0.092
OrStopWords                         8.99   (3.2%)                     9.20   (8.8%)    2.3% ( -9% - 14%)  0.263
FilteredAndHighMed                 31.76   (2.4%)                    32.52   (4.0%)    2.4% ( -3% -  8%)  0.020
FilteredAndStopWords                8.41   (3.1%)                     8.75   (2.0%)    4.0% ( -1% -  9%)  0.000
FilteredAndHighHigh                10.41   (3.1%)                    10.87   (1.8%)    4.4% (  0% -  9%)  0.000
```

If I keep using `NormAndFreqBuffer` (instead of the raw `freqs` and `normValues` arrays inside `TermScorer`), the performance seems to be better?
That's a little strange to me. Here is the result under an identical setup (only the related queries are shown below):

```
CombinedOrHighMed                  21.60   (4.0%)                    21.98   (3.8%)    1.8% ( -5% -  9%)  0.151
OrStopWords                         9.05   (1.4%)                     9.23   (3.1%)    2.0% ( -2% -  6%)  0.009
CombinedOrHighHigh                  5.68   (2.7%)                     5.81   (2.2%)    2.2% ( -2% -  7%)  0.005
FilteredAndHighMed                 31.77   (2.2%)                    32.77   (1.5%)    3.1% (  0% -  7%)  0.000
FilteredAndStopWords                8.40   (2.4%)                     8.72   (1.9%)    3.8% (  0% -  8%)  0.000
FilteredAndHighHigh                10.37   (2.4%)                    10.84   (1.3%)    4.5% (  0% -  8%)  0.000
```

Not sure what causes the difference :( I will push a new commit using raw arrays, though.
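
When comparing two runs like these, it can help to look only at the rows whose p-value falls below some cutoff. The following is a minimal sketch, not part of luceneutil: the regular expression, the `significant_rows` helper, and the 0.05 threshold are all assumptions made for illustration, and the sample rows are copied from the first run.

```python
import re
from typing import List, NamedTuple

# Assumed row layout, matching the tables in this comment:
# Task  QPS baseline (StdDev)  QPS candidate (StdDev)  Pct diff (lo - hi)  p-value
ROW = re.compile(
    r"^\s*(?P<task>\S+)\s+"
    r"(?P<base>[\d.]+)\s+\(\s*(?P<base_sd>[\d.]+)%\)\s+"
    r"(?P<cand>[\d.]+)\s+\(\s*(?P<cand_sd>[\d.]+)%\)\s+"
    r"(?P<diff>-?[\d.]+)%\s+"
    r"\(\s*(?P<lo>-?\d+)%\s+-\s+(?P<hi>-?\d+)%\)\s+"
    r"(?P<p>[\d.]+)\s*$"
)


class Row(NamedTuple):
    task: str
    pct_diff: float
    p_value: float


def significant_rows(report: str, alpha: float = 0.05) -> List[Row]:
    """Return the rows whose p-value is below alpha (0.05 is an assumed cutoff)."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers, fences, and anything that is not a data row
        p = float(m["p"])
        if p < alpha:
            rows.append(Row(m["task"], float(m["diff"]), p))
    return rows


# Three rows taken from the first (raw array) run.
sample = """\
OrStopWords 8.99 (3.2%) 9.20 (8.8%) 2.3% ( -9% - 14%) 0.263
FilteredAndHighMed 31.76 (2.4%) 32.52 (4.0%) 2.4% ( -3% - 8%) 0.020
FilteredAndStopWords 8.41 (3.1%) 8.75 (2.0%) 4.0% ( -1% - 9%) 0.000
"""

for row in significant_rows(sample):
    print(f"{row.task}: {row.pct_diff:+.1f}% (p={row.p_value:.3f})")
```

With that assumed cutoff, `OrStopWords` in the raw array run (p = 0.263) is filtered out, while the same task in the `NormAndFreqBuffer` run (p = 0.009) would pass.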