tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1631024688
Hi, everyone. I tried the lazy compute idea that I mentioned before. First, I changed the code in the main branch to lazy compute, but the benchmark results didn't show much difference. Then I applied the lazy compute algorithm to the vectorized code, and the benchmark results showed improved performance. However, I was surprised to see that the results for Prefix3 were not good. After that, I tested the scalar code, and the benchmark results showed a decrease in performance. The results of these two benchmarks are below.

## vectorized

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|---|---|---|---|---|---|
| Prefix3 | 90.71 | (2.4%) | 80.20 | (4.7%) | -11.6% (-18% - -4%) | 0.000 |
| BrowseDayOfYearSSDVFacets | 6.89 | (13.1%) | 6.65 | (10.2%) | -3.5% (-23% - 22%) | 0.348 |
| Wildcard | 56.40 | (3.0%) | 54.70 | (2.9%) | -3.0% (-8% - 3%) | 0.001 |
| BrowseDateTaxoFacets | 5.72 | (11.1%) | 5.58 | (7.9%) | -2.5% (-19% - 18%) | 0.415 |
| BrowseRandomLabelTaxoFacets | 5.54 | (8.6%) | 5.42 | (7.4%) | -2.3% (-16% - 14%) | 0.369 |
| BrowseDayOfYearTaxoFacets | 5.73 | (10.3%) | 5.61 | (7.5%) | -2.1% (-18% - 17%) | 0.454 |
| BrowseMonthSSDVFacets | 7.01 | (11.3%) | 6.86 | (11.9%) | -2.1% (-22% - 23%) | 0.567 |
| IntNRQ | 42.16 | (4.9%) | 41.72 | (2.7%) | -1.1% (-8% - 6%) | 0.397 |
| Fuzzy2 | 81.88 | (1.8%) | 82.32 | (2.3%) | 0.5% (-3% - 4%) | 0.411 |
| MedIntervalsOrdered | 10.00 | (5.2%) | 10.05 | (4.0%) | 0.5% (-8% - 10%) | 0.711 |
| Fuzzy1 | 114.12 | (1.8%) | 114.78 | (2.4%) | 0.6% (-3% - 4%) | 0.385 |
| BrowseRandomLabelSSDVFacets | 5.31 | (5.5%) | 5.35 | (8.0%) | 0.7% (-12% - 14%) | 0.746 |
| TermDTSort | 159.43 | (5.3%) | 160.63 | (4.0%) | 0.8% (-8% - 10%) | 0.612 |
| MedTermDayTaxoFacets | 11.07 | (3.0%) | 11.16 | (2.8%) | 0.8% (-4% - 6%) | 0.381 |
| HighTermTitleSort | 100.01 | (3.8%) | 101.02 | (4.6%) | 1.0% (-7% - 9%) | 0.447 |
| MedSpanNear | 45.12 | (2.5%) | 45.61 | (2.3%) | 1.1% (-3% - 6%) | 0.157 |
| HighIntervalsOrdered | 14.06 | (4.6%) | 14.22 | (4.3%) | 1.2% (-7% - 10%) | 0.410 |
| HighTermTitleBDVSort | 12.39 | (1.7%) | 12.54 | (2.9%) | 1.2% (-3% - 5%) | 0.104 |
| HighSpanNear | 17.67 | (2.2%) | 17.90 | (2.0%) | 1.3% (-2% - 5%) | 0.047 |
| Respell | 68.69 | (1.5%) | 69.62 | (1.5%) | 1.4% (-1% - 4%) | 0.005 |
| BrowseDateSSDVFacets | 1.78 | (13.0%) | 1.81 | (11.8%) | 1.6% (-20% - 30%) | 0.676 |
| AndHighHighDayTaxoFacets | 3.79 | (3.4%) | 3.86 | (3.4%) | 2.0% (-4% - 9%) | 0.067 |
| AndHighMedDayTaxoFacets | 21.71 | (2.5%) | 22.16 | (2.4%) | 2.1% (-2% - 7%) | 0.007 |
| OrHighMedDayTaxoFacets | 8.33 | (5.1%) | 8.52 | (3.5%) | 2.3% (-6% - 11%) | 0.103 |
| PKLookup | 266.99 | (2.7%) | 273.32 | (2.4%) | 2.4% (-2% - 7%) | 0.003 |
| OrHighNotLow | 536.68 | (3.5%) | 551.31 | (4.7%) | 2.7% (-5% - 11%) | 0.037 |
| HighTerm | 772.12 | (2.2%) | 795.02 | (3.7%) | 3.0% (-2% - 9%) | 0.002 |
| LowTerm | 777.24 | (2.1%) | 803.48 | (4.0%) | 3.4% (-2% - 9%) | 0.001 |
| MedTerm | 625.99 | (3.1%) | 647.55 | (5.4%) | 3.4% (-4% - 12%) | 0.013 |
| MedSloppyPhrase | 18.33 | (1.8%) | 18.98 | (2.8%) | 3.5% (0% - 8%) | 0.000 |
| OrHighNotMed | 461.93 | (3.6%) | 478.49 | (4.2%) | 3.6% (-4% - 11%) | 0.004 |
| OrHighNotHigh | 526.32 | (2.9%) | 546.51 | (3.7%) | 3.8% (-2% - 10%) | 0.000 |
| OrNotHighMed | 417.97 | (2.6%) | 434.74 | (3.1%) | 4.0% (-1% - 9%) | 0.000 |
| OrNotHighHigh | 514.95 | (2.9%) | 535.68 | (2.7%) | 4.0% (-1% - 9%) | 0.000 |
| OrHighHigh | 37.60 | (3.5%) | 39.18 | (4.5%) | 4.2% (-3% - 12%) | 0.001 |
| HighSloppyPhrase | 2.28 | (2.3%) | 2.38 | (2.4%) | 4.3% (0% - 9%) | 0.000 |
| AndHighHigh | 38.76 | (2.1%) | 40.63 | (2.8%) | 4.8% (0% - 9%) | 0.000 |
| HighTermDayOfYearSort | 343.83 | (2.4%) | 360.72 | (4.7%) | 4.9% (-2% - 12%) | 0.000 |
| LowSloppyPhrase | 60.08 | (1.6%) | 63.04 | (2.2%) | 4.9% (1% - 8%) | 0.000 |
| OrNotHighLow | 647.85 | (1.7%) | 680.90 | (2.5%) | 5.1% (0% - 9%) | 0.000 |
| LowIntervalsOrdered | 5.71 | (3.6%) | 6.00 | (2.8%) | 5.1% (-1% - 11%) | 0.000 |
| HighPhrase | 28.70 | (1.7%) | 30.18 | (1.7%) | 5.2% (1% - 8%) | 0.000 |
| LowSpanNear | 7.41 | (2.1%) | 7.80 | (1.9%) | 5.3% (1% - 9%) | 0.000 |
| MedPhrase | 11.08 | (1.5%) | 11.71 | (2.1%) | 5.8% (2% - 9%) | 0.000 |
| OrHighLow | 520.52 | (1.6%) | 552.94 | (3.5%) | 6.2% (1% - 11%) | 0.000 |
| AndHighLow | 1225.95 | (2.7%) | 1308.18 | (5.8%) | 6.7% (-1% - 15%) | 0.000 |
| HighTermMonthSort | 3123.59 | (2.9%) | 3381.95 | (5.3%) | 8.3% (0% - 16%) | 0.000 |
| LowPhrase | 27.73 | (1.3%) | 30.05 | (2.2%) | 8.4% (4% - 12%) | 0.000 |
| AndHighMed | 109.92 | (1.8%) | 119.78 | (2.7%) | 9.0% (4% - 13%) | 0.000 |
| OrHighMed | 111.14 | (2.2%) | 121.78 | (3.3%) | 9.6% (3% - 15%) | 0.000 |
| BrowseMonthTaxoFacets | 16.77 | (28.5%) | 18.90 | (1.4%) | 12.7% (-13% - 59%) | 0.046 |

## scalar

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|---|---|---|---|---|---|
| LowSloppyPhrase | 9.77 | (4.4%) | 5.73 | (2.1%) | -41.4% (-45% - -36%) | 0.000 |
| MedSpanNear | 11.23 | (4.6%) | 6.95 | (2.3%) | -38.2% (-43% - -32%) | 0.000 |
| AndHighMed | 123.37 | (8.0%) | 76.65 | (2.0%) | -37.9% (-44% - -30%) | 0.000 |
| AndHighMedDayTaxoFacets | 26.15 | (2.0%) | 16.34 | (1.6%) | -37.5% (-40% - -34%) | 0.000 |
| HighSpanNear | 13.97 | (3.2%) | 9.09 | (1.8%) | -34.9% (-38% - -30%) | 0.000 |
| MedIntervalsOrdered | 21.95 | (3.5%) | 14.81 | (2.7%) | -32.5% (-37% - -27%) | 0.000 |
| AndHighLow | 1070.91 | (3.9%) | 732.57 | (2.7%) | -31.6% (-36% - -25%) | 0.000 |
| LowPhrase | 20.41 | (2.9%) | 14.15 | (2.3%) | -30.7% (-34% - -26%) | 0.000 |
| MedSloppyPhrase | 28.12 | (2.8%) | 19.93 | (1.2%) | -29.1% (-32% - -25%) | 0.000 |
| LowIntervalsOrdered | 3.36 | (4.6%) | 2.39 | (3.3%) | -28.8% (-35% - -21%) | 0.000 |
| OrNotHighLow | 640.26 | (2.8%) | 488.06 | (1.8%) | -23.8% (-27% - -19%) | 0.000 |
| AndHighHighDayTaxoFacets | 10.74 | (2.1%) | 8.19 | (2.1%) | -23.7% (-27% - -19%) | 0.000 |
| LowSpanNear | 277.31 | (1.8%) | 212.48 | (1.9%) | -23.4% (-26% - -20%) | 0.000 |
| AndHighHigh | 25.31 | (5.6%) | 19.62 | (2.7%) | -22.5% (-29% - -15%) | 0.000 |
| OrNotHighMed | 528.99 | (2.2%) | 414.93 | (2.7%) | -21.6% (-25% - -17%) | 0.000 |
| OrHighLow | 408.04 | (3.1%) | 322.73 | (2.7%) | -20.9% (-25% - -15%) | 0.000 |
| HighSloppyPhrase | 13.08 | (3.5%) | 10.65 | (2.0%) | -18.6% (-23% - -13%) | 0.000 |
| OrHighHigh | 23.91 | (5.3%) | 19.49 | (2.5%) | -18.5% (-24% - -11%) | 0.000 |
| MedPhrase | 187.62 | (2.3%) | 153.52 | (1.7%) | -18.2% (-21% - -14%) | 0.000 |
| OrHighMed | 31.30 | (4.8%) | 26.68 | (3.4%) | -14.8% (-21% - -6%) | 0.000 |
| HighIntervalsOrdered | 0.74 | (5.2%) | 0.64 | (4.7%) | -14.1% (-22% - -4%) | 0.000 |
| HighTermDayOfYearSort | 327.16 | (2.4%) | 288.14 | (3.1%) | -11.9% (-16% - -6%) | 0.000 |
| HighTermTitleSort | 105.28 | (6.8%) | 93.57 | (5.1%) | -11.1% (-21% - 0%) | 0.000 |
| HighPhrase | 173.78 | (2.5%) | 156.10 | (1.4%) | -10.2% (-13% - -6%) | 0.000 |
| TermDTSort | 166.57 | (5.5%) | 151.19 | (4.4%) | -9.2% (-18% - 0%) | 0.000 |
| MedTerm | 641.21 | (3.4%) | 583.88 | (3.2%) | -8.9% (-15% - -2%) | 0.000 |
| OrNotHighHigh | 533.02 | (2.8%) | 486.18 | (2.0%) | -8.8% (-13% - -4%) | 0.000 |
| OrHighNotLow | 516.66 | (2.9%) | 472.41 | (4.7%) | -8.6% (-15% - 0%) | 0.000 |
| OrHighMedDayTaxoFacets | 8.58 | (3.7%) | 7.85 | (3.8%) | -8.4% (-15% - 0%) | 0.000 |
| OrHighNotMed | 496.57 | (3.1%) | 457.70 | (2.7%) | -7.8% (-13% - -2%) | 0.000 |
| HighTerm | 587.86 | (4.2%) | 542.76 | (3.0%) | -7.7% (-14% - 0%) | 0.000 |
| OrHighNotHigh | 770.22 | (2.5%) | 723.29 | (2.9%) | -6.1% (-11% - 0%) | 0.000 |
| Fuzzy2 | 31.75 | (2.2%) | 29.98 | (5.6%) | -5.6% (-13% - 2%) | 0.000 |
| MedTermDayTaxoFacets | 38.41 | (1.8%) | 36.45 | (1.4%) | -5.1% (-8% - -1%) | 0.000 |
| LowTerm | 900.74 | (2.3%) | 866.32 | (3.2%) | -3.8% (-9% - 1%) | 0.000 |
| HighTermTitleBDVSort | 7.78 | (2.9%) | 7.61 | (3.6%) | -2.2% (-8% - 4%) | 0.034 |
| BrowseMonthSSDVFacets | 9.22 | (6.5%) | 9.08 | (7.8%) | -1.4% (-14% - 13%) | 0.532 |
| PKLookup | 260.37 | (3.8%) | 258.96 | (3.6%) | -0.5% (-7% - 7%) | 0.645 |
| Respell | 54.02 | (2.2%) | 53.76 | (2.2%) | -0.5% (-4% - 4%) | 0.492 |
| Fuzzy1 | 81.60 | (2.4%) | 81.29 | (1.7%) | -0.4% (-4% - 3%) | 0.557 |
| BrowseRandomLabelSSDVFacets | 6.07 | (8.1%) | 6.05 | (8.3%) | -0.3% (-15% - 17%) | 0.894 |
| BrowseDayOfYearSSDVFacets | 8.14 | (6.9%) | 8.12 | (7.3%) | -0.3% (-13% - 14%) | 0.885 |
| HighTermMonthSort | 2956.82 | (4.0%) | 2967.31 | (4.3%) | 0.4% (-7% - 9%) | 0.787 |
| Wildcard | 73.82 | (3.0%) | 75.25 | (3.5%) | 1.9% (-4% - 8%) | 0.061 |
| Prefix3 | 190.39 | (4.0%) | 195.04 | (3.9%) | 2.4% (-5% - 10%) | 0.050 |
| BrowseRandomLabelTaxoFacets | 5.77 | (11.3%) | 5.92 | (10.1%) | 2.6% (-16% - 27%) | 0.441 |
| BrowseDateTaxoFacets | 6.26 | (15.9%) | 6.43 | (15.7%) | 2.7% (-24% - 40%) | 0.588 |
| BrowseDayOfYearTaxoFacets | 6.28 | (15.9%) | 6.47 | (16.0%) | 2.9% (-25% - 41%) | 0.565 |
| BrowseMonthTaxoFacets | 10.24 | (52.4%) | 10.54 | (54.2%) | 2.9% (-68% - 230%) | 0.862 |
| IntNRQ | 43.93 | (15.3%) | 45.33 | (8.8%) | 3.2% (-18% - 32%) | 0.419 |
| BrowseDateSSDVFacets | 1.57 | (11.2%) | 1.64 | (13.6%) | 4.6% (-18% - 33%) | 0.244 |

My current conclusions are:

1. Lazy compute works well when combined with vectors.
2. Perhaps we can improve the performance of the scalar code. The current version may still have room for improvement, but I currently don't have any ideas. If you have any suggestions, please feel free to propose them or commit directly to this branch.
3. The benchmark results for Prefix3 seem odd: it regresses only in the vectorized version. In the scalar version, where most other tasks slow down, Prefix3 does not. Any clues?

I would love to hear your opinions and welcome any ideas.
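
A quick way to sanity-check the `Pct diff` column: the reported values are consistent with the relative change in mean QPS from the baseline to the modified version. The sketch below assumes that definition. `pct_diff` is a hypothetical helper for illustration, not a function from the benchmark tooling, and it does not try to reproduce the range or p-value columns.

```python
def pct_diff(baseline_qps: float, modified_qps: float) -> float:
    """Relative change in mean QPS, in percent, from baseline to modified.

    Assumption: this is how the Pct diff column is derived; it matches
    the sampled rows below but is not taken from the benchmark source.
    """
    return (modified_qps - baseline_qps) / baseline_qps * 100.0


# (task, baseline QPS, modified QPS, reported Pct diff), copied from the tables above
rows = [
    ("Prefix3 (vectorized)", 90.71, 80.20, -11.6),
    ("OrHighMed (vectorized)", 111.14, 121.78, 9.6),
    ("LowSloppyPhrase (scalar)", 9.77, 5.73, -41.4),
    ("Prefix3 (scalar)", 190.39, 195.04, 2.4),
]

for task, base, mod, reported in rows:
    computed = pct_diff(base, mod)
    print(f"{task}: computed {computed:+.1f}%, reported {reported:+.1f}%")
```

Comparing the two Prefix3 rows this way makes the asymmetry in conclusion 3 easy to see: the task loses throughput only in the vectorized run.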