[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461367#comment-17461367 ]
Feng Guo edited comment on LUCENE-10319 at 12/17/21, 12:16 PM:
---------------------------------------------------------------

Out of curiosity, I ran the luceneutil wikimedium1m benchmark with block size = 256, but it failed with an error:

{code:java}
WARNING: cat=AndHighHigh: hit counts differ: 10274+ vs 10884+
WARNING: cat=HighTerm: hit counts differ: 5969+ vs 9423+
WARNING: cat=LowTerm: hit counts differ: 2394+ vs 3325+
WARNING: cat=MedTerm: hit counts differ: 4558+ vs 7118+
WARNING: cat=OrHighHigh: hit counts differ: 5986+ vs 5987+
WARNING: cat=OrHighMed: hit counts differ: 3044+ vs 3445+
Traceback (most recent call last):
  File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/localrun.py", line 60, in <module>
    comp.benchmark("baseline_vs_patch")
  File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/competition.py", line 494, in benchmark
    searchBench.run(id, base, challenger,
  File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/searchBench.py", line 196, in run
    raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
RuntimeError: errors occurred: ([], ['query=+body:web +body:up filter=None sort=None groupField=None hitCount=10274+: wrong hitCount: 10274+ vs 10884+', 'query=body:he body:resulting filter=None sort=None groupField=None hitCount=3044+: wrong hitCount: 3044+ vs 3445+', 'query=body:official filter=None sort=None groupField=None hitCount=4558+: wrong hitCount: 4558+ vs 7118+', 'query=body:thumb filter=None sort=None groupField=None hitCount=5969+: wrong hitCount: 5969+ vs 9423+', 'query=body:years body:pages filter=None sort=None groupField=None hitCount=5986+: wrong hitCount: 5986+ vs 5987+', 'query=body:goods filter=None sort=None groupField=None hitCount=2394+: wrong hitCount: 2394+ vs 3325+'], 1.0)
{code}

I guess this error may have something to do with Impacts?
So I changed {{#TOTAL_HITS_THRESHOLD}} to a very large number for both baseline and candidate and reran the benchmark. Everything looks good now, and I got a rather good report. Note, however, that this report is *not* really meaningful as a performance measurement, since we changed {{#TOTAL_HITS_THRESHOLD}}; it only verifies that the results are correct.

{code:java}
Task                       QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
Fuzzy1                     118.73 (11.5%) 114.82 (13.0%) -3.3% ( -24% - 23%) 0.407
LowTerm                    2369.88 (9.2%) 2323.31 (5.7%) -2.0% ( -15% - 14%) 0.428
PKLookup                   250.07 (5.0%) 245.42 (4.3%) -1.9% ( -10% - 7%) 0.214
Prefix3                    306.43 (6.9%) 301.82 (7.0%) -1.5% ( -14% - 13%) 0.502
Wildcard                   221.77 (5.2%) 218.64 (4.0%) -1.4% ( -10% - 8%) 0.348
HighTermMonthSort          1161.02 (12.7%) 1156.95 (11.1%) -0.4% ( -21% - 26%) 0.928
BrowseDayOfYearSSDVFacets  140.62 (1.3%) 140.48 (1.1%) -0.1% ( -2% - 2%) 0.791
Fuzzy2                     47.51 (8.9%) 47.57 (7.0%) 0.1% ( -14% - 17%) 0.961
Respell                    200.51 (2.7%) 200.82 (1.4%) 0.2% ( -3% - 4%) 0.823
OrHighMed                  197.90 (3.0%) 198.36 (3.6%) 0.2% ( -6% - 7%) 0.830
BrowseMonthSSDVFacets      152.24 (2.8%) 152.74 (1.0%) 0.3% ( -3% - 4%) 0.630
OrHighLow                  245.11 (3.5%) 245.97 (3.1%) 0.4% ( -6% - 7%) 0.744
AndHighLow                 1598.05 (7.2%) 1604.55 (4.6%) 0.4% ( -10% - 13%) 0.836
BrowseDayOfYearTaxoFacets  28.84 (3.0%) 28.99 (3.3%) 0.5% ( -5% - 7%) 0.603
OrHighHigh                 109.37 (4.2%) 110.14 (4.0%) 0.7% ( -7% - 9%) 0.599
BrowseMonthTaxoFacets      30.77 (3.5%) 31.00 (4.1%) 0.8% ( -6% - 8%) 0.541
BrowseDateTaxoFacets       28.71 (3.2%) 28.93 (3.3%) 0.8% ( -5% - 7%) 0.461
HighTermDayOfYearSort      593.30 (13.5%) 599.82 (13.2%) 1.1% ( -22% - 32%) 0.800
AndHighHigh                441.62 (5.0%) 452.99 (4.1%) 2.6% ( -6% - 12%) 0.083
IntNRQ                     121.71 (6.2%) 124.89 (4.2%) 2.6% ( -7% - 13%) 0.127
HighTerm                   599.78 (4.2%) 615.86 (2.6%) 2.7% ( -3% - 9%) 0.019
MedSloppyPhrase            397.69 (3.1%) 411.46 (3.3%) 3.5% ( -2% - 10%) 0.001
MedSpanNear                75.75 (2.8%) 78.59 (1.5%) 3.7% ( 0% - 8%) 0.000
HighIntervalsOrdered       108.30 (2.8%) 112.66 (2.3%) 4.0% ( 0% - 9%) 0.000
HighSpanNear               23.10 (3.2%) 24.25 (1.5%) 5.0% ( 0% - 9%) 0.000
MedTerm                    1001.40 (4.2%) 1055.70 (2.4%) 5.4% ( -1% - 12%) 0.000
LowPhrase                  258.65 (2.3%) 278.10 (2.2%) 7.5% ( 2% - 12%) 0.000
HighPhrase                 67.81 (3.0%) 72.94 (3.7%) 7.6% ( 0% - 14%) 0.000
HighSloppyPhrase           20.13 (6.0%) 21.69 (5.9%) 7.7% ( -3% - 20%) 0.000
MedPhrase                  258.96 (2.6%) 279.48 (3.0%) 7.9% ( 2% - 13%) 0.000
LowIntervalsOrdered        476.40 (3.2%) 516.31 (2.8%) 8.4% ( 2% - 14%) 0.000
MedIntervalsOrdered        112.10 (2.4%) 121.85 (2.9%) 8.7% ( 3% - 14%) 0.000
AndHighMed                 784.68 (5.2%) 856.24 (5.1%) 9.1% ( -1% - 20%) 0.000
LowSpanNear                92.93 (1.8%) 101.80 (2.5%) 9.5% ( 5% - 14%) 0.000
LowSloppyPhrase            250.51 (3.0%) 279.69 (3.6%) 11.6% ( 4% - 18%) 0.000
{code}

Then I removed the totalHits check in luceneutil and reran the benchmark. As expected, QPS decreased for the tasks whose totalHits differ, while others increased. I am posting the report here in case anyone is interested (it is not really related to this issue, though).

{code:java}
Task                       QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
AndHighHigh                214.93 (3.8%) 183.83 (2.6%) -14.5% ( -20% - -8%) 0.000
MedTerm                    2589.52 (4.5%) 2303.67 (5.5%) -11.0% ( -20% - -1%) 0.000
HighTerm                   1750.90 (4.0%) 1560.54 (4.3%) -10.9% ( -18% - -2%) 0.000
HighPhrase                 238.61 (2.8%) 218.08 (4.3%) -8.6% ( -15% - -1%) 0.000
OrHighHigh                 117.03 (1.9%) 107.52 (4.8%) -8.1% ( -14% - -1%) 0.000
HighTermMonthSort          905.11 (10.5%) 864.34 (9.3%) -4.5% ( -21% - 17%) 0.150
HighTermDayOfYearSort      1095.73 (10.4%) 1056.20 (11.0%) -3.6% ( -22% - 19%) 0.288
PKLookup                   249.62 (3.8%) 241.15 (4.6%) -3.4% ( -11% - 5%) 0.011
LowTerm                    2761.54 (4.6%) 2681.22 (6.8%) -2.9% ( -13% - 8%) 0.111
Respell                    163.65 (3.4%) 159.17 (3.8%) -2.7% ( -9% - 4%) 0.016
Wildcard                   587.89 (2.9%) 573.02 (4.8%) -2.5% ( -9% - 5%) 0.044
IntNRQ                     654.86 (4.4%) 644.88 (5.4%) -1.5% ( -10% - 8%) 0.328
LowPhrase                  596.01 (4.3%) 587.28 (5.5%) -1.5% ( -10% - 8%) 0.349
HighIntervalsOrdered       16.48 (8.9%) 16.26 (6.4%) -1.3% ( -15% - 15%) 0.586
AndHighLow                 1665.94 (6.4%) 1649.07 (6.1%) -1.0% ( -12% - 12%) 0.610
BrowseDayOfYearSSDVFacets  142.76 (2.5%) 141.87 (3.3%) -0.6% ( -6% - 5%) 0.507
BrowseDateTaxoFacets       29.49 (4.2%) 29.40 (3.8%) -0.3% ( -8% - 8%) 0.796
MedPhrase                  653.42 (4.6%) 652.05 (5.6%) -0.2% ( -9% - 10%) 0.897
Fuzzy1                     116.77 (6.3%) 116.59 (10.4%) -0.2% ( -15% - 17%) 0.956
BrowseDayOfYearTaxoFacets  29.58 (4.3%) 29.55 (4.1%) -0.1% ( -8% - 8%) 0.929
Fuzzy2                     73.12 (10.4%) 73.04 (10.7%) -0.1% ( -19% - 23%) 0.974
BrowseMonthTaxoFacets      31.65 (5.0%) 31.64 (4.9%) -0.0% ( -9% - 10%) 0.985
BrowseMonthSSDVFacets      155.25 (3.5%) 155.27 (3.8%) 0.0% ( -7% - 7%) 0.991
OrHighMed                  267.80 (5.9%) 268.44 (6.2%) 0.2% ( -11% - 13%) 0.900
OrHighLow                  820.94 (8.5%) 832.70 (7.8%) 1.4% ( -13% - 19%) 0.579
Prefix3                    483.34 (5.8%) 490.76 (7.1%) 1.5% ( -10% - 15%) 0.453
LowSloppyPhrase            268.01 (2.2%) 279.16 (3.9%) 4.2% ( -1% - 10%) 0.000
LowSpanNear                518.44 (3.8%) 542.08 (5.2%) 4.6% ( -4% - 14%) 0.002
MedSloppyPhrase            252.28 (2.4%) 264.31 (2.2%) 4.8% ( 0% - 9%) 0.000
HighSloppyPhrase           157.88 (2.6%) 165.44 (3.1%) 4.8% ( 0% - 10%) 0.000
HighSpanNear               232.57 (2.5%) 243.72 (3.5%) 4.8% ( -1% - 11%) 0.000
LowIntervalsOrdered        697.59 (3.8%) 734.23 (4.8%) 5.3% ( -3% - 14%) 0.000
MedSpanNear                171.60 (3.1%) 181.41 (4.4%) 5.7% ( -1% - 13%) 0.000
MedIntervalsOrdered        356.52 (3.1%) 383.69 (4.1%) 7.6% ( 0% - 15%) 0.000
AndHighMed                 555.66 (4.4%) 617.40 (5.7%) 11.1% ( 0% - 22%) 0.000
{code}

> Make ForUtil#BLOCK_SIZE changeable
> ----------------------------------
>
>                 Key: LUCENE-10319
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10319
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In LUCENE-10315, I tried to generate a {{ForUtil}} with {{BLOCK_SIZE=512}}. I thought it would be simple, since it looked like I only needed to change BLOCK_SIZE, but it turns out that many values are derived from BLOCK_SIZE yet hard-coded.
> So this issue tries to generate all of those hard-coded values from BLOCK_SIZE, in case we need a ForUtil somewhere else or want to change the postings BLOCK_SIZE in the future.
> I tried BLOCK_SIZE = 64 and 256, and all tests passed.
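
To illustrate what the issue description means by generating hard-coded values from BLOCK_SIZE, here is a minimal sketch. The class {{BlockLayout}} and all of its members are hypothetical names chosen for illustration, not Lucene's actual {{ForUtil}} code, and the sketch assumes the block size is a power of two and at least 64.

```java
// Hypothetical sketch: these names are NOT taken from Lucene's ForUtil.
// It shows constants being derived from one block size instead of hard-coded.
final class BlockLayout {
    final int blockSize;      // number of values per block
    final int blockSizeLog2;  // log2(blockSize), usable as a shift amount
    final int blockSizeMask;  // blockSize - 1, usable as a cheap modulo mask

    BlockLayout(int blockSize) {
        // Assumption of this sketch: a power of two, and at least 64 so that
        // a packed block always fills a whole number of 64-bit longs.
        if (blockSize < 64 || Integer.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException(
                "blockSize must be a power of two >= 64: " + blockSize);
        }
        this.blockSize = blockSize;
        this.blockSizeLog2 = Integer.numberOfTrailingZeros(blockSize);
        this.blockSizeMask = blockSize - 1;
    }

    /** Bytes needed for one block when each value is packed into bitsPerValue bits. */
    int encodedBytes(int bitsPerValue) {
        // blockSize * bitsPerValue bits in total, divided by 8 bits per byte.
        return (bitsPerValue << blockSizeLog2) >>> 3;
    }

    /** 64-bit longs needed for one block when each value is packed into bitsPerValue bits. */
    int encodedLongs(int bitsPerValue) {
        // blockSize * bitsPerValue bits in total, divided by 64 bits per long.
        return (bitsPerValue << blockSizeLog2) >>> 6;
    }
}
```

With this shape, switching from a block size of 128 to 64 or 256 only changes the constructor argument; every dependent value follows from it rather than needing a separate manual update.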