jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712779097

   Wikibigall. Less space is spent on doc values this time since I did not enable 
indexing of facets. There is a more significant size reduction of postings this 
time (-10.5%). This is consistent with the reproducibility paper, which 
observed size reductions of 18% with partitioned Elias-Fano and 5% with 
SVByte on the Wikipedia dataset. I would expect PFor to be somewhere in between, 
as it's better able to take advantage of small gaps between docs than SVByte, 
but less so than partitioned Elias-Fano.
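
To illustrate why small gaps matter, here is a minimal sketch, not Lucene's actual PFor code. It assumes plain frame-of-reference bit packing (every gap in a block packed at the width of the largest gap, with no exception patching) and compares it with a generic byte-aligned cost that rounds each gap up to whole bytes. The function names and sample doc IDs are hypothetical.

```python
def gaps(doc_ids):
    """Deltas between consecutive sorted doc IDs."""
    return [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def bit_packed_bits_per_gap(doc_ids):
    """Bits per gap when the whole block uses the width of its largest gap."""
    return max(gaps(doc_ids)).bit_length()

def byte_aligned_bits_per_gap(doc_ids):
    """Average bits per gap when each gap is rounded up to whole bytes."""
    g = gaps(doc_ids)
    total_bytes = sum(max(1, (x.bit_length() + 7) // 8) for x in g)
    return 8 * total_bytes / len(g)

dense = list(range(0, 256, 2))          # every gap is 2
sparse = list(range(0, 256000, 2000))   # every gap is 2000

print(bit_packed_bits_per_gap(dense), byte_aligned_bits_per_gap(dense))
print(bit_packed_bits_per_gap(sparse), byte_aligned_bits_per_gap(sparse))
```

Under these assumptions the bit-packed width shrinks with the gap size, while the byte-aligned cost cannot go below 8 bits per gap.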
   
   | File | before (MB) | after (MB) |
   | - | - | - |
   | terms (tim) | 767 | 766 |
   | postings (doc) | 2779 | 2489 |
   | positions (pos) | 11356 | 10569 |
   | points (kdd) | 100 | 99 |
   | doc values (dvd) | 456 | 461 |
   | stored fields (fdt) | 249 | 257 |
   | norms (nvd) | 13 | 13 |
   | total | 15734 | 14669 |
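
The relative changes can be recomputed from the table with a short sketch like the one below. The dictionary and the `pct_change` helper are illustrative names, and the inputs are the rounded MB values from the table, so results may differ slightly from figures computed on exact byte counts (postings come out at about -10.4% here versus the -10.5% quoted above).

```python
# (before MB, after MB), copied from the table above
sizes_mb = {
    "terms (tim)": (767, 766),
    "postings (doc)": (2779, 2489),
    "positions (pos)": (11356, 10569),
    "points (kdd)": (100, 99),
    "doc values (dvd)": (456, 461),
    "stored fields (fdt)": (249, 257),
    "norms (nvd)": (13, 13),
    "total": (15734, 14669),
}

def pct_change(before, after):
    """Relative size change in percent; negative means smaller."""
    return 100.0 * (after - before) / before

for name, (before, after) in sizes_mb.items():
    print(f"{name:20s} {pct_change(before, after):+6.1f}%")
```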
   
   Benchmarks still show slowdowns on phrase queries and speedups on 
conjunctions, though it's less spectacular than on wikimedium10m.
   
   ```
                               Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                            MedTerm      652.41      (7.5%)      493.97      (2.6%)  -24.3% ( -31% -  -15%) 0.000
                         HighPhrase       30.86      (3.5%)       23.85      (2.6%)  -22.7% ( -27% -  -17%) 0.000
                          LowPhrase       51.09      (3.1%)       42.38      (2.2%)  -17.1% ( -21% -  -12%) 0.000
                            LowTerm     1057.76      (5.4%)      881.22      (2.5%)  -16.7% ( -23% -   -9%) 0.000
                          MedPhrase       82.18      (3.0%)       71.88      (1.7%)  -12.5% ( -16% -   -8%) 0.000
                  HighTermMonthSort     6482.52      (4.5%)     5739.50      (3.5%)  -11.5% ( -18% -   -3%) 0.000
                           PKLookup      293.95      (3.2%)      276.15      (3.7%)   -6.1% ( -12% -    0%) 0.000
                    MedSloppyPhrase        8.68      (2.7%)        8.20      (2.9%)   -5.5% ( -10% -    0%) 0.000
                          OrHighLow      578.06      (4.4%)      550.49      (4.0%)   -4.8% ( -12% -    3%) 0.016
                   HighSloppyPhrase        7.43      (2.2%)        7.10      (4.0%)   -4.4% ( -10% -    1%) 0.003
                             Fuzzy1      244.70      (2.9%)      238.49      (3.3%)   -2.5% (  -8% -    3%) 0.080
                         OrHighHigh       39.76      (9.5%)       39.21      (6.1%)   -1.4% ( -15% -   15%) 0.717
                           HighTerm      370.57      (8.5%)      367.09      (4.4%)   -0.9% ( -12% -   13%) 0.768
                    LowSloppyPhrase       13.68      (2.3%)       13.71      (3.3%)    0.2% (  -5% -    5%) 0.868
                            Respell      204.23      (1.8%)      204.98      (2.0%)    0.4% (  -3% -    4%) 0.679
                            Prefix3      225.23      (5.1%)      226.74      (5.5%)    0.7% (  -9% -   11%) 0.786
                           Wildcard      170.34      (4.0%)      171.63      (3.4%)    0.8% (  -6% -    8%) 0.665
                             IntNRQ       92.30     (11.9%)       95.15     (10.2%)    3.1% ( -17% -   28%) 0.555
                        MedSpanNear        5.79      (6.8%)        5.99      (9.3%)    3.4% ( -11% -   20%) 0.378
                          OrHighMed      104.41      (7.3%)      107.99      (5.3%)    3.4% (  -8% -   17%) 0.253
                       HighSpanNear        2.47      (4.2%)        2.56      (4.1%)    3.7% (  -4% -   12%) 0.059
                             Fuzzy2      139.96      (2.8%)      146.77      (2.6%)    4.9% (   0% -   10%) 0.000
                        LowSpanNear       42.96      (3.6%)       45.21      (2.5%)    5.2% (   0% -   11%) 0.000
                        AndHighHigh       33.24      (6.2%)       36.20      (4.3%)    8.9% (  -1% -   20%) 0.000
                         AndHighMed      131.84      (5.2%)      144.31      (3.2%)    9.5% (   0% -   18%) 0.000
              HighTermDayOfYearSort      186.67      (2.9%)      208.78      (3.2%)   11.8% (   5% -   18%) 0.000
                         AndHighLow      590.69      (3.2%)      677.22      (2.2%)   14.6% (   9% -   20%) 0.000
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

