epotyom commented on PR #13657:
URL: https://github.com/apache/lucene/pull/13657#issuecomment-2307371884

   I've been running some experiments, and this is what I observe so far: the 
order in which we initialize the collection of `Collector`s and call 
`createWeight` does not matter, but we do have to allocate something on the heap 
after the `createWeight` call to **avoid** the regression. The minimum size that 
has to be allocated seems to be ~44 bytes.
   
   I still don't have a very good theory that explains it. I believe the unused 
arrays decrease the Survivor/Eden promotion ratio. It might be that the 
decrease is significant enough that fewer objects overflow to old gen, so old 
gen GC is not needed as often. In baseline we also create the Collectors list a 
little bit later, which might mean that it is garbage-collected in Eden 
space more often and hence also decreases the ratio. I'll experiment with 
`-XX:SurvivorRatio` to see if different values change anything.
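
One possible way to check the old-gen part of this theory is to compare per-collector GC counts between a baseline run and a modified run. The sketch below uses the standard `java.lang.management` API. The class name `GcCounts` and the allocation loop are placeholders invented for illustration; they are not part of the benchmark or of this PR, and the collector names reported depend on the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or luceneutil.
final class GcCounts {

    /** Returns collector name -> collections so far (-1 if the JVM does not report it). */
    public static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> before = snapshot();

        // Placeholder workload standing in for the search benchmark.
        // The JIT may optimize some of these allocations away.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] tmp = new byte[64];
            sink += tmp.length;
        }

        Map<String, Long> after = snapshot();
        after.forEach((name, count) ->
            System.out.println(name + ": " + (count - before.getOrDefault(name, 0L)) + " collections"));
        System.out.println("sink=" + sink); // keeps the loop result observable
    }
}
```

Taking the same snapshot around each benchmark iteration (or running the JVM with `-Xlog:gc` on JDK 9+) should show whether old-generation collections really are less frequent when the extra array is allocated.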
   
   ### Test results
   
   I've added one line below the `createWeight` call that creates an unused array, 
e.g. `Object[] arr = new Object[7];` or `byte[] arr = new byte[23];`, etc.
   
   #### Object[] array
   
   size 5: REGRESSION; size on heap `16+5*4 = 36` (here and below the sizes are 
rough estimates to the best of my knowledge, not taking padding, OS cache, etc. 
into account): 12+4 bytes of object header plus array length overhead, and 4 bytes 
for each reference since the max heap size is below 32G 
([link](https://www.baeldung.com/java-size-of-object#:~:text=References%20have%20a%20typical%20size,%2D50%25%20more%20heap%20space.))
   ```
   Report after iter 7:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         337.78   (2.2%)                    324.30   (3.3%)   -4.0% (  -9% -    1%)     0.004
   ```
   
   size 6: REGRESSION; size on heap `16+6*4 = 40`
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         346.95   (3.5%)                    332.90   (3.4%)   -4.1% ( -10% -    2%)     0.000
   ```
   
   size 7: NO regression; size on heap `16+7*4 = 44`
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         364.91   (4.1%)                    360.21   (3.7%)   -1.3% (  -8% -    6%)     0.295
   ```
   
   #### int[] array 
   
   size 6; size on heap `16+6*4 = 40`; REGRESSION; 
   ```
   Report after iter 16:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         360.99   (3.0%)                    349.82   (2.6%)   -3.1% (  -8% -    2%)     0.002
   ```
   
   size 7; size on heap `16+7*4 = 44`; NO regression
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         360.49   (4.4%)                    355.37   (3.3%)   -1.4% (  -8% -    6%)     0.252
   ```
   #### byte[] array
   size 10; size on heap `16+10 = 26`; REGRESSION
   ```
   Report after iter 15:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         353.31   (3.9%)                    325.77   (3.3%)   -7.8% ( -14% -    0%)     0.000
   ```
   
   size 20; size on heap `16+20 = 36`; REGRESSION
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         356.40   (3.5%)                    338.70   (3.1%)   -5.0% ( -11% -    1%)     0.000
   ```
   
   size 22; size on heap `16+22 = 38`; REGRESSION
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         355.32   (4.9%)                    342.58   (2.7%)   -3.6% ( -10% -    4%)     0.004
   ```
   size 23; size on heap `16+23 = 39`; REGRESSION
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         357.27   (4.2%)                    341.16   (2.9%)   -4.5% ( -11% -    2%)     0.000
   ```
   size 24; size on heap `16+24 = 40`; REGRESSION/NO regression (results differ between reports)
   ```
   Report after iter 15:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         349.07   (3.1%)                    349.22   (2.6%)    0.0% (  -5% -    5%)     0.966
   ```
   
   ```
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         361.25   (4.1%)                    338.94   (3.2%)   -6.2% ( -12% -    1%)     0.000
   ```
   
   size 25; size on heap `16+25 = 41`; REGRESSION/NO regression (results differ between runs)
   ```
   1st run:
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         357.41   (3.5%)                    356.03   (4.1%)   -0.4% (  -7% -    7%)     0.747
   
   2nd run:
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         359.64   (3.9%)                    350.90   (3.6%)   -2.4% (  -9% -    5%)     0.042
   ```
   
   size 28; size on heap `16+28 = 44`; NO regression
   ```
   1st run:
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         360.73   (4.1%)                    353.56   (3.9%)   -2.0% (  -9% -    6%)     0.115
   
   2nd run:
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         359.99   (2.9%)                    360.59   (3.9%)    0.2% (  -6% -    7%)     0.878
   
   3rd run:
   Report after iter 19:
       Task   QPS baseline   StdDev   QPS my_modified_version   StdDev                Pct diff   p-value
   HighTerm         353.56   (4.5%)                    350.58   (4.3%)   -0.8% (  -9% -    8%)     0.541
   ```
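
The heap-size estimates quoted in the results use `16 + length * elementSize` and deliberately ignore padding. As a rough cross-check, the sketch below computes both that raw figure and the same figure rounded up to an 8-byte boundary. The assumptions are not from the measurements above: a 64-bit HotSpot JVM with compressed oops and compressed class pointers (16-byte array header) and the default 8-byte object alignment. The class name `ArraySizeEstimate` is made up for illustration.

```java
// Hypothetical helper; assumes 64-bit HotSpot, compressed oops/class pointers,
// and the default -XX:ObjectAlignmentInBytes=8.
final class ArraySizeEstimate {
    static final int ARRAY_HEADER_BYTES = 16; // 12-byte object header + 4-byte array length
    static final int OBJECT_ALIGNMENT = 8;    // assumed default object alignment

    /** Unpadded estimate: header + length * element size. */
    public static long rawSize(int length, int elementBytes) {
        return ARRAY_HEADER_BYTES + (long) length * elementBytes;
    }

    /** Raw estimate rounded up to the next multiple of the object alignment. */
    public static long alignedSize(int length, int elementBytes) {
        long raw = rawSize(length, elementBytes);
        return (raw + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
    }

    public static void main(String[] args) {
        // {length, element size in bytes}: Object[]/int[] use 4 bytes, byte[] uses 1
        int[][] cases = { {5, 4}, {6, 4}, {7, 4}, {10, 1}, {20, 1}, {22, 1},
                          {23, 1}, {24, 1}, {25, 1}, {28, 1} };
        for (int[] c : cases) {
            System.out.printf("length=%d elementBytes=%d raw=%d aligned=%d%n",
                c[0], c[1], rawSize(c[0], c[1]), alignedSize(c[0], c[1]));
        }
    }
}
```

Under these assumptions, raw estimates of 36 and 40 bytes both occupy 40 bytes, while 41 and 44 both round up to 48. This is only a reading of the arithmetic; the benchmarks above do not establish that alignment is what matters.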

