epotyom commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2307371884
I've been running some experiments. So far, this is what I observe:

It does not matter in which order we initialize the collection of `Collector`s and call `createWeight`, but it does matter that we allocate something on the heap after the `createWeight` call to **avoid** the regression. The minimum size that has to be allocated seems to be ~44 bytes.

I still don't have a very good theory that explains it. I believe the unused arrays decrease the Survivor/Eden promotion ratio. It might be that the decrease is significant enough that fewer objects overflow to old gen, so old gen GC is not needed as often? In baseline we also create the Collectors list a little bit later, which might mean that it is garbage-collected in Eden space more often, which would also decrease the ratio?

I'll experiment with `-XX:SurvivorRatio` to see if different values change anything.

### Test results

I've added one line below `createWeight` which creates an empty array, e.g. `Object[] arr = new Object[7];` or `byte[] arr = new byte[23];`, etc.

#### Object[] array

size 5: REGRESSION; size on heap `16+5*4 = 36` (here and below the sizes are rough estimates to the best of my knowledge, not taking padding/OS cache, etc. into account): 12+4 bytes object+length overhead; 4 bytes for each reference since max heap size is below 32G ([link](https://www.baeldung.com/java-size-of-object#:~:text=References%20have%20a%20typical%20size,%2D50%25%20more%20heap%20space.))

```
Report after iter 7:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 337.78 (2.2%) 324.30 (3.3%) -4.0% ( -9% - 1%) 0.004
```

size 6: REGRESSION; size on heap `16+6*4 = 40`

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 346.95 (3.5%) 332.90 (3.4%) -4.1% ( -10% - 2%) 0.000
```

size 7: NO regression; size on heap `16+7*4 = 44`

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 364.91 (4.1%) 360.21 (3.7%) -1.3% ( -8% - 6%) 0.295
```
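The size estimates in this comment all follow the pattern `16 + length * elementSize`. The sketch below reproduces that arithmetic. It assumes a 64-bit HotSpot JVM with compressed oops and compressed class pointers (12-byte object header plus 4-byte array length). The class and method names are hypothetical. The 8-byte alignment step is an extra assumption (the HotSpot default object alignment) that the rough estimates here deliberately leave out.

```java
public class ArraySizeEstimate {
    // Assumption: 12-byte object header + 4-byte array length field.
    static final int ARRAY_HEADER_BYTES = 16;
    // Assumption: default HotSpot object alignment of 8 bytes.
    static final int ALIGNMENT_BYTES = 8;

    /** Unpadded estimate, matching the "size on heap" figures in this comment. */
    static long rawSize(int length, int elementBytes) {
        return ARRAY_HEADER_BYTES + (long) length * elementBytes;
    }

    /** The same estimate rounded up to the next multiple of the alignment. */
    static long alignedSize(int length, int elementBytes) {
        long raw = rawSize(length, elementBytes);
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    public static void main(String[] args) {
        // 4 bytes per reference (compressed oops) or int, 1 byte per byte.
        int[][] cases = {
            {5, 4}, {6, 4}, {7, 4},      // Object[] / int[] lengths
            {24, 1}, {25, 1}, {28, 1}    // byte[] lengths
        };
        for (int[] c : cases) {
            System.out.printf("length=%d elementBytes=%d raw=%d aligned=%d%n",
                c[0], c[1], rawSize(c[0], c[1]), alignedSize(c[0], c[1]));
        }
    }
}
```

Under these assumptions, raw estimates of 33 to 40 bytes all round up to 40, and raw estimates of 41 to 48 bytes all round up to 48. A tool such as JOL would be needed to confirm the actual layout on a given JVM.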
#### int[] array

size 6: REGRESSION; size on heap `16+6*4 = 40`

```
Report after iter 16:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 360.99 (3.0%) 349.82 (2.6%) -3.1% ( -8% - 2%) 0.002
```

size 7: NO regression; size on heap `16+7*4 = 44`

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 360.49 (4.4%) 355.37 (3.3%) -1.4% ( -8% - 6%) 0.252
```

#### byte[] array

size 10: REGRESSION; size on heap `16+10 = 26`

```
Report after iter 15:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 353.31 (3.9%) 325.77 (3.3%) -7.8% ( -14% - 0%) 0.000
```

size 20: REGRESSION; size on heap `16+20 = 36`

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 356.40 (3.5%) 338.70 (3.1%) -5.0% ( -11% - 1%) 0.000
```

size 22: REGRESSION; size on heap `16+22 = 38`

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 355.32 (4.9%) 342.58 (2.7%) -3.6% ( -10% - 4%) 0.004
```

size 23: REGRESSION; size on heap `16+23 = 39`

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 357.27 (4.2%) 341.16 (2.9%) -4.5% ( -11% - 2%) 0.000
```

size 24: REGRESSION/NO regression; size on heap `16+24 = 40`

```
Report after iter 15:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 349.07 (3.1%) 349.22 (2.6%) 0.0% ( -5% - 5%) 0.966
```

```
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 361.25 (4.1%) 338.94 (3.2%) -6.2% ( -12% - 1%) 0.000
```

size 25: REGRESSION/NO regression; size on heap `16+25 = 41`

```
1st run:
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 357.41 (3.5%) 356.03 (4.1%) -0.4% ( -7% - 7%) 0.747

2nd run:
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 359.64 (3.9%) 350.90 (3.6%) -2.4% ( -9% - 5%) 0.042
```

size 28: NO regression; size on heap `16+28 = 44`

```
1st run:
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 360.73 (4.1%) 353.56 (3.9%) -2.0% ( -9% - 6%) 0.115

2nd run:
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 359.99 (2.9%) 360.59 (3.9%) 0.2% ( -6% - 7%) 0.878

3rd run:
Report after iter 19:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTerm 353.56 (4.5%) 350.58 (4.3%) -0.8% ( -9% - 8%) 0.541
```