jpountz opened a new pull request, #14979:
URL: https://github.com/apache/lucene/pull/14979

   I remember benchmarking prefix sums quite extensively, and unrolled loops 
performed significantly better than their rolled counterparts, such as the 
one below, in both micro and macro benchmarks:
   
   ```java
   private static void prefixSum(int[] arr, int len) {
     // Each iteration reads back the element written by the previous one.
     for (int i = 1; i < len; ++i) {
       arr[i] += arr[i - 1];
     }
   }
   ```
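
   For illustration only, an unrolled counterpart of the loop above might look 
roughly like the sketch below. The class name, method name and the 4-way unroll 
factor are assumptions made for this example; this is not the code that Lucene 
actually uses.

   ```java
   final class UnrolledPrefixSum {
     // Hypothetical 4-way unrolled in-place prefix sum (illustrative, not Lucene's code).
     static void prefixSumUnrolled(int[] arr, int len) {
       int i = 1;
       // Main loop: handle four elements per iteration.
       for (; i + 3 < len; i += 4) {
         arr[i] += arr[i - 1];
         arr[i + 1] += arr[i];
         arr[i + 2] += arr[i + 1];
         arr[i + 3] += arr[i + 2];
       }
       // Tail loop: handle the remaining (at most three) elements.
       for (; i < len; ++i) {
         arr[i] += arr[i - 1];
       }
     }
   }
   ```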
   
   However, I recently discovered that rewriting the loop this way performs 
much better than the original rolled loop, and almost on par with the unrolled 
variant:
   
   ```java
   private static void prefixSum(int[] arr, int len) {
     // Carry the running total in a local variable instead of re-reading arr[i - 1].
     int sum = 0;
     for (int i = 0; i < len; ++i) {
       sum += arr[i];
       arr[i] = sum;
     }
   }
   ```
   
   I haven't checked the assembly yet, but both a JMH benchmark and luceneutil 
agree that it doesn't introduce a slowdown, so I cut over prefix sums to this 
approach.
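
   As a sanity check that the two rolled variants are interchangeable, a small 
stand-alone comparison along the following lines could be used. It is a sketch 
only: the class name, the method names `prefixSumInPlace` and 
`prefixSumRunning`, the seed and the array sizes are assumptions made for this 
example, and it checks correctness, not performance.

   ```java
   import java.util.Arrays;
   import java.util.Random;

   final class PrefixSumCheck {
     // First variant: each element adds the previously stored element.
     static void prefixSumInPlace(int[] arr, int len) {
       for (int i = 1; i < len; ++i) {
         arr[i] += arr[i - 1];
       }
     }

     // Second variant: the running total is carried in a local variable.
     static void prefixSumRunning(int[] arr, int len) {
       int sum = 0;
       for (int i = 0; i < len; ++i) {
         sum += arr[i];
         arr[i] = sum;
       }
     }

     public static void main(String[] args) {
       Random random = new Random(42); // fixed seed so runs are repeatable
       for (int iter = 0; iter < 1000; ++iter) {
         int[] a = new int[random.nextInt(256)];
         for (int i = 0; i < a.length; ++i) {
           a[i] = random.nextInt(); // int overflow wraps the same way in both variants
         }
         int[] b = a.clone();
         prefixSumInPlace(a, a.length);
         prefixSumRunning(b, b.length);
         if (!Arrays.equals(a, b)) {
           throw new AssertionError("variants disagree at iteration " + iter);
         }
       }
       System.out.println("both variants agree");
     }
   }
   ```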


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

