jpountz opened a new pull request, #14979: URL: https://github.com/apache/lucene/pull/14979
I remember benchmarking prefix sums quite extensively, and unrolled loops performed significantly better than their rolled counterpart, shown below, on both micro and macro benchmarks:

```java
private static void prefixSum(int[] arr, int len) {
  for (int i = 1; i < len; ++i) {
    arr[i] += arr[i-1];
  }
}
```

However, I recently discovered that rewriting the loop as follows performs much better than the loop above, and almost on par with the unrolled variant:

```java
private static void prefixSum(int[] arr, int len) {
  int sum = 0;
  for (int i = 0; i < len; ++i) {
    sum += arr[i];
    arr[i] = sum;
  }
}
```

I haven't checked the assembly yet, but both a JMH benchmark and luceneutil agree that it doesn't introduce a slowdown, so I cut over prefix sums to this approach.
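The unrolled variant is mentioned above but not shown. As a rough illustration only, the sketch below puts the two loops from the description next to a hypothetical 4-way unrolled version and checks that all three compute the same prefix sums. The class name `PrefixSumVariants`, the method names, and the unroll factor are assumptions made for this example; they are not taken from the Lucene code base, and this sketch makes no performance claim.

```java
import java.util.Arrays;

// Illustrative only: names and the unroll factor are assumptions, not Lucene's code.
final class PrefixSumVariants {

  // First loop from the description: each iteration reads the element
  // that the previous iteration just wrote back to the array.
  static void prefixSumInPlace(int[] arr, int len) {
    for (int i = 1; i < len; ++i) {
      arr[i] += arr[i - 1];
    }
  }

  // Second loop from the description: the running total lives in a local
  // variable, so no iteration reads back a value stored by the previous one.
  static void prefixSumRunning(int[] arr, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i) {
      sum += arr[i];
      arr[i] = sum;
    }
  }

  // Hypothetical 4-way manually unrolled variant with a scalar tail loop.
  static void prefixSumUnrolled(int[] arr, int len) {
    int sum = 0;
    int i = 0;
    for (; i + 4 <= len; i += 4) {
      sum += arr[i];
      arr[i] = sum;
      sum += arr[i + 1];
      arr[i + 1] = sum;
      sum += arr[i + 2];
      arr[i + 2] = sum;
      sum += arr[i + 3];
      arr[i + 3] = sum;
    }
    // Handle the remaining 0 to 3 elements.
    for (; i < len; ++i) {
      sum += arr[i];
      arr[i] = sum;
    }
  }

  public static void main(String[] args) {
    int[] input = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};

    int[] a = input.clone();
    int[] b = input.clone();
    int[] c = input.clone();
    prefixSumInPlace(a, a.length);
    prefixSumRunning(b, b.length);
    prefixSumUnrolled(c, c.length);

    System.out.println(Arrays.toString(a));
    System.out.println("all variants agree: " + (Arrays.equals(a, b) && Arrays.equals(b, c)));
  }
}
```

All three variants use plain `int` addition, so they also agree when the sum overflows and wraps around.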