gsmiller commented on a change in pull request #69:
URL: https://github.com/apache/lucene/pull/69#discussion_r609160727
##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java
##########
@@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException {
       in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1));
     }
   }
+
+  /**
+   * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes
+   * there are no exceptions to apply.
+   */
+  private static void prefixSumOfOnes(long[] longs, long base) {
+    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE);
+    // This loop gets auto-vectorized
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
+      longs[i] += base;
+    }
+  }
+
+  /**
+   * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note
+   * this assumes there are no exceptions to apply.
+   */
+  private static void prefixSumOf(long[] longs, long base, long val) {
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) {
+      longs[i] = (i + 1) * val + base;

Review comment:
Hmm... I'm not getting anything useful out of `perfasm`. I've got `perf` installed and running on Linux, but no matter what I do, I just get the following (I've tried lots of different settings for `hotThreshold` and `frequency`, and have also increased the number of benchmark iterations and the time per iteration in various ways):

```
Secondary result "jpountz.PackedIntsDeltaDecodeBenchmark.pForDeltaDecoder:·asm":
PrintAssembly processed: 361476 total address lines.
Perf output processed (skipped 16.036 seconds):
 Column 1: cycles (0 events)

WARNING: No hottest code region above the threshold (10.00%) for disassembly.
Use "hotThreshold" profiler option to lower the filter threshold.

....[Hottest Regions]...............................................................................
....................................................................................................
 <totals>

....[Hottest Methods (after inlining)]..............................................................
....................................................................................................
 <totals>

....[Distribution by Source]........................................................................
....................................................................................................
 <totals>

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
With some profilers on Mac OS X, System Integrity Protection (SIP) may prevent profiling.
In such case, temporarily disabling SIP with 'csrutil disable' might help.
```

I'm wondering if this is due to running on a virtual machine (an AWS EC2 host)? I came across this [read](http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html) that seemed to indicate the need to run on "real hardware."

I tried @jpountz's original benchmark branch as well with `-prof perfasm` and got the same issue. I've also confirmed that `-XX:+PrintAssembly` is working on this machine (with `hsdis-amd64.so` under `JAVA_HOME/lib/server`).

I'm calling it for today but will see if I can make more progress tomorrow. If I'm doing something obviously wrong here, please point it out! Thanks again for the help!
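
For anyone following along without the full PR checked out, here is a minimal standalone sketch of the two fill strategies in the diff above, so their output can be compared while the profiling question is open. It is not the PR's code: `PrefixSumSketch` and `naivePrefixSum` are hypothetical names, `BLOCK_SIZE` is hard-coded to 128 as a stand-in for `ForUtil.BLOCK_SIZE`, and `IDENTITY_PLUS_ONE` is assumed to hold 1, 2, ..., `BLOCK_SIZE`, since its definition is not part of this hunk.

```java
import java.util.Arrays;

final class PrefixSumSketch {
  // Assumption: stands in for ForUtil.BLOCK_SIZE; hard-coded so the sketch is self-contained.
  static final int BLOCK_SIZE = 128;

  // Assumption: the PR's IDENTITY_PLUS_ONE is taken to hold 1, 2, ..., BLOCK_SIZE.
  static final long[] IDENTITY_PLUS_ONE = new long[BLOCK_SIZE];

  static {
    for (int i = 0; i < BLOCK_SIZE; i++) {
      IDENTITY_PLUS_ONE[i] = i + 1;
    }
  }

  // All deltas are 1: copy the precomputed 1..BLOCK_SIZE ramp, then add the base.
  static void prefixSumOfOnes(long[] longs, long base) {
    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, BLOCK_SIZE);
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] += base;
    }
  }

  // All deltas are val: the i-th value is base plus (i + 1) copies of val.
  static void prefixSumOf(long[] longs, long base, long val) {
    for (int i = 0; i < BLOCK_SIZE; i++) {
      longs[i] = (i + 1) * val + base;
    }
  }

  // Hypothetical reference: an explicit running sum, used only to cross-check the two above.
  static void naivePrefixSum(long[] longs, long base, long val) {
    long acc = base;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      acc += val;
      longs[i] = acc;
    }
  }

  public static void main(String[] args) {
    long[] ones = new long[BLOCK_SIZE];
    long[] general = new long[BLOCK_SIZE];
    long[] reference = new long[BLOCK_SIZE];

    prefixSumOfOnes(ones, 10L);
    prefixSumOf(general, 10L, 1L);
    naivePrefixSum(reference, 10L, 1L);

    System.out.println("ones == general:   " + Arrays.equals(ones, general));
    System.out.println("ones == reference: " + Arrays.equals(ones, reference));
  }
}
```

This only checks that the closed-form fills agree with a running sum; it says nothing about which variant the JIT vectorizes, which is what the `perfasm` output was meant to show.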