gsmiller commented on a change in pull request #69:
URL: https://github.com/apache/lucene/pull/69#discussion_r609160727



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java
##########
@@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException {
       in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1));
     }
   }
+
+  /**
+   * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes
+   * there are no exceptions to apply.
+   */
+  private static void prefixSumOfOnes(long[] longs, long base) {
+    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE);
+    // This loop gets auto-vectorized
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
+      longs[i] += base;
+    }
+  }
+
+  /**
+   * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note
+   * this assumes there are no exceptions to apply.
+   */
+  private static void prefixSumOf(long[] longs, long base, long val) {
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) {
+      longs[i] = (i + 1) * val + base;

Review comment:
   Hmm... I'm not getting anything useful out of `perfasm`. I have `perf` installed and running on Linux, but whatever I try, I just get the output below. I've tried many different settings for `hotThreshold` and `frequency`, and have also increased the number of benchmark iterations and the time per iteration in various ways:
   ```
   Secondary result "jpountz.PackedIntsDeltaDecodeBenchmark.pForDeltaDecoder:·asm":
   PrintAssembly processed: 361476 total address lines.
   Perf output processed (skipped 16.036 seconds):
    Column 1: cycles (0 events)
   
   WARNING: No hottest code region above the threshold (10.00%) for disassembly.
   Use "hotThreshold" profiler option to lower the filter threshold.
   
   ....[Hottest Regions]...............................................................................
   ....................................................................................................
            <totals>
   
   ....[Hottest Methods (after inlining)]..............................................................
   ....................................................................................................
            <totals>
   
   ....[Distribution by Source]........................................................................
   ....................................................................................................
            <totals>
   
   WARNING: The perf event count is suspiciously low (0). The performance data might be
   inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
   With some profilers on Mac OS X, System Integrity Protection (SIP) may prevent profiling.
   In such case, temporarily disabling SIP with 'csrutil disable' might help.
   ```
   
   I'm wondering if this is due to running on a virtual machine (an AWS EC2 host)? I came across this [post](http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html), which seemed to indicate the need to run on "real hardware." I also tried @jpountz's original benchmark branch with `-prof perfasm` and hit the same issue. I've confirmed that `-XX:+PrintAssembly` works on this machine (with `hsdis-amd64.so` under `JAVA_HOME/lib/server`). I'm calling it for today but will see if I can make more progress tomorrow. If I'm doing something obviously wrong here, please point it out! Thanks again for the help!
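
   For anyone who wants to look at these loops outside of Lucene, the two methods from the diff above can be sketched as a standalone class. This is only a rough sketch under stated assumptions: `BLOCK_SIZE = 128` stands in for `ForUtil.BLOCK_SIZE`, the contents of `IDENTITY_PLUS_ONE` (the values 1 through `BLOCK_SIZE`) are inferred from its name, and `PrefixSumSketch` and `naivePrefixSum` are hypothetical names added here purely for comparison.

```java
import java.util.Arrays;

class PrefixSumSketch {
  // Assumption: stands in for ForUtil.BLOCK_SIZE.
  static final int BLOCK_SIZE = 128;

  // Assumption: holds 1, 2, ..., BLOCK_SIZE, as the name suggests.
  static final long[] IDENTITY_PLUS_ONE = new long[BLOCK_SIZE];

  static {
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      IDENTITY_PLUS_ONE[i] = i + 1;
    }
  }

  // Same shape as prefixSumOfOnes in the diff: copy 1..BLOCK_SIZE, then add base.
  static void prefixSumOfOnes(long[] longs, long base) {
    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, BLOCK_SIZE);
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] += base;
    }
  }

  // Same shape as prefixSumOf in the diff: closed form for a constant delta.
  static void prefixSumOf(long[] longs, long base, long val) {
    for (int i = 0; i < BLOCK_SIZE; i++) {
      longs[i] = (i + 1) * val + base;
    }
  }

  // Hypothetical reference: a plain running sum, used only to check the two above.
  static void naivePrefixSum(long[] longs, long base, long val) {
    long sum = base;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      sum += val;
      longs[i] = sum;
    }
  }

  public static void main(String[] args) {
    long[] expected = new long[BLOCK_SIZE];
    long[] actual = new long[BLOCK_SIZE];

    naivePrefixSum(expected, 10L, 1L);
    prefixSumOfOnes(actual, 10L);
    System.out.println("prefixSumOfOnes matches: " + Arrays.equals(expected, actual));

    naivePrefixSum(expected, 10L, 3L);
    prefixSumOf(actual, 10L, 3L);
    System.out.println("prefixSumOf matches: " + Arrays.equals(expected, actual));
  }
}
```

   Both variants should agree with the running sum for any `base` and `val`. Whether either loop is actually vectorized by the JIT is exactly what the `perfasm` output would need to confirm, so this sketch says nothing about that.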



