mikemccand commented on a change in pull request #973: LUCENE-9027: Use SIMD 
instructions to decode postings.
URL: https://github.com/apache/lucene-solr/pull/973#discussion_r347420183
 
 

 ##########
 File path: lucene/core/src/java/org/apache/lucene/codecs/lucene84/PForUtil.java
 ##########
 @@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene84;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.store.DataOutput;
+import org.apache.lucene.util.packed.PackedInts;
+
+/**
+ * Utility class to encode sequences of 128 small positive integers.
+ */
+final class PForUtil {
+
+  static boolean allEqual(long[] l) {
+    for (int i = 1; i < ForUtil.BLOCK_SIZE; ++i) {
+      if (l[i] != l[0]) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  private final ForUtil forUtil;
+
+  PForUtil(ForUtil forUtil) {
+    this.forUtil = forUtil;
+  }
+
+  /**
+   * Encode 128 integers from {@code longs} into {@code out}.
+   */
+  void encode(long[] longs, DataOutput out) throws IOException {
+    // At most 3 exceptions
+    final long[] top4 = new long[4];
+    Arrays.fill(top4, -1L);
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
+      if (longs[i] > top4[0]) {
+        top4[0] = longs[i];
+        Arrays.sort(top4); // For only 4 entries we just sort on every 
iteration instead of maintaining a PQ
 
 Review comment:
   Well I got too curious about this and played around with some silly 
micro-benchmarks.  I tested three approaches.
   
   First approach is to inline the PQ as a `long[4]`:
   
   ```
       private static long[] top4_a(long[] input) {
           long[] top4 = new long[4];
           Arrays.fill(top4, Long.MIN_VALUE);
           for (long elem : input) {
               if (elem > top4[3]) {
                   if (elem > top4[1]) {
                       if (elem > top4[0]) {
                           top4[3] = top4[2];
                           top4[2] = top4[1];
                           top4[1] = top4[0];
                           top4[0] = elem;
                       } else {
                           top4[3] = top4[2];
                           top4[2] = top4[1];
                           top4[1] = elem;
                       }
                   } else if (elem > top4[2]) {
                       top4[3] = top4[2];
                       top4[2] = elem;
                   } else {
                       top4[3] = elem;
                   }
               }
           }
   
           return top4;
       }
   ```
   
   Second approach is the same thing, use local variables for the four slots 
instead of `long[]`:
   
   ```
       private static long[] top4_b(long[] input) {
           long first = Long.MIN_VALUE;
           long second = Long.MIN_VALUE;
           long third = Long.MIN_VALUE;
           long forth = Long.MIN_VALUE;
           for (long elem : input) {
               if (elem > forth) {
                   if (elem > second) {
                       if (elem > first) {
                           forth = third;
                           third = second;
                           second = first;
                           first = elem;
                       } else {
                           forth = third;
                           third = second;
                           second = elem;
                       }
                   } else if (elem > third) {
                       forth = third;
                       third = elem;
                   } else {
                       forth = elem;
                   }
               }
           }
   
           return new long[] {first, second, third, forth};
       }
   ```
   
   Last approach just uses `Arrays.sort` (like here):
   
   ```
       private static long[] top4_c(long[] input) {
           long[] top4 = new long[4];
           Arrays.fill(top4, Long.MIN_VALUE);
           for (long elem : input) {
               if (elem > top4[0]) {
                   top4[0] = elem;
                   Arrays.sort(top4);
               }
           }
   
           for (int i = 0; i < 2; i++) {
               // swap                                                          
                                                                                
                          
               long x = top4[i];
               top4[i] = top4[4 - i - 1];
               top4[4 - i - 1] = x;
           }
   
           return top4;
       }
   ```
   
   And then I wrote a silly micro-benchmark on a randomly generated `long[]`:
   
   ```
       public static void main(String[] args) throws Exception {
   
           // long seed = System.currentTimeMillis();                           
                                                                                
                          
           long seed = 1574030230334L;
           System.out.println("SEED: " + seed);
           Random r = new Random(seed);
   
           /*                                                                   
                                                                                
                          
           for (int i = 0; i < 10000; i++) {                                    
                                                                                
                          
               int len = 1000 + r.nextInt(9000);                                
                                                                                
                          
               int valueRange = 1000 + r.nextInt(9000);                         
                                                                                
                          
               long[] values = new long[len];                                   
                                                                                
                          
               for (int j = 0; j < len; j++) {                                  
                                                                                
                          
                   values[j] = r.nextInt(valueRange);                           
                                                                                
                          
               }                                                                
                                                                                
                          
               long[] top4 = top4_b(values);                                    
                                                                                
                          
               Arrays.sort(values);                                             
                                                                                
                          
               long[] correctTop4 = new long[4];                                
                                                                                
                          
               for (int j = 0; j < 4; j++) {                                    
                                                                                
                          
                   correctTop4[j] = values[len - j - 1];                        
                                                                                
                          
               }                                                                
                                                                                
                          
               if (Arrays.equals(top4, correctTop4) == false) {                 
                                                                                
                          
                   throw new RuntimeException("FAILED:\n  seed=" + seed + "\n  
input=" + Arrays.toString(values) +                                             
                           
                                              "\n  top4=" + 
Arrays.toString(top4) + "\n answer=" + Arrays.toString(correctTop4));           
                                              
               }                                                                
                                                                                
                          
           }                                                                    
                                                                                
                          
           */
   
           int len = 1000000;
           long[] values = new long[len];
           int valueRange = 10000 + r.nextInt(90000);
           for (int j = 0; j < len; j++) {
               values[j] = r.nextInt(valueRange);
           }
           // warmup                                                            
                                                                                
                          
           for (int i = 0; i < 5000; i++) {
               long[] top4 = top4_c(values);
           }
           System.out.println("Done warmup");
           // test                                                              
                                                                                
                          
           long bestNS = Long.MAX_VALUE;
           for (int iter = 0; iter < 10; iter++) {
               long t0 = System.nanoTime();
               for (int i = 0; i < 2000; i++) {
                   long[] top4 = top4_c(values);
               }
               long t1 = System.nanoTime();
               long ns = t1 - t0;
               String extra;
               if (ns < bestNS) {
                   bestNS = ns;
                   extra = " ***";
               } else {
                   extra = "";
               }
               System.out.println(String.format(Locale.ROOT, "iter %d: %.3f 
sec%s", iter, (t1 - t0) / 1000000000., extra));
           }
       }
   ```
   
   And the results are interesting: it is a bit faster to "inline" the PQ 
approach:
   
   ```
   top4_a:                                                                      
                                                                                
                          
                                                                                
                                                                                
                          
   SEED: 1574030230334                                                          
                                                                                
                          
   Done warmup                                                                  
                                                                                
                          
   iter 0: 0.875 sec ***                                                        
                                                                                
                          
   iter 1: 0.876 sec                                                            
                                                                                
                          
   iter 2: 0.876 sec                                                            
                                                                                
                          
   iter 3: 0.875 sec ***                                                        
                                                                                
                          
   iter 4: 0.876 sec                                                            
                                                                                
                          
   iter 5: 0.874 sec ***                                                        
                                                                                
                          
   iter 6: 0.874 sec                                                            
                                                                                
                          
   iter 7: 0.872 sec ***                                                        
                                                                                
                          
   iter 8: 0.874 sec                                                            
                                                                                
                          
   iter 9: 0.872 sec                                                            
                                                                                
                          
                                                                                
                                                                                
                          
                                                                                
                                                                                
                          
   top4_b:                                                                      
                                                                                
                          
                                                                                
                                                                                
                          
   SEED: 1574030230334                                                          
                                                                                
                          
   Done warmup                                                                  
                                                                                
                          
   iter 0: 0.788 sec ***                                                        
                                                                                
                          
   iter 1: 0.790 sec                                                            
                                                                                
                          
   iter 2: 0.786 sec ***                                                        
                                                                                
                          
   iter 3: 0.784 sec ***                                                        
                                                                                
                          
   iter 4: 0.784 sec                                                            
                                                                                
                          
   iter 5: 0.784 sec                                                            
                                                                                
                          
   iter 6: 0.785 sec                                                            
                                                                                
                          
   iter 7: 0.785 sec                                                            
                                                                                
                          
   iter 8: 0.786 sec                                                            
                                                                                
                          
   iter 9: 0.785 sec                                                            
                                                                                
                          
                                                                                
                                                                                
                          
                                                                                
                                                                                
                          
   top4_c:                                                                      
                                                                                
                          
   SEED: 1574030230334                                                          
                                                                                
                          
   Done warmup                                                                  
                                                                                
                          
   iter 0: 1.317 sec ***                                                        
                                                                                
                          
   iter 1: 1.331 sec                                                            
                                                                                
                          
   iter 2: 1.330 sec                                                            
                                                                                
                          
   iter 3: 1.323 sec                                                            
                                                                                
                          
   iter 4: 1.323 sec                                                            
                                                                                
                          
   iter 5: 1.316 sec ***                                                        
                                                                                
                          
   iter 6: 1.319 sec                                                            
                                                                                
                          
   iter 7: 1.318 sec                                                            
                                                                                
                          
   iter 8: 1.317 sec                                                            
                                                                                
                          
   iter 9: 1.313 sec ***                                                        
                                                                                
                          
   ```
   
   I'm using Corretto JDK11: openjdk full version "11.0.4+11-LTS", with all 
defaults for the JVM.
   
   Net/net I don't think this warrants inlining the PQ, this is likely a small 
part of the overall indexing time, and the `Arrays.sort` approach is nice and 
compact and understandable.  I was just curious ;)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to