mikemccand commented on a change in pull request #973: LUCENE-9027: Use SIMD instructions to decode postings. URL: https://github.com/apache/lucene-solr/pull/973#discussion_r347420183
########## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene84/PForUtil.java ########## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene84; + +import java.io.IOException; +import java.util.Arrays; + +import org.apache.lucene.store.DataInput; +import org.apache.lucene.store.DataOutput; +import org.apache.lucene.util.packed.PackedInts; + +/** + * Utility class to encode sequences of 128 small positive integers. + */ +final class PForUtil { + + static boolean allEqual(long[] l) { + for (int i = 1; i < ForUtil.BLOCK_SIZE; ++i) { + if (l[i] != l[0]) { + return false; + } + } + return true; + } + + private final ForUtil forUtil; + + PForUtil(ForUtil forUtil) { + this.forUtil = forUtil; + } + + /** + * Encode 128 integers from {@code longs} into {@code out}. + */ + void encode(long[] longs, DataOutput out) throws IOException { + // At most 3 exceptions + final long[] top4 = new long[4]; + Arrays.fill(top4, -1L); + for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + if (longs[i] > top4[0]) { + top4[0] = longs[i]; + Arrays.sort(top4); // For only 4 entries we just sort on every iteration instead of maintaining a PQ Review comment: Well I got too curious about this and played around with some silly micro-benchmarks. I tested three approaches. First approach is to inline the PQ as a `long[4]`: ``` private static long[] top4_a(long[] input) { long[] top4 = new long[4]; Arrays.fill(top4, Long.MIN_VALUE); for (long elem : input) { if (elem > top4[3]) { if (elem > top4[1]) { if (elem > top4[0]) { top4[3] = top4[2]; top4[2] = top4[1]; top4[1] = top4[0]; top4[0] = elem; } else { top4[3] = top4[2]; top4[2] = top4[1]; top4[1] = elem; } } else if (elem > top4[2]) { top4[3] = top4[2]; top4[2] = elem; } else { top4[3] = elem; } } } return top4; } ``` Second approach is the same thing, use local variables for the four slots instead of `long[]`: ``` private static long[] top4_b(long[] input) { long first = Long.MIN_VALUE; long second = Long.MIN_VALUE; long third = Long.MIN_VALUE; long forth = Long.MIN_VALUE; for (long elem : input) { if (elem > forth) { if (elem > second) { if (elem > first) { forth = third; third = second; second = first; first = elem; } else { forth = third; third = second; second = elem; } } else if (elem > third) { forth = third; third = elem; } else { forth = elem; } } } return new long[] {first, second, third, forth}; } ``` Last approach just uses `Arrays.sort` (like here): ``` private static long[] top4_c(long[] input) { long[] top4 = new long[4]; Arrays.fill(top4, Long.MIN_VALUE); for (long elem : input) { if (elem > top4[0]) { top4[0] = elem; Arrays.sort(top4); } } for (int i = 0; i < 2; i++) { // swap long x = top4[i]; top4[i] = top4[4 - i - 1]; top4[4 - i - 1] = x; } return top4; } ``` And then I wrote a silly micro-benchmark on a randomly generated `long[]`: ``` public static void main(String[] args) throws Exception { // long seed = System.currentTimeMillis(); long seed = 1574030230334L; System.out.println("SEED: " + seed); Random r = new Random(seed); /* for (int i = 0; i < 10000; i++) { int len = 1000 + r.nextInt(9000); int valueRange = 1000 + r.nextInt(9000); long[] values = new long[len]; for (int j = 0; j < len; j++) { values[j] = r.nextInt(valueRange); } long[] top4 = top4_b(values); Arrays.sort(values); long[] correctTop4 = new long[4]; for (int j = 0; j < 4; j++) { correctTop4[j] = values[len - j - 1]; } if (Arrays.equals(top4, correctTop4) == false) { throw new RuntimeException("FAILED:\n seed=" + seed + "\n input=" + Arrays.toString(values) + "\n top4=" + Arrays.toString(top4) + "\n answer=" + Arrays.toString(correctTop4)); } } */ int len = 1000000; long[] values = new long[len]; int valueRange = 10000 + r.nextInt(90000); for (int j = 0; j < len; j++) { values[j] = r.nextInt(valueRange); } // warmup for (int i = 0; i < 5000; i++) { long[] top4 = top4_c(values); } System.out.println("Done warmup"); // test long bestNS = Long.MAX_VALUE; for (int iter = 0; iter < 10; iter++) { long t0 = System.nanoTime(); for (int i = 0; i < 2000; i++) { long[] top4 = top4_c(values); } long t1 = System.nanoTime(); long ns = t1 - t0; String extra; if (ns < bestNS) { bestNS = ns; extra = " ***"; } else { extra = ""; } System.out.println(String.format(Locale.ROOT, "iter %d: %.3f sec%s", iter, (t1 - t0) / 1000000000., extra)); } } ``` And the results are interesting: it is a bit faster to "inline" the PQ approach: ``` top4_a: SEED: 1574030230334 Done warmup iter 0: 0.875 sec *** iter 1: 0.876 sec iter 2: 0.876 sec iter 3: 0.875 sec *** iter 4: 0.876 sec iter 5: 0.874 sec *** iter 6: 0.874 sec iter 7: 0.872 sec *** iter 8: 0.874 sec iter 9: 0.872 sec top4_b: SEED: 1574030230334 Done warmup iter 0: 0.788 sec *** iter 1: 0.790 sec iter 2: 0.786 sec *** iter 3: 0.784 sec *** iter 4: 0.784 sec iter 5: 0.784 sec iter 6: 0.785 sec iter 7: 0.785 sec iter 8: 0.786 sec iter 9: 0.785 sec top4_c: SEED: 1574030230334 Done warmup iter 0: 1.317 sec *** iter 1: 1.331 sec iter 2: 1.330 sec iter 3: 1.323 sec iter 4: 1.323 sec iter 5: 1.316 sec *** iter 6: 1.319 sec iter 7: 1.318 sec iter 8: 1.317 sec iter 9: 1.313 sec *** ``` I'm using Corretto JDK11: openjdk full version "11.0.4+11-LTS", with all defaults for the JVM. Net/net I don't think this warrants inlining the PQ, this is likely a small part of the overall indexing time, and the `Arrays.sort` approach is nice and compact and understandable. I was just curious ;) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org