RamakrishnaChilaka commented on code in PR #15165:
URL: https://github.com/apache/lucene/pull/15165#discussion_r2332065565


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/PForUtil.java:
##########
@@ -46,34 +45,39 @@ static boolean allEqual(int[] l) {
 
   /** Encode 128 integers from {@code ints} into {@code out}. */
   void encode(int[] ints, DataOutput out) throws IOException {
-    // Determine the top MAX_EXCEPTIONS + 1 values
-    final LongHeap top = new LongHeap(MAX_EXCEPTIONS + 1);
-    for (int i = 0; i <= MAX_EXCEPTIONS; ++i) {
-      top.push(ints[i]);
-    }
-    long topValue = top.top();
-    for (int i = MAX_EXCEPTIONS + 1; i < ForUtil.BLOCK_SIZE; ++i) {
-      if (ints[i] > topValue) {
-        topValue = top.updateTop(ints[i]);
-      }
-    }
-
-    long max = 0L;
-    for (int i = 1; i <= top.size(); ++i) {
-      max = Math.max(max, top.get(i));
+    // histogram of bit widths
+    final int[] histogram = new int[32];
+    int maxBitsRequired = 0;
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
+      final int v = ints[i];
+      final int bits = PackedInts.bitsRequired(v);
+      histogram[bits]++;
+      maxBitsRequired = Math.max(maxBitsRequired, bits);
     }
 
-    final int maxBitsRequired = PackedInts.bitsRequired(max);
-    // We store the patch on a byte, so we can't decrease the number of bits 
required by more than 8
-    final int patchedBitsRequired =
-        Math.max(PackedInts.bitsRequired(topValue), maxBitsRequired - 8);
+    int bestCost = Integer.MAX_VALUE;
+    int patchedBitsRequired = maxBitsRequired;
     int numExceptions = 0;
-    final long maxUnpatchedValue = (1L << patchedBitsRequired) - 1;
-    for (int i = 2; i <= top.size(); ++i) {
-      if (top.get(i) > maxUnpatchedValue) {
-        numExceptions++;
+
+    // We store patch on a byte, so we can't decrease bits by more than 8
+    final int minBits = Math.max(0, maxBitsRequired - 8);
+    int cumulativeExceptions = 0;
+    for (int b = maxBitsRequired; b >= minBits; --b) {
+      // Early termination if too many exceptions
+      if (cumulativeExceptions > MAX_EXCEPTIONS) break;
+
+      // storage cost
+      int cost = (ForUtil.BLOCK_SIZE * b) + (cumulativeExceptions << 4);
+      if (cost < bestCost) {

Review Comment:
   I agree that it is always true i.e., cost will never swing with 7 
Exceptions. However, I have initially added it as we might increase 
NUM_EXCEPTIONS beyond 7 (with update to the on-disk codec to store 
num_exceptions in more than 3 bits). I have removed the cost logic in the 
latest commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to