jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315146472

   In case it helps this discussion, I ran the following code to get a sense of 
the savings we could get assuming `2^24` vectors that are not clustered and 32 
neighbors per vector.
   
   ```
     public static void main(String[] args) throws IOException {
       final int maxDoc = 1 << 24;
       final int numNeighbors = 32;
       final int[] ints = new int[numNeighbors];
       Random r = new Random(0);
       for (int i = 0; i < ints.length; ++i) {
         ints[i] = r.nextInt(maxDoc);
       }
       Arrays.sort(ints);
       System.out.println("Uncompressed:\t" + (ints.length * Integer.BYTES));
       System.out.println("Bit-packed:\t" + (int) 
Math.ceil(PackedInts.bitsRequired(maxDoc - 1) * ints.length / (double) 
Byte.SIZE));
       // Delta-code
       for (int i = ints.length - 1; i > 0; --i) {
         ints[i] -= ints[i-1];
       }
       ByteBuffersDataOutput out = new ByteBuffersDataOutput();
       for (int i : ints) {
         out.writeVInt(i);
       }
       System.out.println("Delta + vint:\t" + out.size());
       int or = 0;
       for (int i : ints) {
         or |= i;
       }
       int deltaForStorage =
           1 + // number of bits per value
           (int) Math.ceil(PackedInts.bitsRequired(or) * ints.length / (double) 
Byte.SIZE);
       System.out.println("Delta + for (bpv = " + PackedInts.bitsRequired(or) + 
"):\t" + deltaForStorage);
       final int maxExceptions = 4;
       PriorityQueue<Integer> pq = new PriorityQueue<Integer>(maxExceptions + 
1) {
         @Override
         protected boolean lessThan(Integer a, Integer b) {
           return a < b;
         };
       };
       for (int i : ints) {
         pq.insertWithOverflow(i);
       }
       final int numBitsPerValue = PackedInts.bitsRequired(pq.top());
       final int maxValue = 1 << numBitsPerValue;
       int numExceptions = 0;
       for (int i : pq) {
         if (i >= maxValue) {
           numExceptions++;
         }
       }
       final int deltaPforStorage =
           1 + // number of bits per value + number of exceptions
           (int) Math.ceil(numBitsPerValue * ints.length / (double) Byte.SIZE) 
+ // unpatched values
           numExceptions; // patches: 5 bits for the exception index, 3 for the 
exception bits
       System.out.println("Delta + pfor (bpv=" + numBitsPerValue + ", 
exceptions=" + numExceptions + "):\t" + deltaPforStorage);
     }
   ```
   
   This outputs:
   
   ```
   Uncompressed:        128
   Bit-packed:  96
   Delta + vint:        96
   Delta + for (bpv = 22):      89
   Delta + pfor (bpv=20, exceptions=4): 85
   ```
   
   Now if I change the number of neighbors to 16:
   
   ```
   Uncompressed:        64
   Bit-packed:  48
   Delta + vint:        50
   Delta + for (bpv = 22):      45
   Delta + pfor (bpv=21, exceptions=2): 45
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to