jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315146472
In case it helps this discussion, I ran the following code to get a sense of
the savings we could get assuming `2^24` vectors that are not clustered and 32
neighbors per vector.
```
public static void main(String[] args) throws IOException {
final int maxDoc = 1 << 24;
final int numNeighbors = 32;
final int[] ints = new int[numNeighbors];
Random r = new Random(0);
for (int i = 0; i < ints.length; ++i) {
ints[i] = r.nextInt(maxDoc);
}
Arrays.sort(ints);
System.out.println("Uncompressed:\t" + (ints.length * Integer.BYTES));
System.out.println("Bit-packed:\t" + (int)
Math.ceil(PackedInts.bitsRequired(maxDoc - 1) * ints.length / (double)
Byte.SIZE));
// Delta-code
for (int i = ints.length - 1; i > 0; --i) {
ints[i] -= ints[i-1];
}
ByteBuffersDataOutput out = new ByteBuffersDataOutput();
for (int i : ints) {
out.writeVInt(i);
}
System.out.println("Delta + vint:\t" + out.size());
int or = 0;
for (int i : ints) {
or |= i;
}
int deltaForStorage =
1 + // number of bits per value
(int) Math.ceil(PackedInts.bitsRequired(or) * ints.length / (double)
Byte.SIZE);
System.out.println("Delta + for (bpv = " + PackedInts.bitsRequired(or) +
"):\t" + deltaForStorage);
final int maxExceptions = 4;
PriorityQueue<Integer> pq = new PriorityQueue<Integer>(maxExceptions +
1) {
@Override
protected boolean lessThan(Integer a, Integer b) {
return a < b;
};
};
for (int i : ints) {
pq.insertWithOverflow(i);
}
final int numBitsPerValue = PackedInts.bitsRequired(pq.top());
final int maxValue = 1 << numBitsPerValue;
int numExceptions = 0;
for (int i : pq) {
if (i >= maxValue) {
numExceptions++;
}
}
final int deltaPforStorage =
1 + // number of bits per value + number of exceptions
(int) Math.ceil(numBitsPerValue * ints.length / (double) Byte.SIZE)
+ // unpatched values
numExceptions; // patches: 5 bits for the exception index, 3 for the
exception bits
System.out.println("Delta + pfor (bpv=" + numBitsPerValue + ",
exceptions=" + numExceptions + "):\t" + deltaPforStorage);
}
```
This outputs:
```
Uncompressed: 128
Bit-packed: 96
Delta + vint: 96
Delta + for (bpv = 22): 89
Delta + pfor (bpv=20, exceptions=4): 85
```
Now if I change the number of neighbors to 16:
```
Uncompressed: 64
Bit-packed: 48
Delta + vint: 50
Delta + for (bpv = 22): 45
Delta + pfor (bpv=21, exceptions=2): 45
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]