jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315146472
In case it helps this discussion, I ran the following code to get a sense of the savings we could get assuming `2^24` vectors that are not clustered and 32 neighbors per vector. ``` public static void main(String[] args) throws IOException { final int maxDoc = 1 << 24; final int numNeighbors = 32; final int[] ints = new int[numNeighbors]; Random r = new Random(0); for (int i = 0; i < ints.length; ++i) { ints[i] = r.nextInt(maxDoc); } Arrays.sort(ints); System.out.println("Uncompressed:\t" + (ints.length * Integer.BYTES)); System.out.println("Bit-packed:\t" + (int) Math.ceil(PackedInts.bitsRequired(maxDoc - 1) * ints.length / (double) Byte.SIZE)); // Delta-code for (int i = ints.length - 1; i > 0; --i) { ints[i] -= ints[i-1]; } ByteBuffersDataOutput out = new ByteBuffersDataOutput(); for (int i : ints) { out.writeVInt(i); } System.out.println("Delta + vint:\t" + out.size()); int or = 0; for (int i : ints) { or |= i; } int deltaForStorage = 1 + // number of bits per value (int) Math.ceil(PackedInts.bitsRequired(or) * ints.length / (double) Byte.SIZE); System.out.println("Delta + for (bpv = " + PackedInts.bitsRequired(or) + "):\t" + deltaForStorage); final int maxExceptions = 4; PriorityQueue<Integer> pq = new PriorityQueue<Integer>(maxExceptions + 1) { @Override protected boolean lessThan(Integer a, Integer b) { return a < b; }; }; for (int i : ints) { pq.insertWithOverflow(i); } final int numBitsPerValue = PackedInts.bitsRequired(pq.top()); final int maxValue = 1 << numBitsPerValue; int numExceptions = 0; for (int i : pq) { if (i >= maxValue) { numExceptions++; } } final int deltaPforStorage = 1 + // number of bits per value + number of exceptions (int) Math.ceil(numBitsPerValue * ints.length / (double) Byte.SIZE) + // unpatched values numExceptions; // patches: 5 bits for the exception index, 3 for the exception bits System.out.println("Delta + pfor (bpv=" + numBitsPerValue + ", exceptions=" + numExceptions + "):\t" + deltaPforStorage); } ``` This outputs: ``` Uncompressed: 128 Bit-packed: 96 Delta + vint: 96 Delta + for (bpv = 22): 89 Delta + pfor (bpv=20, exceptions=4): 85 ``` Now if I change the number of neighbors to 16: ``` Uncompressed: 64 Bit-packed: 48 Delta + vint: 50 Delta + for (bpv = 22): 45 Delta + pfor (bpv=21, exceptions=2): 45 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org