mikemccand commented on code in PR #12631: URL: https://github.com/apache/lucene/pull/12631#discussion_r1349699402
########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java: ########## @@ -99,6 +102,26 @@ public final class FieldReader extends Terms { */ } + long readLongOutput(DataInput in) throws IOException { Review Comment: Maybe rename to `readVLongOutput`? Make it clearly we are doing some form of vlong encoding? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java: ########## @@ -460,6 +460,19 @@ public String toString() { return "BLOCK: prefix=" + brToString(prefix); } + private static void writeMSBVLong(long i, DataOutput scratchBytes) throws IOException { Review Comment: Maybe rename `i` to `value`? And then you can use `i` for the for loop below? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java: ########## @@ -81,8 +81,11 @@ public final class Lucene90BlockTreeTermsReader extends FieldsProducer { /** Initial terms format. */ public static final int VERSION_START = 0; + /** Version that uses MSB VLong encoded output */ + public static final int VERSION_MSB_VLONG_OUTPUT = 1; + /** Current terms format. */ - public static final int VERSION_CURRENT = VERSION_START; + public static final int VERSION_CURRENT = VERSION_MSB_VLONG_OUTPUT; Review Comment: Add a comment explaining what changed, and link to the GitHub issue? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java: ########## @@ -460,6 +460,19 @@ public String toString() { return "BLOCK: prefix=" + brToString(prefix); } + private static void writeMSBVLong(long i, DataOutput scratchBytes) throws IOException { + assert i >= 0; + // Keep zero bits on most significant byte to have more chance to get prefix bytes shared. + // e.g. we expect 0x7FFF stored as [0x81, 0xFF, 0x7F] but not [0xFF, 0xFF, 0x40] + int LSBVLongBytes = (Long.SIZE - Long.numberOfLeadingZeros(i) - 1) / 7 + 1; Review Comment: Maybe rename to `bytesNeeded`? (Full disclosure this was suggested by [Bard](https://bard.google.com)). ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java: ########## @@ -460,6 +460,19 @@ public String toString() { return "BLOCK: prefix=" + brToString(prefix); } + private static void writeMSBVLong(long i, DataOutput scratchBytes) throws IOException { + assert i >= 0; + // Keep zero bits on most significant byte to have more chance to get prefix bytes shared. + // e.g. we expect 0x7FFF stored as [0x81, 0xFF, 0x7F] but not [0xFF, 0xFF, 0x40] + int LSBVLongBytes = (Long.SIZE - Long.numberOfLeadingZeros(i) - 1) / 7 + 1; + i <<= Long.SIZE - LSBVLongBytes * 7; + for (int j = 1; j < LSBVLongBytes; j++) { + scratchBytes.writeByte((byte) (((i >>> 57) & 0x7FL) | 0x80)); Review Comment: Maybe @uschindler can help review this heavy math :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org