yihua commented on code in PR #18465:
URL: https://github.com/apache/hudi/pull/18465#discussion_r3035635967


##########
hudi-io/src/test/java/org/apache/hudi/io/hfile/TestHFileWriter.java:
##########
@@ -185,6 +185,53 @@ void testUniqueKeyLocation() throws IOException {
     }
   }
 
+  @Test
+  void testLongKeys() throws IOException {
+    // Test that HFile blocks with long keys (>= 126 chars) can be written and read correctly.
+    // This verifies the fix for the varint encoding mismatch in the root index block.
+    HFileContext context = new HFileContext.Builder().blockSize(100).build();
+    String testFile = TEST_FILE;
+    int numRecords = 10;
+    // Generate keys longer than 126 characters to trigger multi-byte protobuf varint encoding
+    // in the root index block. The varint encodes (key_content_length + 2), so content >= 126
+    // produces a value >= 128 which requires 2+ bytes in protobuf varint format.
+    char[] chars = new char[200];
+    Arrays.fill(chars, 'a');
+    String longPrefix = new String(chars);
+    try (DataOutputStream outputStream =
+             new DataOutputStream(Files.newOutputStream(Paths.get(testFile)));
+         HFileWriter writer = new HFileWriterImpl(context, outputStream)) {
+      for (int i = 0; i < numRecords; i++) {
+        String key = longPrefix + String.format("%04d", i);

Review Comment:
   🤖 nit: this comment still says "multi-byte protobuf varint encoding", but the whole point of the fix is that we're moving *away* from protobuf. Could you update it to say something like "multi-byte Hadoop VarInt encoding" to avoid confusing future readers?
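   For context on why the 126-character threshold matters: the two varint schemes disagree exactly at the point where the encoded value reaches 128. Below is a rough, self-contained sketch (not Hudi code) contrasting a protobuf-style unsigned varint with a Hadoop `WritableUtils.writeVInt`-style encoding. The class and method names are illustrative, and the Hadoop-style encoder is simplified to non-negative values (the negative-value branch is dropped for brevity).

   ```java
   import java.io.ByteArrayOutputStream;

   public class VarIntComparison {
     // Protobuf-style unsigned varint: 7 data bits per byte, least-significant
     // group first, high bit set on every byte except the last.
     // Values 0..127 fit in one byte; 128 and up need two or more.
     static byte[] protobufVarint(long value) {
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       while ((value & ~0x7FL) != 0) {
         out.write((int) ((value & 0x7F) | 0x80));
         value >>>= 7;
       }
       out.write((int) value);
       return out.toByteArray();
     }

     // Simplified Hadoop WritableUtils.writeVInt-style encoding for
     // non-negative values: -112..127 fit in a single byte; anything
     // larger gets a negative length-prefix byte followed by the value
     // in big-endian byte order.
     static byte[] hadoopVInt(int i) {
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       if (i >= -112 && i <= 127) {
         out.write((byte) i);
         return out.toByteArray();
       }
       int len = -112;
       long tmp = i;
       while (tmp != 0) {
         tmp >>= 8;
         len--;
       }
       out.write((byte) len);            // prefix byte encoding the length
       int numBytes = -(len + 112);      // how many value bytes follow
       for (int idx = numBytes; idx != 0; idx--) {
         int shiftBits = (idx - 1) * 8;
         out.write((byte) ((i >> shiftBits) & 0xFF));
       }
       return out.toByteArray();
     }

     public static void main(String[] args) {
       // key_content_length = 126, so the encoded value is 126 + 2 = 128:
       // the first value where the two encodings produce different bytes.
       int encoded = 126 + 2;
       System.out.println("protobuf: " + java.util.Arrays.toString(protobufVarint(encoded)));
       System.out.println("hadoop:   " + java.util.Arrays.toString(hadoopVInt(encoded)));
     }
   }
   ```

   Both schemes encode 128 in two bytes, but the bytes differ (protobuf-style gives `0x80 0x01`, Hadoop-style gives `0x8F 0x80`), so a reader using one scheme against a writer using the other silently misparses the root index entry once keys cross that length.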



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
