Re: [PR] Add getValueSize() to Dictionary and ValueReader to avoid buffer allocation [pinot]

via GitHub Fri, 27 Mar 2026 17:28:21 -0700


Copilot commented on code in PR #18012:
URL: https://github.com/apache/pinot/pull/18012#discussion_r3003785062



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/MutableOffHeapByteArrayStore.java:
##########
@@ -216,6 +227,18 @@ public byte[] get(int index) {
     throw new RuntimeException("dictionary ID '" + index + "' too low");
   }
 
+  public int getValueSize(int index) {
+    List<Buffer> bufList = _buffers;
+    for (int x = bufList.size() - 1; x >= 0; x--) {
+      Buffer buffer = bufList.get(x);
+      if (index >= buffer.getStartIndex()) {
+        return buffer.getValueSize(index);

Review Comment:
   `getValueSize(int index)` passes the *global* dictionary id directly into 
`Buffer#getValueSize(...)`, but `Buffer` methods expect an index relative to 
`buffer.getStartIndex()` (as done in `get(...)` and `equalsValueAt(...)`). This 
will read the wrong offset entry (and can go out of bounds) for all buffers 
except the first, returning incorrect sizes. Adjust the index before calling 
into the buffer (e.g., `index - buffer.getStartIndex()`).
   ```suggestion
           return buffer.getValueSize(index - buffer.getStartIndex());
   ```



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/readerwriter/FixedByteValueReaderWriterTest.java:
##########
@@ -75,6 +75,28 @@ public void testFixedByteValueReaderWriter(int 
maxStringLength, int configuredMa
     }
   }
 
+  @Test(dataProvider = "params")
+  public void testGetValueSize(int maxStringLength, int configuredMaxLength, 
ByteOrder byteOrder)
+      throws IOException {
+    byte[] bytes = new byte[configuredMaxLength];
+    try (PinotDataBuffer buffer = 
PinotDataBuffer.allocateDirect(configuredMaxLength * 1000L, byteOrder,
+        "testGetValueSize")) {
+      FixedByteValueReaderWriter readerWriter = new 
FixedByteValueReaderWriter(buffer);
+      List<Integer> lengths = new ArrayList<>(1000);
+      for (int i = 0; i < 1000; i++) {
+        int length = ThreadLocalRandom.current().nextInt(maxStringLength);
+        Arrays.fill(bytes, 0, length, (byte) 'a');
+        readerWriter.writeBytes(i, configuredMaxLength, bytes);
+        lengths.add(length);
+        Arrays.fill(bytes, 0, length, (byte) 0);

Review Comment:
   In the write loop, when `length` is 0, the buffer is never cleared after the 
previous iteration (both `Arrays.fill(bytes, 0, length, ...)` calls become 
no-ops). That means you can end up writing non-zero bytes while expecting 
`getUnpaddedByteSize(...)` to return 0, making this test 
non-deterministic/flaky. Clear the whole `bytes` array each iteration (e.g., 
`Arrays.fill(bytes, (byte) 0)` before filling) or ensure `length` is always > 0 
if zero-length values are out of scope.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add getValueSize() to Dictionary and ValueReader to avoid buffer allocation [pinot]

Reply via email to