richardstartin commented on code in PR #10044: URL: https://github.com/apache/pinot/pull/10044#discussion_r1059802361
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/BaseImmutableDictionary.java: ########## @@ -259,8 +259,7 @@ protected int binarySearch(byte[] value) { while (low <= high) { int mid = (low + high) >>> 1; - byte[] midValue = _valueReader.getBytes(mid, _numBytesPerValue); - int compareResult = ByteArray.compare(midValue, value); + int compareResult = _valueReader.compareUnsignedBytes(mid, _numBytesPerValue, value); Review Comment: The existing logic here appears to be broken for fixed length `FixedByteValueReaderWriter`: it compares `value` with the _padded_ stored value retrieved by `ValueReader.getBytes`. This comparison will break ties using the lengths when mismatches can't be found, meaning the comparison will never yield zero unless `value` is also padded. I can only find testing for this method with byte arrays with the same size, so this problem would not arise in tests, but I can't find where the padding required to make this work with arbitrary length byte arrays is taking place. I decided to roll this change back as addressing this problem is bigger than the scope of the change I wanted to make, and `STRING` columns are much more common than `BYTES`. I refactored the code to make adding an unsigned comparison trivial once this has been resolved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org