Jackie-Jiang commented on code in PR #10990: URL: https://github.com/apache/pinot/pull/10990#discussion_r1261564543
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java: ########## @@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) { return value.indexOf(',') == -1; } + /** + * Helper method to get the valid value for setting min/max. + */ + @VisibleForTesting + static String getValidPropertyValue(String value) { + int length = value.length(); + if (length > 0) { + // check and replace first character if it's a white space + if (Character.isWhitespace(value.charAt(0))) { + String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0)); + value = unicodeValue + value.substring(1); + } + + // check and replace last character if it's a white space + if (Character.isWhitespace(value.charAt(value.length() - 1))) { + String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1)); + value = value.substring(0, value.length() - 1) + unicodeValue; + } + } + + // removing the ',' from the value if it's present. + if (length > 0 && value.contains(",")) { + value = value.replace(",", "\\,"); + length = value.length(); // updating the length value after escaping ',' + } + + // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT + if (length > METADATA_PROPERTY_LENGTH_LIMIT) { Review Comment: We cannot simply trim the extra characters because it can result in a wrong min/max value. For min value, the modified value must be smaller than the actual value; for max value, the modified value must be larger than the actual value. One approach to solve the max value problem is to add 1 to the last character or append one character after the last character. For BYTES type, we might also need some special handling because the value needs to be a valid hex encoded string Also, we should first modify the value then handle special characters, or the special characters might be cut in half and cause it unreadable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org