Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1261564543


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) 
value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) 
value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than 
METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   We cannot simply trim the extra characters because it can result in a wrong 
min/max value. For min value, the modified value must be smaller than the 
actual value; for max value, the modified value must be larger than the actual 
value.
   One approach to solve the max value problem is to add 1 to the last 
character or append one character after the last character. For BYTES type, we 
might also need some special handling because the value needs to be a valid hex 
encoded string
   
   Also, we should first modify the value then handle special characters, or 
the special characters might be cut in half and cause it unreadable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to