rhodo commented on code in PR #16149:
URL: https://github.com/apache/pinot/pull/16149#discussion_r2175888171


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -599,10 +599,14 @@ public static void 
addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, IS_AUTO_GENERATED),
         String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
     if (dataType.equals(DataType.STRING) || dataType.equals(DataType.BYTES) || 
dataType.equals(DataType.JSON)) {
-      properties.setProperty(getKeyFor(column, SCHEMA_MAX_LENGTH), 
fieldSpec.getMaxLength());
-      FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy = 
fieldSpec.getMaxLengthExceedStrategy();
-      if (maxLengthExceedStrategy != null) {
-        properties.setProperty(getKeyFor(column, 
SCHEMA_MAX_LENGTH_EXCEED_STRATEGY), maxLengthExceedStrategy);
+      Integer maxLength = fieldSpec.getMaxLength();

Review Comment:
   Thanks for the suggestion!
   A couple of follow-ups:
   - Should we apply the same idea to SCHEMA_MAX_LENGTH_EXCEED_STRATEGY as well?
   - In the equals() method 
([link](https://github.com/apache/pinot/pull/16149/files#diff-05c7d0209f5495c4bb9b5f93d37d8bf5a706114aa0e9afa11c367000651fd395L500-L502)),
 should we compare the effective max length or just the configured max length? 
We have a number of unit tests that compare a field spec to one reconstructed 
from segment metadata, so the decision here may decide wether we should update 
those tests depending on which semantics we prefer.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java:
##########
@@ -53,24 +53,19 @@ public class SanitizationTransformer implements 
RecordTransformer {
   private final Map<String, SanitizedColumnInfo> _columnToColumnInfoMap = new 
HashMap<>();
 
   public SanitizationTransformer(Schema schema) {
-    FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy;
     for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
       if (!fieldSpec.isVirtualColumn()) {
-        if (fieldSpec.getDataType().equals(FieldSpec.DataType.STRING)) {
-          maxLengthExceedStrategy =
-              fieldSpec.getMaxLengthExceedStrategy() == null ? 
FieldSpec.MaxLengthExceedStrategy.TRIM_LENGTH
-                  : fieldSpec.getMaxLengthExceedStrategy();
-          _columnToColumnInfoMap.put(fieldSpec.getName(), new 
SanitizedColumnInfo(fieldSpec.getName(),
-              fieldSpec.getMaxLength(), maxLengthExceedStrategy, 
fieldSpec.getDefaultNullValue()));
-        } else if (fieldSpec.getDataType().equals(FieldSpec.DataType.JSON) || 
fieldSpec.getDataType()
-            .equals(FieldSpec.DataType.BYTES)) {
-          maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy() == 
null
-              ? FieldSpec.MaxLengthExceedStrategy.NO_ACTION : 
fieldSpec.getMaxLengthExceedStrategy();
-          if 
(maxLengthExceedStrategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION)) {
-            continue;
+        FieldSpec.DataType dataType = fieldSpec.getDataType();
+        if (dataType.equals(FieldSpec.DataType.STRING)
+            || dataType.equals(FieldSpec.DataType.JSON)
+            || dataType.equals(FieldSpec.DataType.BYTES)) {
+
+          FieldSpec.MaxLengthExceedStrategy strategy = 
fieldSpec.getEffectiveMaxLengthExceedStrategy();
+          if (!strategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION) || 
dataType.equals(
+              FieldSpec.DataType.STRING)) {

Review Comment:
   Yes, try to make this part a pure refactor rather than changing the existing 
behavior here: even if MaxLengthExceedStrategy.NO_ACTION is used for string 
type, other sanitization logic is still able to be applied when it is String.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to