rhodo commented on code in PR #16149: URL: https://github.com/apache/pinot/pull/16149#discussion_r2175888171
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java: ########## @@ -599,10 +599,14 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str properties.setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated())); if (dataType.equals(DataType.STRING) || dataType.equals(DataType.BYTES) || dataType.equals(DataType.JSON)) { - properties.setProperty(getKeyFor(column, SCHEMA_MAX_LENGTH), fieldSpec.getMaxLength()); - FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy(); - if (maxLengthExceedStrategy != null) { - properties.setProperty(getKeyFor(column, SCHEMA_MAX_LENGTH_EXCEED_STRATEGY), maxLengthExceedStrategy); + Integer maxLength = fieldSpec.getMaxLength(); Review Comment: Thanks for the suggestion! A couple of follow-ups: - Should we apply the same idea to SCHEMA_MAX_LENGTH_EXCEED_STRATEGY as well? - In the equals() method ([link](https://github.com/apache/pinot/pull/16149/files#diff-05c7d0209f5495c4bb9b5f93d37d8bf5a706114aa0e9afa11c367000651fd395L500-L502)), should we compare the effective max length or just the configured max length? We have a number of unit tests that compare a field spec to one reconstructed from segment metadata, so the decision here may decide wether we should update those tests depending on which semantics we prefer. ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java: ########## @@ -53,24 +53,19 @@ public class SanitizationTransformer implements RecordTransformer { private final Map<String, SanitizedColumnInfo> _columnToColumnInfoMap = new HashMap<>(); public SanitizationTransformer(Schema schema) { - FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy; for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) { if (!fieldSpec.isVirtualColumn()) { - if (fieldSpec.getDataType().equals(FieldSpec.DataType.STRING)) { - maxLengthExceedStrategy = - fieldSpec.getMaxLengthExceedStrategy() == null ? FieldSpec.MaxLengthExceedStrategy.TRIM_LENGTH - : fieldSpec.getMaxLengthExceedStrategy(); - _columnToColumnInfoMap.put(fieldSpec.getName(), new SanitizedColumnInfo(fieldSpec.getName(), - fieldSpec.getMaxLength(), maxLengthExceedStrategy, fieldSpec.getDefaultNullValue())); - } else if (fieldSpec.getDataType().equals(FieldSpec.DataType.JSON) || fieldSpec.getDataType() - .equals(FieldSpec.DataType.BYTES)) { - maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy() == null - ? FieldSpec.MaxLengthExceedStrategy.NO_ACTION : fieldSpec.getMaxLengthExceedStrategy(); - if (maxLengthExceedStrategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION)) { - continue; + FieldSpec.DataType dataType = fieldSpec.getDataType(); + if (dataType.equals(FieldSpec.DataType.STRING) + || dataType.equals(FieldSpec.DataType.JSON) + || dataType.equals(FieldSpec.DataType.BYTES)) { + + FieldSpec.MaxLengthExceedStrategy strategy = fieldSpec.getEffectiveMaxLengthExceedStrategy(); + if (!strategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION) || dataType.equals( + FieldSpec.DataType.STRING)) { Review Comment: Yes, try to make this part a pure refactor rather than changing the existing behavior here: even if MaxLengthExceedStrategy.NO_ACTION is used for string type, other sanitization logic is still able to be applied when it is String. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org