Jackie-Jiang commented on code in PR #13994: URL: https://github.com/apache/pinot/pull/13994#discussion_r1777816900
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java: ########## @@ -236,10 +237,18 @@ public static boolean ignoreDictionaryOverride(boolean optimizeDictionary, // Do not create dictionary if index size with dictionary is going to be larger than index size without dictionary // This is done to reduce the cost of dictionary for high cardinality columns // Off by default and needs optimizeDictionary to be set to true - if (fieldSpec.isSingleValueField() && fieldSpec.getDataType().isFixedWidth()) { - // if you can safely enable dictionary, you can ignore overrides - return canSafelyCreateDictionaryWithinThreshold(cardinality, totalNumberOfEntries, - noDictionarySizeRatioThreshold, fieldSpec); + if (fieldSpec.isSingleValueField()) { + if (fieldSpec.getDataType().isFixedWidth()) { + // if you can safely enable dictionary, you can ignore overrides + return canSafelyCreateDictionaryWithinThreshold(cardinality, totalNumberOfEntries, + noDictionarySizeRatioThreshold, fieldSpec); + } + // Config not set, default to old behavior of create dictionary for var width cols + if (noDictionaryCardinalityRatioThreshold == null) { + return true; + } + // variable width type, so create based simply on cardinality threshold size cannot be calculated easily + return noDictionaryCardinalityRatioThreshold * totalNumberOfEntries > cardinality; Review Comment: Currently this is not applied to metric when `optimizeDictionaryForMetrics` is true. I think you might also want it for metric as it is more common to have high cardinality? ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java: ########## @@ -293,8 +293,9 @@ private boolean createDictionaryForColumn(ColumnIndexCreationInfo info, SegmentG FieldIndexConfigs fieldIndexConfigs = config.getIndexConfigsByColName().get(column); if (DictionaryIndexType.ignoreDictionaryOverride(config.isOptimizeDictionary(), - config.isOptimizeDictionaryForMetrics(), config.getNoDictionarySizeRatioThreshold(), spec, fieldIndexConfigs, - info.getDistinctValueCount(), info.getTotalNumberOfEntries())) { + config.isOptimizeDictionaryForMetrics(), config.getNoDictionarySizeRatioThreshold(), Review Comment: Having 4 configs on the same topic passed in separately is a little bit ugly, but we can address it separately ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java: ########## @@ -236,10 +237,18 @@ public static boolean ignoreDictionaryOverride(boolean optimizeDictionary, // Do not create dictionary if index size with dictionary is going to be larger than index size without dictionary // This is done to reduce the cost of dictionary for high cardinality columns // Off by default and needs optimizeDictionary to be set to true - if (fieldSpec.isSingleValueField() && fieldSpec.getDataType().isFixedWidth()) { - // if you can safely enable dictionary, you can ignore overrides - return canSafelyCreateDictionaryWithinThreshold(cardinality, totalNumberOfEntries, - noDictionarySizeRatioThreshold, fieldSpec); + if (fieldSpec.isSingleValueField()) { + if (fieldSpec.getDataType().isFixedWidth()) { + // if you can safely enable dictionary, you can ignore overrides + return canSafelyCreateDictionaryWithinThreshold(cardinality, totalNumberOfEntries, Review Comment: For fixed width type `noDictionarySizeRatioThreshold` is always picked even if it is not configured (default value being used). Do we want to provide a way so that `noDictionaryCardinalityRatioThreshold` is picked when `noDictionarySizeRatioThreshold` is not configured or explicitly disabled? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org