Jackie-Jiang commented on code in PR #13994:
URL: https://github.com/apache/pinot/pull/13994#discussion_r1777816900


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java:
##########
@@ -236,10 +237,18 @@ public static boolean ignoreDictionaryOverride(boolean 
optimizeDictionary,
       // Do not create dictionary if index size with dictionary is going to be 
larger than index size without dictionary
       // This is done to reduce the cost of dictionary for high cardinality 
columns
       // Off by default and needs optimizeDictionary to be set to true
-      if (fieldSpec.isSingleValueField() && 
fieldSpec.getDataType().isFixedWidth()) {
-        // if you can safely enable dictionary, you can ignore overrides
-        return canSafelyCreateDictionaryWithinThreshold(cardinality, 
totalNumberOfEntries,
-            noDictionarySizeRatioThreshold, fieldSpec);
+      if (fieldSpec.isSingleValueField()) {
+        if (fieldSpec.getDataType().isFixedWidth()) {
+          // if you can safely enable dictionary, you can ignore overrides
+          return canSafelyCreateDictionaryWithinThreshold(cardinality, 
totalNumberOfEntries,
+              noDictionarySizeRatioThreshold, fieldSpec);
+        }
+        // Config not set, default to old behavior of create dictionary for 
var width cols
+        if (noDictionaryCardinalityRatioThreshold == null) {
+          return true;
+        }
+        // variable width type, so create based simply on cardinality 
threshold size cannot be calculated easily
+        return noDictionaryCardinalityRatioThreshold * totalNumberOfEntries > 
cardinality;

Review Comment:
   Currently this is not applied to metric when `optimizeDictionaryForMetrics` 
is true. I think you might also want it for metric as it is more common to have 
high cardinality?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -293,8 +293,9 @@ private boolean 
createDictionaryForColumn(ColumnIndexCreationInfo info, SegmentG
 
     FieldIndexConfigs fieldIndexConfigs = 
config.getIndexConfigsByColName().get(column);
     if 
(DictionaryIndexType.ignoreDictionaryOverride(config.isOptimizeDictionary(),
-        config.isOptimizeDictionaryForMetrics(), 
config.getNoDictionarySizeRatioThreshold(), spec, fieldIndexConfigs,
-        info.getDistinctValueCount(), info.getTotalNumberOfEntries())) {
+        config.isOptimizeDictionaryForMetrics(), 
config.getNoDictionarySizeRatioThreshold(),

Review Comment:
   Having 4 configs on the same topic passed in separately is a little bit 
ugly, but we can address it separately



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java:
##########
@@ -236,10 +237,18 @@ public static boolean ignoreDictionaryOverride(boolean 
optimizeDictionary,
       // Do not create dictionary if index size with dictionary is going to be 
larger than index size without dictionary
       // This is done to reduce the cost of dictionary for high cardinality 
columns
       // Off by default and needs optimizeDictionary to be set to true
-      if (fieldSpec.isSingleValueField() && 
fieldSpec.getDataType().isFixedWidth()) {
-        // if you can safely enable dictionary, you can ignore overrides
-        return canSafelyCreateDictionaryWithinThreshold(cardinality, 
totalNumberOfEntries,
-            noDictionarySizeRatioThreshold, fieldSpec);
+      if (fieldSpec.isSingleValueField()) {
+        if (fieldSpec.getDataType().isFixedWidth()) {
+          // if you can safely enable dictionary, you can ignore overrides
+          return canSafelyCreateDictionaryWithinThreshold(cardinality, 
totalNumberOfEntries,

Review Comment:
   For fixed width type `noDictionarySizeRatioThreshold` is always picked even 
if it is not configured (default value being used). Do we want to provide a way 
so that `noDictionaryCardinalityRatioThreshold` is picked when 
`noDictionarySizeRatioThreshold` is not configured or explicitly disabled?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to