gortiz commented on PR #12223:
URL: https://github.com/apache/pinot/pull/12223#issuecomment-1882772321

   This is a very nice feature! Nice contribution @vvivekiyer!
   
   I have some comments to add:
   
   ## The concept itself
   I understand that the issue was detected in on heap dictionaries, but I 
think we can use this approach also in offheap dictionaries as well. 
Specifically, interning could be applied when the value is read from the 
dictionary.
   
   ## The config
   I find this config too repetitive.
   
   Instead of:
   ```json
   {
      "indexes": { 
         "dictionary": { 
            "onHeap": true, 
            "useVarLengthDictionary": true,
            "onHeapConfig": {
               "enableInterning":true, 
               "internerCapacity":32000000 
             }
          } 
      }
   }
   ```
   
   I would suggest something like:
   ```json
   {
      "indexes": { 
         "dictionary": { 
            "onHeap": true, 
            "useVarLengthDictionary": true,
            "intern": {
               "capacity":32000000 
             } 
          } 
      }
   }
   ```
   
   With an optional implicit field `intern.disabled`.
   
   I would use this approach even if we do not support interning in offheap 
dictionaries (which is something we may decide to change in future). This 
approach is simpler and easier to read.
   
   ## Support in older syntax
   As said in the comments, I recommend *against* adding new features in the 
old syntax (in this case, in indexingConfig).
   
   For compatibility reasons `index-spi` had to support all features that were 
supported in the old syntax, but we don't have to add support for new features 
in that syntax.
   
   Users can migrate to the new syntax in case they want to use new features. 
The translation can even be done automatically. Is not like we want to force 
people to use the new syntax, but the index config logic is already too complex 
and 2 ways to configure each new feature will make it even more complex.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to