krishan1390 opened a new pull request, #16845:
URL: https://github.com/apache/pinot/pull/16845
**Summary**
Avoid storing unique values for columns that have the dictionary disabled,
drastically reducing heap usage during segment creation. Track min/max and
row-length stats without relying on sorted unique-value sets. Existing
behavior for dictionary-enabled columns is unchanged. The sorted unique sets were only
needed to build dictionaries, which are not created for no-dictionary columns.
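A minimal sketch of the streaming idea, assuming a string column. Min/max and byte-length stats can be updated per value by comparing against the current extremes, so memory stays constant instead of growing with the number of distinct values. The class and method names are hypothetical and are not Pinot's collector API. Ordering here uses `String.compareTo`, and lengths are measured in UTF-8 bytes; both are assumptions of this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration only; not part of Pinot.
// Tracks min/max value and min/max UTF-8 length in O(1) memory,
// without keeping a sorted set of unique values.
class StreamingStringStats {
  private String _minValue;                          // null until the first value is collected
  private String _maxValue;                          // null until the first value is collected
  private int _minLengthInBytes = Integer.MAX_VALUE;
  private int _maxLengthInBytes = 0;
  private long _totalEntries = 0;

  void collect(String value) {
    // Update extremes by comparison instead of inserting into a TreeSet.
    if (_minValue == null || value.compareTo(_minValue) < 0) {
      _minValue = value;
    }
    if (_maxValue == null || value.compareTo(_maxValue) > 0) {
      _maxValue = value;
    }
    // Row-length stats measured in UTF-8 bytes (an assumption for this sketch).
    int lengthInBytes = value.getBytes(StandardCharsets.UTF_8).length;
    _minLengthInBytes = Math.min(_minLengthInBytes, lengthInBytes);
    _maxLengthInBytes = Math.max(_maxLengthInBytes, lengthInBytes);
    _totalEntries++;
  }

  String getMinValue() { return _minValue; }
  String getMaxValue() { return _maxValue; }
  int getMinLengthInBytes() { return _minLengthInBytes; }
  int getMaxLengthInBytes() { return _maxLengthInBytes; }
  long getTotalEntries() { return _totalEntries; }
}

class StreamingStringStatsDemo {
  public static void main(String[] args) {
    StreamingStringStats stats = new StreamingStringStats();
    for (String v : new String[] {"pear", "apple", "zebra", "apple"}) {
      stats.collect(v);
    }
    // Prints: min=apple max=zebra lengths=[4, 5] entries=4
    System.out.println("min=" + stats.getMinValue() + " max=" + stats.getMaxValue()
        + " lengths=[" + stats.getMinLengthInBytes() + ", " + stats.getMaxLengthInBytes() + "]"
        + " entries=" + stats.getTotalEntries());
  }
}
```

The per-column state is a handful of fields no matter how many distinct values appear; the trade-off is that an exact distinct count is no longer available.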
**Key Changes**
1. Added dictionary-enablement detection to `AbstractColumnStatisticsCollector`.
2. Behavior when `_dictionaryEnabled == false`:
   a. `getUniqueValuesSet()` returns null.
   b. `getCardinality()` returns the total number of entries.
   c. The cardinality from `ColumnIndexCreationInfo.getDistinctValueCount()`
      becomes `UNKNOWN_CARDINALITY` (because the unique-values set is null).
3. Updated collectors to skip unique-value storage for no-dictionary
   columns, lazily allocating sets/arrays only when needed:
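The no-dictionary contract in item 2 can be illustrated with a hypothetical integer collector. This is a sketch under assumptions, not Pinot's implementation: the class name is invented, and the `UNKNOWN_CARDINALITY` value of -1 is assumed here only so the example runs.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class for illustration only; not Pinot's AbstractColumnStatisticsCollector.
class DictionaryAwareIntStats {
  // Assumed sentinel value; the real UNKNOWN_CARDINALITY constant may differ.
  static final int UNKNOWN_CARDINALITY = -1;

  // Sorted unique values, allocated only for dictionary-enabled columns.
  private final TreeSet<Integer> _uniqueValues;
  private int _totalEntries = 0;

  DictionaryAwareIntStats(boolean dictionaryEnabled) {
    _uniqueValues = dictionaryEnabled ? new TreeSet<>() : null;
  }

  void collect(int value) {
    _totalEntries++;
    if (_uniqueValues != null) {
      _uniqueValues.add(value);
    }
  }

  // Returns null when the dictionary is disabled (item 2a).
  Set<Integer> getUniqueValuesSet() {
    return _uniqueValues;
  }

  // Exact distinct count with a dictionary; total entries otherwise (item 2b).
  int getCardinality() {
    return _uniqueValues != null ? _uniqueValues.size() : _totalEntries;
  }

  // Null unique values map to UNKNOWN_CARDINALITY (item 2c).
  int getDistinctValueCount() {
    Set<Integer> uniqueValues = getUniqueValuesSet();
    return uniqueValues != null ? uniqueValues.size() : UNKNOWN_CARDINALITY;
  }
}

class DictionaryAwareIntStatsDemo {
  public static void main(String[] args) {
    DictionaryAwareIntStats withDictionary = new DictionaryAwareIntStats(true);
    DictionaryAwareIntStats noDictionary = new DictionaryAwareIntStats(false);
    for (int v : new int[] {3, 1, 3, 2, 1}) {
      withDictionary.collect(v);
      noDictionary.collect(v);
    }
    // Prints: [1, 2, 3] 3 3
    System.out.println(withDictionary.getUniqueValuesSet() + " "
        + withDictionary.getCardinality() + " " + withDictionary.getDistinctValueCount());
    // Prints: null 5 -1
    System.out.println(noDictionary.getUniqueValuesSet() + " "
        + noDictionary.getCardinality() + " " + noDictionary.getDistinctValueCount());
  }
}
```

For the no-dictionary column, `getCardinality()` reports 5 (every entry, including repeats) while the true distinct count is 3. Total entries is always an upper bound on the distinct count, which is why the cardinality for no-dictionary columns is only approximate.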
Labels: performance
Release Notes:
- Approximate cardinality for no-dictionary columns
--
This is an automated message from the Apache Git Service.