tarun11Mavani opened a new pull request, #16769:
URL: https://github.com/apache/pinot/pull/16769
Extension of: #16344
### Summary
This PR extends the Commit Time Compaction (#14588) feature to support
column-major segment building, removing the previous limitation that restricted
commit-time compaction to row-major builds only. This enhancement allows users
to leverage commit-time compaction for storage efficiency and column-major
builds for improved build time.
### Technical Implementation
- RealtimeSegmentConverter: Enhanced to pass validDocIdsSnapshot to the
  column-major build process when using compacted readers, ensuring only valid
  documents are processed during column indexing.
- SegmentColumnarIndexCreator.indexColumn(): Extended method signature to
  accept a validDocIds parameter, enabling selective indexing of only valid
  documents during column-major builds. The previous method signature is kept
  as deprecated for backward compatibility.
- SegmentIndexCreationDriverImpl.buildByColumn(): Updated to accept and
  utilize the validDocIds parameter, passing it through to the column indexing
  process to filter out invalid documents.
- TableConfigUtils: Removed the validateCommitTimeCompactionConfig() method and
  associated validation logic that previously blocked column-major builds with
  commit-time compaction.
- Enhanced Integration Testing:
  - Extended CommitTimeCompactionIntegrationTest to test three table
    configurations simultaneously:
    - Baseline (no compaction, row-major)
    - Commit-time compaction with row-major build
    - Commit-time compaction with column-major build
  - Improved test methods to handle three-table comparisons and validation.
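
For readers unfamiliar with the change, the valid-document filtering described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the class `ColumnMajorCompactionSketch`, its method bodies, and the use of `java.util.BitSet` as a stand-in for the validDocIds bitmap are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a column-major build with a valid-doc filter.
// These are NOT Pinot's real classes; BitSet stands in for the validDocIds bitmap.
class ColumnMajorCompactionSketch {

  /** Returns one column's values for valid documents only, in original doc order. */
  static List<Object> indexColumn(Object[] columnValues, BitSet validDocIds) {
    List<Object> indexed = new ArrayList<>();
    if (validDocIds == null) {
      // No snapshot supplied: index every document, as in a build without compaction.
      for (Object value : columnValues) {
        indexed.add(value);
      }
      return indexed;
    }
    // Visit only the doc ids marked valid, skipping obsolete documents.
    for (int docId = validDocIds.nextSetBit(0);
        docId >= 0 && docId < columnValues.length;
        docId = validDocIds.nextSetBit(docId + 1)) {
      indexed.add(columnValues[docId]);
    }
    return indexed;
  }

  /** Builds column by column, passing the same validDocIds to every column. */
  static Map<String, List<Object>> buildByColumn(
      Map<String, Object[]> columns, BitSet validDocIds) {
    Map<String, List<Object>> built = new LinkedHashMap<>();
    for (Map.Entry<String, Object[]> column : columns.entrySet()) {
      built.put(column.getKey(), indexColumn(column.getValue(), validDocIds));
    }
    return built;
  }
}
```

Because every column is filtered with the same set of valid doc ids, the compacted columns stay aligned row for row, which is the property a column-major build needs to preserve.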
### Impact
- **Feature Compatibility**: Users can now enable both
  enableCommitTimeCompaction=true and columnMajorSegmentBuilderEnabled=true
  simultaneously.
- **Performance Benefits**: Combines the storage savings of commit-time
  compaction with the faster segment builds of column-major indexing.
- **Backward Compatibility**: All existing functionality remains intact;
  changes are additive.
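
As a rough illustration of the compatibility change, the sketch below contrasts the old restriction with the new behavior for the two flags named above. The class and method names are hypothetical and do not correspond to Pinot's TableConfigUtils API; only the two flag names come from this PR.

```java
// Hypothetical sketch: names are illustrative, not Pinot's actual validation code.
class CompactionBuildModeSketch {

  /** Mimics the removed restriction: compaction was only allowed with row-major builds. */
  static boolean allowedBeforeThisChange(
      boolean enableCommitTimeCompaction, boolean columnMajorSegmentBuilderEnabled) {
    return !(enableCommitTimeCompaction && columnMajorSegmentBuilderEnabled);
  }

  /** After the change every combination is accepted; this only names the resulting build. */
  static String describeBuild(
      boolean enableCommitTimeCompaction, boolean columnMajorSegmentBuilderEnabled) {
    String compaction = enableCommitTimeCompaction ? "compacted" : "uncompacted";
    String layout = columnMajorSegmentBuilderEnabled ? "column-major" : "row-major";
    return compaction + " " + layout;
  }
}
```

For example, `describeBuild(true, true)` names the combination that this PR newly permits, while `allowedBeforeThisChange(true, true)` reports that the same combination used to be rejected.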
### Testing
Integration Tests: Extended the existing test suite to validate three-table
scenarios (baseline, row-major compaction, column-major compaction).
Regression Testing in test cluster: Created two tables, one with and one
without commit-time compaction, with column-major build enabled on both.
- Both tables have the same data and segment counts.
- Segment build time improved 3x with commit-time compaction enabled, for a
table with a 12x compaction ratio, 40K messages per second, and a commit
every 90 to 120 minutes.
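
To give a sense of scale for these numbers, here is a back-of-the-envelope calculation. It rests on two assumptions the PR does not confirm: that 40K messages per second is the table-wide ingestion rate, and that a 12x compaction ratio means one valid document survives for every 12 ingested. The class and method names are made up for the example.

```java
// Hypothetical arithmetic helper; the interpretation of the figures is an assumption.
class CompactionArithmeticSketch {

  /** Messages ingested between two commits at a steady rate. */
  static long messagesPerCommit(long messagesPerSecond, long commitMinutes) {
    return messagesPerSecond * commitMinutes * 60L;
  }

  /** Documents left to index if only 1 in compactionRatio documents is still valid. */
  static long validDocsAfterCompaction(long ingestedMessages, long compactionRatio) {
    return ingestedMessages / compactionRatio;
  }
}
```

Under those assumptions, a 90 minute commit window covers 40,000 × 90 × 60 = 216,000,000 messages, of which about 18,000,000 remain after compaction; a 120 minute window covers 288,000,000 messages, leaving about 24,000,000. Indexing far fewer documents per column is consistent with the reported build-time improvement, although the measured 3x figure comes from the test cluster, not from this estimate.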