tarun11Mavani opened a new pull request, #16769:
URL: https://github.com/apache/pinot/pull/16769

   Extension of: #16344
   
   ### Summary
   This PR extends the Commit Time Compaction (#14588) feature to support 
column-major segment building, removing the previous limitation that restricted 
commit-time compaction to row-major builds. Users can now combine 
commit-time compaction, for storage efficiency, with column-major builds, for 
improved build time.
   
   ### Technical Implementation
   RealtimeSegmentConverter: Enhanced to pass validDocIdsSnapshot to the 
column-major build process when using compacted readers, ensuring only valid 
documents are processed during column indexing.
   
   SegmentColumnarIndexCreator.indexColumn(): Extended the method signature to 
accept a validDocIds parameter, so that only valid documents are indexed 
during column-major builds. The previous signature is retained as a 
deprecated method for backward compatibility.
   
   SegmentIndexCreationDriverImpl.buildByColumn(): Updated to accept a 
validDocIds parameter and pass it through to the column indexing process, 
which filters out invalid documents.
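
The filtering described above can be pictured with a minimal, self-contained sketch. It is not the Pinot implementation: the class name, the simplified method signatures, and the use of `java.util.BitSet` as a stand-in for the valid-doc-ID snapshot are all assumptions made for illustration. It only shows the general shape of passing one set of valid doc IDs through a per-column build loop, with an older overload kept for callers that do not compact.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these names and signatures are hypothetical, not Pinot's.
final class ColumnMajorCompactionSketch {

  /** Older signature, kept for callers that do not compact; indexes every document. */
  @Deprecated
  static List<Object> indexColumn(Object[] columnValues) {
    return indexColumn(columnValues, null);
  }

  /**
   * Indexes one column, keeping only documents whose bit is set in validDocIds.
   * A null validDocIds means "no compaction": every document is kept.
   */
  static List<Object> indexColumn(Object[] columnValues, BitSet validDocIds) {
    List<Object> indexed = new ArrayList<>();
    for (int docId = 0; docId < columnValues.length; docId++) {
      if (validDocIds == null || validDocIds.get(docId)) {
        indexed.add(columnValues[docId]);
      }
    }
    return indexed;
  }

  /** Column-major build: one full pass per column, applying the same validDocIds to each. */
  static Map<String, List<Object>> buildByColumn(Map<String, Object[]> columns, BitSet validDocIds) {
    Map<String, List<Object>> segment = new LinkedHashMap<>();
    for (Map.Entry<String, Object[]> column : columns.entrySet()) {
      segment.put(column.getKey(), indexColumn(column.getValue(), validDocIds));
    }
    return segment;
  }

  public static void main(String[] args) {
    Map<String, Object[]> columns = new LinkedHashMap<>();
    columns.put("id", new Object[] {1, 2, 1, 3});
    columns.put("city", new Object[] {"a", "b", "c", "d"});

    // Doc 0 was superseded by a later record with the same key, so it is not valid.
    BitSet validDocIds = new BitSet();
    validDocIds.set(1);
    validDocIds.set(2);
    validDocIds.set(3);

    System.out.println(buildByColumn(columns, validDocIds));
  }
}
```

Because every column is filtered with the same set of valid doc IDs, all columns end up with the same number of documents in the same order, which is what keeps the columns aligned in the compacted segment.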
   
   TableConfigUtils: Removed the validateCommitTimeCompactionConfig() method and 
the associated validation logic that previously blocked column-major builds with 
commit-time compaction.
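
As a rough picture of what removing that check changes, the sketch below contrasts a validation that rejects the two flags together with one that accepts them. The class, method names, parameters, and error message are hypothetical and are not taken from TableConfigUtils; only the two flag names come from this PR description.

```java
// Illustrative only: a hypothetical stand-in for the removed validation, not Pinot code.
final class CommitTimeCompactionValidationSketch {

  /** Behavior assumed before this PR: the two flags could not be enabled together. */
  static void validateBeforeThisPr(boolean enableCommitTimeCompaction,
      boolean columnMajorSegmentBuilderEnabled) {
    if (enableCommitTimeCompaction && columnMajorSegmentBuilderEnabled) {
      // The message text is made up for this sketch.
      throw new IllegalStateException(
          "Commit-time compaction is not supported with column-major segment builds");
    }
  }

  /** Behavior after this PR: the combination is accepted, so there is nothing to reject. */
  static void validateAfterThisPr(boolean enableCommitTimeCompaction,
      boolean columnMajorSegmentBuilderEnabled) {
    // Intentionally empty: the restriction no longer applies.
  }

  public static void main(String[] args) {
    validateAfterThisPr(true, true); // accepted
    try {
      validateBeforeThisPr(true, true);
    } catch (IllegalStateException e) {
      System.out.println("Previously rejected: " + e.getMessage());
    }
  }
}
```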
   
   Enhanced Integration Testing:
    - Extended CommitTimeCompactionIntegrationTest to test three table 
configurations simultaneously:
      - Baseline (no compaction, row-major)
      - Commit-time compaction with row-major build
      - Commit-time compaction with column-major build
    - Improved test methods to handle three-table comparisons and validation
   
   ### Impact
    - **Feature Compatibility**: Users can now enable both 
enableCommitTimeCompaction=true and columnMajorSegmentBuilderEnabled=true 
simultaneously.
    - **Performance Benefits**: Combines storage optimization from commit-time 
compaction with performance improvements from column-major indexing.
    - **Backward Compatibility**: All existing functionality remains intact; 
changes are additive.
   
   ### Testing
   Integration Tests: Extended the existing test suite to validate three-table 
scenarios (baseline, row-major compaction, column-major compaction).
   Regression Testing in a test cluster: Created two tables, one with and one 
without commit-time compaction, with column-major build enabled for both. 
    - Both tables have the same data and segment counts
    - Segment build time improved 3x with commit-time compaction enabled, for a 
table with a 12x compaction ratio, 40K messages per second, and a 90 to 120 
minute commit frequency
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

