rohityadav1993 opened a new pull request, #16727:
URL: https://github.com/apache/pinot/pull/16727

   `feature` `performance` `release-notes`
   
   # Feature
   Build segments in a columnar fashion when the input data source is itself 
columnar, such as a Pinot segment or a Parquet file.
   
   ## Description
   The requirement arises from issue 
https://github.com/apache/pinot/issues/16461.
   In short, when a minion refreshes a server's segment after a schema or index 
config change, it rebuilds the segment row-wise. Two bottlenecks were observed:
   1. Row-wise segment stats collection for each column in 
SegmentIndexCreationDriverImpl 
https://github.com/apache/pinot/blob/868e2c9e7ced50c865bfe4fd0aa0e4200f6666ae/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java#L573
   2. Row-wise segment generation in SegmentIndexCreationDriverImpl 
https://github.com/apache/pinot/blob/868e2c9e7ced50c865bfe4fd0aa0e4200f6666ae/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java#L266
   
   Record-wise generation also re-applies transformations in both the stats 
gathering step and the segment regeneration step. This is unnecessary unless 
the transformation logic has changed. The bottlenecks are visible in the CPU 
profile attached to the issue https://github.com/apache/pinot/issues/16461
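
   As a toy illustration of the two access patterns (not Pinot's actual stats 
collectors; `StatsCollectionSketch` and its methods are hypothetical names), the 
sketch below computes a per-column maximum once by visiting every row and once 
by scanning each column on its own. Both produce the same result; the 
column-wise pass avoids materializing and walking a full row per document.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not code from the PR or from Pinot.
final class StatsCollectionSketch {

  // Row-wise: every row is visited, and every column is looked up inside it.
  static Map<String, Integer> maxRowWise(List<Map<String, Integer>> rows) {
    Map<String, Integer> max = new TreeMap<>();
    for (Map<String, Integer> row : rows) {
      for (Map.Entry<String, Integer> cell : row.entrySet()) {
        max.merge(cell.getKey(), cell.getValue(), Integer::max);
      }
    }
    return max;
  }

  // Column-wise: each column is scanned once in a tight loop over its values.
  static Map<String, Integer> maxColumnWise(Map<String, int[]> columns) {
    Map<String, Integer> max = new TreeMap<>();
    for (Map.Entry<String, int[]> column : columns.entrySet()) {
      int columnMax = Integer.MIN_VALUE; // stays MIN_VALUE for an empty column
      for (int value : column.getValue()) {
        columnMax = Math.max(columnMax, value);
      }
      max.put(column.getKey(), columnMax);
    }
    return max;
  }

  public static void main(String[] args) {
    List<Map<String, Integer>> rows =
        List.of(Map.of("clicks", 1, "views", 9), Map.of("clicks", 5, "views", 2));
    Map<String, int[]> columns =
        Map.of("clicks", new int[] {1, 5}, "views", new int[] {9, 2});
    System.out.println(maxRowWise(rows));       // {clicks=5, views=9}
    System.out.println(maxColumnWise(columns)); // {clicks=5, views=9}
  }
}
```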
   
   # Design and Implementation
   New interfaces:
   1. ColumnReader: used to read column data from various data sources for 
columnar segment building
       1. Sequential iteration over all values in a column using hasNext() and 
next()
       2. Null value detection for the current value
   2. ColumnReaderFactory: creates column readers for various data sources 
such as Pinot segments, Parquet files, etc.
       1. Creating column readers for existing columns
       2. Creating default value readers for new columns
       3. Data type conversions between source and target schemas
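
   The two interfaces could look roughly like the sketch below. This is a 
minimal illustration under stated assumptions, not the PR's actual code: the 
method names `getColumnReader` and `getDefaultValueReader`, the convention that 
`isNull()` reports on the value most recently returned by `next()`, and the 
in-memory classes are all hypothetical. Data type conversion between source and 
target schemas is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical interface shapes; the PR's real signatures may differ.
interface ColumnReader {
  boolean hasNext();
  Object next();
  // Assumption: reports on the value most recently returned by next().
  boolean isNull();
}

interface ColumnReaderFactory {
  ColumnReader getColumnReader(String column);
  ColumnReader getDefaultValueReader(Object defaultValue);
}

// Sequential reader over an in-memory list of values for one column.
final class ListColumnReader implements ColumnReader {
  private final List<Object> values;
  private int pos = -1;

  ListColumnReader(List<Object> values) {
    this.values = values;
  }

  @Override
  public boolean hasNext() {
    return pos + 1 < values.size();
  }

  @Override
  public Object next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return values.get(++pos);
  }

  @Override
  public boolean isNull() {
    return pos >= 0 && values.get(pos) == null;
  }
}

// In-memory stand-in for a columnar source such as a Pinot segment or Parquet file.
final class InMemoryColumnReaderFactory implements ColumnReaderFactory {
  private final Map<String, List<Object>> columns;
  private final int numDocs;

  InMemoryColumnReaderFactory(Map<String, List<Object>> columns, int numDocs) {
    this.columns = columns;
    this.numDocs = numDocs;
  }

  @Override
  public ColumnReader getColumnReader(String column) {
    List<Object> values = columns.get(column);
    if (values == null) {
      throw new IllegalArgumentException("Unknown column: " + column);
    }
    return new ListColumnReader(values);
  }

  @Override
  public ColumnReader getDefaultValueReader(Object defaultValue) {
    // A new column has no source data, so repeat the default once per document.
    return new ListColumnReader(Collections.nCopies(numDocs, defaultValue));
  }
}

final class ColumnReaderSketch {
  // Drains one column, substituting a placeholder for null values.
  static List<Object> readAll(ColumnReader reader, Object nullPlaceholder) {
    List<Object> out = new ArrayList<>();
    while (reader.hasNext()) {
      Object value = reader.next();
      out.add(reader.isNull() ? nullPlaceholder : value);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> source = Map.of("clicks", Arrays.<Object>asList(3, null, 7));
    ColumnReaderFactory factory = new InMemoryColumnReaderFactory(source, 3);
    System.out.println(readAll(factory.getColumnReader("clicks"), 0)); // [3, 0, 7]
    System.out.println(readAll(factory.getDefaultValueReader("unknown"), "unknown"));
  }
}
```

   A segment builder written against these interfaces would consume one column 
at a time, asking the factory for a default value reader whenever the target 
schema contains a column that the source does not.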
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

