rohityadav1993 opened a new pull request, #16727: URL: https://github.com/apache/pinot/pull/16727
`feature` `performance` `release-notes`

# Feature
Adds support for building a segment in columnar fashion when the input is a columnar data source such as a Pinot segment or Parquet files.

## Description
The requirement arises from issue https://github.com/apache/pinot/issues/16461. In short, when a minion refreshes a server's segment after a schema or index config change, it rebuilds the segment row by row. Two bottlenecks were observed:
1. Row-wise segment stats collection for each column in SegmentIndexCreationDriverImpl: https://github.com/apache/pinot/blob/868e2c9e7ced50c865bfe4fd0aa0e4200f6666ae/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java#L573
2. Row-wise segment generation in SegmentIndexCreationDriverImpl: https://github.com/apache/pinot/blob/868e2c9e7ced50c865bfe4fd0aa0e4200f6666ae/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java#L266

Record-wise generation also reapplies transformations in both the stats-gathering step and the segment regeneration step, which is unnecessary unless the transformation logic has changed.

The bottlenecks can be seen in the CPU profile attached to the issue: https://github.com/apache/pinot/issues/16461

# Design and Implementation
New interfaces:
1. ColumnReader: reads column data from various data sources for columnar segment building
   1. Sequential iteration over all values in a column using hasNext() and next()
   2. Null value detection for the current value
2. ColumnReaderFactory: creates column readers for various data sources such as Pinot segments, Parquet files, etc.
   1. Creating column readers for existing columns
   2. Creating default value readers for new columns
   3. Data type conversions between source and target schemas
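To make the ColumnReader contract above more concrete, here is a minimal sketch of how sequential iteration, null detection, and a default value reader for a new column might fit together, with column stats gathered in a single pass per column. All type names, method signatures (including `isCurrentNull()`), and the int-only value type are assumptions for illustration; they are not the interfaces added in this PR.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch of a sequential column reader; names and signatures are
// assumptions for illustration, not the interfaces from the PR.
interface ColumnReaderSketch {
  boolean hasNext();
  Integer next();          // advances to the next value and returns it (null if missing)
  boolean isCurrentNull(); // true if the value last returned by next() was null
}

// Reads an existing column that is already held in memory.
final class ArrayColumnReader implements ColumnReaderSketch {
  private final Integer[] values;
  private int pos = -1;

  ArrayColumnReader(Integer[] values) {
    this.values = values;
  }

  @Override
  public boolean hasNext() {
    return pos + 1 < values.length;
  }

  @Override
  public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return values[++pos];
  }

  @Override
  public boolean isCurrentNull() {
    if (pos < 0) {
      throw new IllegalStateException("next() has not been called");
    }
    return values[pos] == null;
  }
}

// Serves a new column that has no source data: every doc gets the default value.
final class DefaultValueColumnReader implements ColumnReaderSketch {
  private final int defaultValue;
  private final int numDocs;
  private int served = 0;

  DefaultValueColumnReader(int defaultValue, int numDocs) {
    this.defaultValue = defaultValue;
    this.numDocs = numDocs;
  }

  @Override
  public boolean hasNext() {
    return served < numDocs;
  }

  @Override
  public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    served++;
    return defaultValue;
  }

  @Override
  public boolean isCurrentNull() {
    return false; // a default value is never null
  }
}

// Stats for one column, gathered in a single sequential pass over its reader.
final class ColumnStatsSketch {
  int min = Integer.MAX_VALUE;
  int max = Integer.MIN_VALUE;
  int nullCount = 0;
  int numDocs = 0;

  static ColumnStatsSketch collect(ColumnReaderSketch reader) {
    ColumnStatsSketch stats = new ColumnStatsSketch();
    while (reader.hasNext()) {
      Integer value = reader.next();
      stats.numDocs++;
      if (reader.isCurrentNull()) {
        stats.nullCount++;
        continue;
      }
      stats.min = Math.min(stats.min, value);
      stats.max = Math.max(stats.max, value);
    }
    return stats;
  }
}

class ColumnarBuildSketch {
  public static void main(String[] args) {
    ColumnStatsSketch existing =
        ColumnStatsSketch.collect(new ArrayColumnReader(new Integer[] {3, null, 7}));
    System.out.println(existing.min + " " + existing.max + " " + existing.nullCount);

    ColumnStatsSketch added = ColumnStatsSketch.collect(new DefaultValueColumnReader(0, 4));
    System.out.println(added.min + " " + added.max + " " + added.numDocs);
  }
}
```

In this sketch each column is scanned once through its own reader, so no row object is assembled and no row-level transformation is rerun. A factory in the spirit of ColumnReaderFactory would choose between the two reader types depending on whether the column exists in the source.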
