swaminathanmanish opened a new pull request, #10874:
URL: https://github.com/apache/pinot/pull/10874
**Problem**: RecordReaders iterate over the source/input files to ingest data and create segments. Rows are consumed from a file one at a time, but some readers (such as ParquetRecordReader) load a whole rowGroup (a collection of rows) into memory for better read throughput when reading Parquet files, and that uses heap memory. The SegmentProcessorFramework takes in N RecordReaders. Users of this framework create those N RecordReaders through the getRecordReader factory, which also initializes each reader. Depending on how many readers are created, this eager allocation and initialization can cause the process to run out of heap space.

**Solution**: Allow the information needed to initialize and clean up a record reader to be passed to the mapper, where the reader is actually used. Readers then use memory only while the mapper is iterating over them, instead of allocating it eagerly.
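The lazy-initialization idea can be sketched in plain Java with no Pinot dependencies. The mapper receives a description of each reader (here a file name plus a factory), creates the reader only when it is about to iterate it, and closes it immediately afterwards. `SimpleRecordReader`, `ReaderSpec`, and `fakeReader` are hypothetical stand-ins for illustration; they are not Pinot's `RecordReader` API or the classes this PR adds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class LazyReaderSketch {

  // Hypothetical minimal reader contract (NOT Pinot's RecordReader interface).
  interface SimpleRecordReader extends Iterator<String>, AutoCloseable {
    @Override
    void close(); // narrowed: no checked exception
  }

  // Hypothetical holder: carries what is needed to create a reader, not the reader itself.
  static final class ReaderSpec {
    final String fileName;
    final Supplier<SimpleRecordReader> factory;

    ReaderSpec(String fileName, Supplier<SimpleRecordReader> factory) {
      this.fileName = fileName;
      this.factory = factory;
    }
  }

  // Bookkeeping so the example can show how many readers hold memory at once.
  static int openReaders = 0;
  static int maxOpenReaders = 0;

  // Fake reader: materializes a "row group" on creation to simulate heap usage.
  static SimpleRecordReader fakeReader(String fileName, int rows) {
    openReaders++;
    maxOpenReaders = Math.max(maxOpenReaders, openReaders);
    List<String> rowGroup = new ArrayList<>();
    for (int i = 0; i < rows; i++) {
      rowGroup.add(fileName + ":" + i);
    }
    Iterator<String> it = rowGroup.iterator();
    return new SimpleRecordReader() {
      private boolean closed = false;

      public boolean hasNext() { return it.hasNext(); }

      public String next() { return it.next(); }

      public void close() {
        if (!closed) {
          closed = true;
          openReaders--;
        }
      }
    };
  }

  // Mapper: initializes each reader right before use and releases it right after.
  static List<String> map(List<ReaderSpec> specs) {
    List<String> out = new ArrayList<>();
    for (ReaderSpec spec : specs) {
      try (SimpleRecordReader reader = spec.factory.get()) {
        while (reader.hasNext()) {
          out.add(reader.next());
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<ReaderSpec> specs = new ArrayList<>();
    for (int f = 0; f < 3; f++) {
      String name = "file" + f + ".parquet";
      // Only the recipe is stored here; no reader is created yet.
      specs.add(new ReaderSpec(name, () -> fakeReader(name, 2)));
    }
    List<String> rows = map(specs);
    System.out.println(rows.size() + " rows, max open readers = " + maxOpenReaders);
  }
}
```

Running `main` prints `6 rows, max open readers = 1`: however many files are passed in, only one reader's buffer is alive at a time. Creating all readers up front through a factory and handing the initialized instances to the mapper would instead keep N buffers alive simultaneously, which is the heap pressure described above.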