aishikbh commented on code in PR #12220: URL: https://github.com/apache/pinot/pull/12220#discussion_r1444810618
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########

@@ -141,28 +149,42 @@ private Map<String, GenericRowFileManager> doMap()
               RecordReaderFactory.getRecordReader(recordReaderFileConfig._fileFormat, recordReaderFileConfig._dataFile,
                   recordReaderFileConfig._fieldsToRead, recordReaderFileConfig._recordReaderConfig);
           mapAndTransformRow(recordReader, reuse, observer, count, totalCount);
+          _recordReaderFileConfigs.get(i)._recordReader = recordReader;
+          if (!_constraintsChecker.canWrite()) {
+            LOGGER.info("Stopping record readers at index: {} as size limit reached", i);
+            break;
+          }
         } finally {
-          if (recordReader != null) {
+          if (recordReader != null && !recordReader.hasNext()) {
             recordReader.close();
           }
         }
       } else {
+        if (!recordReader.hasNext()) {
+          LOGGER.info("Skipping record reader as it is already processed at index: {}", i);
+          count++;
+          continue;
+        }
         mapAndTransformRow(recordReader, reuse, observer, count, totalCount);
+        _recordReaderFileConfigs.get(i)._recordReader = recordReader;
+        if (!_constraintsChecker.canWrite()) {

Review Comment:
   I added this as an optimisation to reduce calls to `mapAndTransformRow`. Consider this flow: we exit `mapAndTransformRow` because the size constraint was violated. If we do not check the constraint right after `mapAndTransformRow` and break, the loop will keep iterating through all the remaining record readers and call `mapAndTransformRow` for each of them, even though every one of those calls will exit straight away because of the size constraint. The upcoming record readers will also be needlessly initialised.

   That said, I should put this check once at the end of the loop instead of repeating it in both branches of the conditional, in addition to the check inside `mapAndTransformRow`. I will make the change.
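A minimal, self-contained sketch of the single end-of-loop check described above. Every name in it (`SizeLimitedMappingSketch`, `ReaderConfig`, `RowLimitChecker`, `mapRows`, `mapOnce`) is a hypothetical stand-in and not a Pinot API: the "reader" is a plain `Iterator<String>`, and the size constraint is assumed to be a simple row count per pass. It only illustrates the control flow, not the actual `SegmentMapper` change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the mapper loop; none of these types are Pinot classes.
class SizeLimitedMappingSketch {

  /** Stand-in for a reader config that caches its lazily created reader between passes. */
  static final class ReaderConfig {
    final List<String> rows;   // stand-in for the underlying data file
    Iterator<String> reader;   // null until first use, similar to _recordReader in the diff
    ReaderConfig(String... rows) { this.rows = Arrays.asList(rows); }
  }

  /** Stand-in for the constraints checker: writes are allowed until maxRows rows are written. */
  static final class RowLimitChecker {
    final int maxRows;
    int written;
    RowLimitChecker(int maxRows) { this.maxRows = maxRows; }
    boolean canWrite() { return written < maxRows; }
  }

  /** What one pass produced, plus counters for the work the loop actually did. */
  static final class Result {
    final List<String> output = new ArrayList<>();
    int readersInitialised;
    int mapCalls;
  }

  // Stand-in for mapAndTransformRow: consumes rows until the reader is drained or the limit is hit.
  static void mapRows(Iterator<String> reader, RowLimitChecker checker, Result result) {
    result.mapCalls++;
    while (reader.hasNext() && checker.canWrite()) {
      result.output.add(reader.next());
      checker.written++;
    }
  }

  // One mapping pass over all configs with a fresh row limit.
  static Result mapOnce(List<ReaderConfig> configs, int maxRows) {
    RowLimitChecker checker = new RowLimitChecker(maxRows);
    Result result = new Result();
    for (ReaderConfig config : configs) {
      if (config.reader == null) {
        config.reader = config.rows.iterator();  // lazy initialisation on first use
        result.readersInitialised++;
      } else if (!config.reader.hasNext()) {
        continue;                                // already fully processed in an earlier pass
      }
      mapRows(config.reader, checker, result);
      // Single check at the end of the iteration, shared by both branches above.
      if (!checker.canWrite()) {
        break;                                   // later readers are neither created nor mapped
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<ReaderConfig> configs = Arrays.asList(
        new ReaderConfig("a", "b", "c"), new ReaderConfig("d", "e"), new ReaderConfig("f"));
    for (int pass = 1; pass <= 2; pass++) {
      Result r = mapOnce(configs, 4);
      System.out.println("pass " + pass + ": " + r.output
          + " readersInitialised=" + r.readersInitialised + " mapCalls=" + r.mapCalls);
    }
  }
}
```

With a limit of four rows per pass, the first pass stops inside the second reader and never creates the third one; the second pass skips the drained first reader, resumes the second, and only then initialises the third.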