aishikbh commented on code in PR #12220:
URL: https://github.com/apache/pinot/pull/12220#discussion_r1444810618


##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -141,28 +149,42 @@ private Map<String, GenericRowFileManager> doMap()
               RecordReaderFactory.getRecordReader(recordReaderFileConfig._fileFormat, recordReaderFileConfig._dataFile,
                   recordReaderFileConfig._fieldsToRead, recordReaderFileConfig._recordReaderConfig);
           mapAndTransformRow(recordReader, reuse, observer, count, totalCount);
+          _recordReaderFileConfigs.get(i)._recordReader = recordReader;
+          if (!_constraintsChecker.canWrite()) {
+            LOGGER.info("Stopping record readers at index: {} as size limit reached", i);
+            break;
+          }
         } finally {
-          if (recordReader != null) {
+          if (recordReader != null && !recordReader.hasNext()) {
             recordReader.close();
           }
         }
       } else {
+        if (!recordReader.hasNext()) {
+          LOGGER.info("Skipping record reader as it is already processed at index: {}", i);
+          count++;
+          continue;
+        }
         mapAndTransformRow(recordReader, reuse, observer, count, totalCount);
+        _recordReaderFileConfigs.get(i)._recordReader = recordReader;
+        if (!_constraintsChecker.canWrite()) {

Review Comment:
   I added this as an optimisation to reduce calls to mapAndTransformRow. 
Consider this flow: mapAndTransformRow returns because the size constraint was 
violated. Without a check right after the call, the loop would keep iterating 
through the remaining record readers and call mapAndTransformRow on each of 
them, even though every call would return immediately because of the size 
constraint. The remaining record readers would also be initialised needlessly.
   
   That said, I should put this check once at the end of the loop body instead 
of repeating it in both branches of the conditional.
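
   To illustrate the single check at the end of the loop body, here is a minimal, self-contained sketch. It is not the actual SegmentMapper code: `MapperLoopSketch`, `RowLimitChecker`, `mapRows`, and `processReaders` are hypothetical stand-ins, plain iterators play the role of record readers, and a row count plays the role of the size constraint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapperLoopSketch {

  // Hypothetical stand-in for the constraints checker: allows writes until maxRows is reached.
  static final class RowLimitChecker {
    private final int _maxRows;
    private int _written;

    RowLimitChecker(int maxRows) {
      _maxRows = maxRows;
    }

    boolean canWrite() {
      return _written < _maxRows;
    }

    void recordWrite() {
      _written++;
    }
  }

  // Hypothetical stand-in for mapAndTransformRow: consumes rows while the checker allows it.
  static void mapRows(Iterator<String> reader, RowLimitChecker checker, List<String> out) {
    while (reader.hasNext() && checker.canWrite()) {
      out.add(reader.next());
      checker.recordWrite();
    }
  }

  // Iterates over the readers and returns how many times mapRows was called.
  static int processReaders(List<Iterator<String>> readers, RowLimitChecker checker, List<String> out) {
    int calls = 0;
    for (Iterator<String> reader : readers) {
      if (!reader.hasNext()) {
        // Reader was fully consumed in an earlier pass; skip it.
        continue;
      }
      mapRows(reader, checker, out);
      calls++;
      // Single check at the end of the loop body: stop before touching later readers.
      if (!checker.canWrite()) {
        break;
      }
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Iterator<String>> readers = Arrays.asList(
        Arrays.asList("a", "b", "c").iterator(),
        Arrays.asList("d", "e").iterator(),
        Arrays.asList("f").iterator());
    List<String> out = new ArrayList<>();
    int calls = processReaders(readers, new RowLimitChecker(2), out);
    System.out.println(calls + " call(s), rows=" + out); // prints "1 call(s), rows=[a, b]"
  }
}
```

   With a limit of two rows, only the first reader is touched: the loop breaks before the second and third readers are used, and the first reader still has a row left for a later pass.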



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

