aishikbh commented on code in PR #12220:
URL: https://github.com/apache/pinot/pull/12220#discussion_r1452475077


##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -107,62 +106,62 @@ public SegmentMapper(List<RecordReaderFileConfig> recordReaderFileConfigs,
     LOGGER.info("Initialized mapper with {} record readers, output dir: {}, timeHandler: {}, partitioners: {}",
         _recordReaderFileConfigs.size(), _mapperOutputDir, _timeHandler.getClass(),
         Arrays.stream(_partitioners).map(p -> p.getClass().toString()).collect(Collectors.joining(",")));
+
+    // initialize adaptive writer.
+    _adaptiveSizeBasedWriter =
+        new AdaptiveSizeBasedWriter(processorConfig.getSegmentConfig().getIntermediateFileSizeThreshold());
   }
 
   /**
    * Reads the input records and generates partitioned generic row files into the mapper output directory.
    * Records for each partition are put into a directory of the partition name within the mapper output directory.
    */
-  public Map<String, GenericRowFileManager> map()
+  public Map<String, GenericRowFileManager> map(int totalRecordReaderSize)

Review Comment:
   For optimisation purposes, SegmentMapper works only on a sublist of the
original list of RecordReaderFileConfigs. So, for logging purposes, we infer
the overall index of the record reader currently being processed from the
global total count.
   
   We need to pass the global count somehow, or else we will lose the
granularity of the logging data. The other option is to do the logging in
`SegmentProcessorFramework`, but in that case the logging will be sparse, i.e.
we will only have logs when we terminate the mapper phase.
   
   Should we do that? Just as an example, here is how the logging currently
looks on the UI (it will be similar in the debug logs as well):
   <img width="706" alt="Screenshot 2024-01-11 at 11 40 32 PM" 
src="https://github.com/apache/pinot/assets/15700987/5c2109d8-8d84-4195-aa7c-13cdb04520a9">
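
To make the index inference above concrete, here is a minimal sketch. It is not the code in this PR: the class `GlobalReaderIndex`, its methods, and the log message format are all hypothetical, and it assumes the sublist handed to the mapper is the tail of the original list (the readers that remain to be processed).

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from Pinot.
class GlobalReaderIndex {

  // Returns the 1-based position of a reader in the ORIGINAL list.
  // Assumption: the sublist is the tail of the original list, so the number
  // of readers already handled is totalCount - sublistSize.
  static int globalIndex(int totalCount, int sublistSize, int localIndex) {
    return totalCount - sublistSize + localIndex + 1;
  }

  // Builds a progress message; the wording is an assumed format.
  static String progressMessage(int totalCount, int sublistSize, int localIndex) {
    return String.format("Start processing files (%d out of %d)",
        globalIndex(totalCount, sublistSize, localIndex), totalCount);
  }

  public static void main(String[] args) {
    List<String> all = List.of("f0", "f1", "f2", "f3", "f4");
    // The mapper only sees the readers that are still unprocessed.
    List<String> remaining = all.subList(2, all.size());
    for (int i = 0; i < remaining.size(); i++) {
      System.out.println(progressMessage(all.size(), remaining.size(), i));
    }
  }
}
```

Without the global count, the same loop could only report positions relative to the sublist (1 out of 3, 2 out of 3, and so on), which is the loss of granularity described above.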
    
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

