aishikbh commented on code in PR #12220:
URL: https://github.com/apache/pinot/pull/12220#discussion_r1452475077


##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -107,62 +106,62 @@ public SegmentMapper(List<RecordReaderFileConfig> recordReaderFileConfigs,
     LOGGER.info("Initialized mapper with {} record readers, output dir: {}, timeHandler: {}, partitioners: {}",
         _recordReaderFileConfigs.size(), _mapperOutputDir, _timeHandler.getClass(),
         Arrays.stream(_partitioners).map(p -> p.getClass().toString()).collect(Collectors.joining(",")));
+
+    // initialize adaptive writer.
+    _adaptiveSizeBasedWriter =
+        new AdaptiveSizeBasedWriter(processorConfig.getSegmentConfig().getIntermediateFileSizeThreshold());
   }
 
   /**
    * Reads the input records and generates partitioned generic row files into the mapper output directory.
    * Records for each partition are put into a directory of the partition name within the mapper output directory.
    */
-  public Map<String, GenericRowFileManager> map()
+  public Map<String, GenericRowFileManager> map(int totalRecordReaderSize)

Review Comment:
   I saw several comments about the logs, so I'm consolidating my response here :D
   
   For optimisation purposes, `SegmentMapper` works only on a sublist of the
original list of `RecordReaderFileConfig`s. So, for logging purposes, we infer
the overall index of the record reader currently being processed from the
global total count.
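
A minimal sketch of the index inference described above. It assumes the sublist handed to the mapper is the trailing portion of the original list; the class name, method name, and file names are hypothetical and not taken from the PR.

```java
import java.util.List;

public class GlobalIndexSketch {
  // Hypothetical helper: 1-based position of a record reader in the full list,
  // assuming the sublist is the tail of the original list.
  public static int globalPosition(int totalCount, int sublistSize, int localIndex) {
    return totalCount - sublistSize + localIndex + 1;
  }

  public static void main(String[] args) {
    // Stand-ins for the original list of RecordReaderFileConfigs.
    List<String> all = List.of("a.avro", "b.avro", "c.avro", "d.avro", "e.avro");
    // The mapper only sees the readers that are still to be processed.
    List<String> remaining = all.subList(3, all.size());

    for (int i = 0; i < remaining.size(); i++) {
      int position = globalPosition(all.size(), remaining.size(), i);
      System.out.printf("Processing record reader %d of %d: %s%n", position, all.size(), remaining.get(i));
    }
  }
}
```

Without the global total, the mapper could only report positions relative to its own sublist (1 of 2, 2 of 2) rather than positions in the full input (4 of 5, 5 of 5).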
   
   We need to pass the global count somehow, or else we will lose the
granularity of the logging data. The other option is to do the logging in
`SegmentProcessorFramework`, but in that case the logging will be sparse, i.e.
we will only have logs when the mapper phase terminates. Should we use a
different way to pass the total count? Or would the logging in
`SegmentProcessorFramework` be enough? What do you suggest?
   
   Just for reference, here is how the logging currently looks on the UI (it
will be similar in the debug logs as well):
   <img width="706" alt="Screenshot 2024-01-11 at 11 40 32 PM" src="https://github.com/apache/pinot/assets/15700987/5c2109d8-8d84-4195-aa7c-13cdb04520a9">
    
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

