aishikbh commented on code in PR #12220: URL: https://github.com/apache/pinot/pull/12220#discussion_r1452475077
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -107,62 +106,62 @@ public SegmentMapper(List<RecordReaderFileConfig> recordReaderFileConfigs,
     LOGGER.info("Initialized mapper with {} record readers, output dir: {}, timeHandler: {}, partitioners: {}",
         _recordReaderFileConfigs.size(), _mapperOutputDir, _timeHandler.getClass(),
         Arrays.stream(_partitioners).map(p -> p.getClass().toString()).collect(Collectors.joining(",")));
+
+    // initialize adaptive writer.
+    _adaptiveSizeBasedWriter =
+        new AdaptiveSizeBasedWriter(processorConfig.getSegmentConfig().getIntermediateFileSizeThreshold());
   }
 
   /**
    * Reads the input records and generates partitioned generic row files into the mapper output directory.
    * Records for each partition are put into a directory of the partition name within the mapper output directory.
    */
-  public Map<String, GenericRowFileManager> map()
+  public Map<String, GenericRowFileManager> map(int totalRecordReaderSize)

Review Comment:
   For optimisation purposes, `SegmentMapper` works only on a sublist of the original list of `RecordReaderFileConfig`s. To log progress, we therefore infer the overall index of the record reader currently being processed from the global total count. We need to pass the global count in somehow, or else we lose the granularity of the logged data.

   The other option is to do the logging in `SegmentProcessorFramework`, but then the logging will be sparse, i.e. we will only have a log entry each time the mapper phase terminates. Should we do that?

   As an example, here is how the logging currently looks on the UI (it will be similar in the debug logs as well):

   <img width="706" alt="Screenshot 2024-01-11 at 11 40 32 PM" src="https://github.com/apache/pinot/assets/15700987/5c2109d8-8d84-4195-aa7c-13cdb04520a9">

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
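For readers following the thread, the index inference described in the comment can be sketched roughly as below. This is a minimal illustration and not the code in the PR: the class `GlobalIndexLogging`, the helpers `globalIndex` and `progressLines`, the log wording, and the assumption that the mapper receives the remaining tail of the original list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GlobalIndexLogging {
  /**
   * Hypothetical helper: returns the 1-based position of a reader in the full list,
   * given its 0-based position in the sublist. Assumes the sublist is the tail of
   * the full list (everything not yet processed).
   */
  public static int globalIndex(int totalCount, int sublistSize, int localIndex) {
    return totalCount - sublistSize + localIndex + 1;
  }

  /** Builds one progress line per reader in the tail sublist that starts at fromIndex. */
  public static List<String> progressLines(List<String> allReaders, int fromIndex) {
    List<String> sublist = allReaders.subList(fromIndex, allReaders.size());
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < sublist.size(); i++) {
      int overall = globalIndex(allReaders.size(), sublist.size(), i);
      lines.add(String.format("Processing record reader %d out of %d: %s",
          overall, allReaders.size(), sublist.get(i)));
    }
    return lines;
  }

  public static void main(String[] args) {
    // Five hypothetical input files; the first two were handled by an earlier mapper run.
    List<String> readers = List.of("a.avro", "b.avro", "c.avro", "d.avro", "e.avro");
    progressLines(readers, 2).forEach(System.out::println);
  }
}
```

Without the global count, the mapper could only report positions relative to its own sublist, which is the loss of granularity the comment refers to.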
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org