aishikbh commented on code in PR #12220: URL: https://github.com/apache/pinot/pull/12220#discussion_r1452475077
##########
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/mapper/SegmentMapper.java:
##########
@@ -107,62 +106,62 @@ public SegmentMapper(List<RecordReaderFileConfig> recordReaderFileConfigs,
     LOGGER.info("Initialized mapper with {} record readers, output dir: {}, timeHandler: {}, partitioners: {}",
         _recordReaderFileConfigs.size(), _mapperOutputDir, _timeHandler.getClass(),
         Arrays.stream(_partitioners).map(p -> p.getClass().toString()).collect(Collectors.joining(",")));
+
+    // initialize adaptive writer.
+    _adaptiveSizeBasedWriter =
+        new AdaptiveSizeBasedWriter(processorConfig.getSegmentConfig().getIntermediateFileSizeThreshold());
   }
 
   /**
    * Reads the input records and generates partitioned generic row files into the mapper output directory.
    * Records for each partition are put into a directory of the partition name within the mapper output directory.
    */
-  public Map<String, GenericRowFileManager> map()
+  public Map<String, GenericRowFileManager> map(int totalRecordReaderSize)

Review Comment:
   For optimisation purposes, `SegmentMapper` works only on a sublist of the original list of `RecordReaderFileConfig`s, so we use the global total count to infer the overall index of the record reader currently being processed, for logging purposes. I saw several comments about the logs, so I am consolidating the response here :D

   We need to pass the global count somehow, or else we lose the granularity of the logging. The other option is to do the logging in `SegmentProcessorFramework`, but in that case the logging will be sparse, i.e. we will only get logs when the mapper phase terminates.

   Should we pass the total count in a different way, or is the logging in `SegmentProcessorFramework` enough? What do you suggest?

   Just as a reference, here is how the logging currently looks on the UI (it will look similar in the debug logs as well):
   <img width="706" alt="Screenshot 2024-01-11 at 11 40 32 PM" src="https://github.com/apache/pinot/assets/15700987/5c2109d8-8d84-4195-aa7c-13cdb04520a9">
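
   To make the index inference described in this comment concrete, here is a minimal sketch. It is not the code from this PR: the class, the `globalIndex` helper, the file names, and the message format are all hypothetical. It also assumes that the sublist handed to the mapper is a suffix of the original list (everything not yet processed), which may not match the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Pinot code base.
class GlobalIndexLoggingSketch {

  // Assumption: the mapper sees the tail of the original list, so the first
  // element of the sublist sits at position (totalCount - subListSize) in the
  // original list. The returned index is 1-based for readable progress logs.
  static int globalIndex(int totalCount, int subListSize, int localIndex) {
    return totalCount - subListSize + localIndex + 1;
  }

  // Builds one progress message per reader in the sublist, numbered against
  // the global total rather than the sublist size.
  static List<String> progressMessages(List<String> subList, int totalCount) {
    List<String> messages = new ArrayList<>();
    for (int i = 0; i < subList.size(); i++) {
      int overall = globalIndex(totalCount, subList.size(), i);
      messages.add(String.format("Reading %s (%d of %d)", subList.get(i), overall, totalCount));
    }
    return messages;
  }

  public static void main(String[] args) {
    List<String> all = List.of("a.avro", "b.avro", "c.avro", "d.avro", "e.avro");
    // Pretend the first two files were already handled by an earlier mapper run.
    List<String> remaining = all.subList(2, all.size());
    for (String message : progressMessages(remaining, all.size())) {
      System.out.println(message);
    }
  }
}
```

   Without the total count, the mapper could only number readers relative to its own sublist, which is the loss of granularity mentioned above.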