ege-st commented on code in PR #11776:
URL: https://github.com/apache/pinot/pull/11776#discussion_r1367258215
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##########
@@ -229,16 +232,21 @@ public void build()
       GenericRow reuse = new GenericRow();
       TransformPipeline.Result reusedResult = new TransformPipeline.Result();
       while (_recordReader.hasNext()) {
-        long recordReadStartTime = System.currentTimeMillis();
-        long recordReadStopTime = System.currentTimeMillis();
+        long recordReadStartTime = System.nanoTime();
+        long recordReadStopTime = System.nanoTime();

Review Comment:
   If we do that it breaks the calculation of `_totalIndexTime += (indexStopTime - recordReadStopTime)` on L264. I'll move the `recordReadStartTime` into the try block to minimize its scope.

##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##########
@@ -273,6 +281,48 @@ public void build()
     handlePostCreation();
   }
 
+  public void buildByColumn(IndexSegment indexSegment)
+      throws Exception {
+    // Count the number of documents and gather per-column statistics
+    LOGGER.debug("Start building StatsCollector!");
+    buildIndexCreationInfo();
+    LOGGER.info("Finished building StatsCollector!");
+    LOGGER.info("Collected stats for {} documents", _totalDocs);
+
+    try {
+      // Initialize the index creation using the per-column statistics information
+      // TODO: _indexCreationInfoMap holds the reference to all unique values on heap (ColumnIndexCreationInfo ->
+      // ColumnStatistics) throughout the segment creation. Find a way to release the memory early.
+      _indexCreator.init(_config, _segmentIndexCreationInfo, _indexCreationInfoMap, _dataSchema, _tempIndexDir);
+
+      // Build the indexes
+      LOGGER.info("Start building Index by column");
+
+      TreeSet<String> columns = _dataSchema.getPhysicalColumnNames();
+
+      // TODO: Eventually pull the doc Id sorting logic out of Record Reader so that all row oriented logic can be
+      // removed from this code.
+      int[] sortedDocIds = ((PinotSegmentRecordReader) _recordReader).getSortedDocIds();
+      boolean skip = ((PinotSegmentRecordReader) _recordReader).getSkipDefaultNullValues();

Review Comment:
   Removed.
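For reference, the variable scoping the first review comment settles on can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than the PR's code: an `Iterator<String>` stands in for `_recordReader`, counting non-empty rows stands in for indexing, and `TimingSketch` and `_totalRecordReadTime` are illustrative names (only `_totalIndexTime`, `recordReadStartTime`, `recordReadStopTime`, and `indexStopTime` come from the diff). `recordReadStartTime` moves into the `try` block, while `recordReadStopTime` stays declared in the loop body because `_totalIndexTime += (indexStopTime - recordReadStopTime)` runs after the `try`. Both timestamps come from `System.nanoTime()`, which the JDK documents as intended only for measuring elapsed time, unlike the wall-clock `System.currentTimeMillis()`.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the build() loop under review.
// An Iterator<String> plays the role of _recordReader, and "indexing" a row
// just counts it when it is non-empty.
class TimingSketch {
  // Elapsed-time accumulators in nanoseconds. _totalIndexTime mirrors the
  // field named in the review comment; _totalRecordReadTime is illustrative.
  static long _totalRecordReadTime = 0;
  static long _totalIndexTime = 0;

  static int build(Iterator<String> reader) {
    int indexed = 0;
    while (reader.hasNext()) {
      // Declared in the loop body (not inside try) because the index-time
      // calculation after the try block still needs it.
      long recordReadStopTime = System.nanoTime();
      String row;
      try {
        // Only the read-time calculation uses the start time, so it can be
        // scoped to the try block.
        long recordReadStartTime = System.nanoTime();
        row = reader.next();
        recordReadStopTime = System.nanoTime();
        _totalRecordReadTime += (recordReadStopTime - recordReadStartTime);
      } catch (RuntimeException e) {
        // Hypothetical error handling: skip rows that fail to read.
        continue;
      }
      // Stand-in for indexing the row.
      if (!row.isEmpty()) {
        indexed++;
      }
      long indexStopTime = System.nanoTime();
      _totalIndexTime += (indexStopTime - recordReadStopTime);
    }
    return indexed;
  }

  public static void main(String[] args) {
    int indexed = build(List.of("a", "", "b").iterator());
    System.out.println("indexed=" + indexed
        + ", readNs=" + _totalRecordReadTime + ", indexNs=" + _totalIndexTime);
  }
}
```

Running `main` prints the number of indexed rows (2 for the sample input) together with the two accumulated durations; the nanosecond values vary from run to run.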