Jackie-Jiang commented on code in PR #10905: URL: https://github.com/apache/pinot/pull/10905#discussion_r1237648011
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -61,8 +62,10 @@ public class MultipleTreesBuilder implements Closeable { private final List<StarTreeV2BuilderConfig> _builderConfigs; private final BuildMode _buildMode; private final File _segmentDirectory; + private final File _indexDirectory; Review Comment: (minor) For convention, we usually call it `_indexDir` so that it is easier to map names ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -148,9 +180,54 @@ public void build() StarTreeIndexMapUtils .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME)); FileUtils.forceDelete(starTreeIndexDir); + _existingStarTrees.cleanOutputDirectory(); } - LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime); + LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees); + LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime); Review Comment: (minor) Combine them into one single log line `Finished building {} star-trees ({} reused) in {}ms` ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -81,9 +84,10 @@ public MultipleTreesBuilder(List<StarTreeV2BuilderConfig> builderConfigs, File i _builderConfigs = builderConfigs; _buildMode = buildMode; _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir); + _indexDirectory = indexDir; _metadataProperties = CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME)); - Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists"); + _existingStarTrees = getExistingStarTrees(); Review Comment: We might not want to always separate all the trees. Ideally the separator can track all the star-tree metadata (merge `SeparatedStarTreesMetadata` into the separator), and only extract the tree that can be reused into the target folder. Currently we always copy all the trees into separate files, even though they might not be reusable. This way we don't need the temp folder as well. ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -148,9 +180,54 @@ public void build() StarTreeIndexMapUtils .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME)); FileUtils.forceDelete(starTreeIndexDir); + _existingStarTrees.cleanOutputDirectory(); } - LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime); + LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees); + LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime); + } + + /** + * Extracts the individual star-trees from the combined index file and returns {@link SeparatedStarTreesMetadata} + * which contains the information about the output directory where the extracted star-trees are located and also + * has the builder configs of each of the star-tree. + */ + private SeparatedStarTreesMetadata extractStarTrees() + throws Exception { + SeparatedStarTreesMetadata metadata = new SeparatedStarTreesMetadata(_segmentDirectory); + if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) { + File indexFile = new File(_segmentDirectory, StarTreeV2Constants.INDEX_FILE_NAME); + File indexMapFile = new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME); + try (StarTreeIndexSeparator separator = + new StarTreeIndexSeparator(indexMapFile, indexFile, + _metadataProperties.getInt(MetadataKey.STAR_TREE_COUNT))) { + separator.separate(metadata.getOutputDirectory()); + metadata.setBuilderConfigList(separator.extractBuilderConfigs(_metadataProperties)); Review Comment: Consider making these fields read-only (set them in the constructor). `SeparatedStarTreesMetadata` should be a read-only helper class to extract fields from the metadata ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -148,9 +180,54 @@ public void build() StarTreeIndexMapUtils .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME)); FileUtils.forceDelete(starTreeIndexDir); + _existingStarTrees.cleanOutputDirectory(); Review Comment: Should we move this to `close()` so that it can be cleaned up when exception happens? ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -115,13 +120,33 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo } } + private SeparatedStarTreesMetadata getExistingStarTrees() + throws Exception { + try { + if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) { + // Extract existing startrees + // clean star-tree related files and configs once all the star-trees are separated and extracted + SeparatedStarTreesMetadata existingStarTrees = extractStarTrees(); + StarTreeBuilderUtils.removeStarTrees(_indexDirectory); + _metadataProperties.refresh(); + return existingStarTrees; + } else { + return new SeparatedStarTreesMetadata(_segmentDirectory); Review Comment: Consider returning `null` here to avoid overhead (`null` basically represents there is no existing star-tree) ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -136,9 +161,16 @@ public void build() for (int i = 0; i < numStarTrees; i++) { StarTreeV2BuilderConfig builderConfig = _builderConfigs.get(i); Configuration metadataProperties = _metadataProperties.subset(MetadataKey.getStarTreePrefix(i)); - try (SingleTreeBuilder singleTreeBuilder = getSingleTreeBuilder(builderConfig, starTreeIndexDir, _segment, - metadataProperties, _buildMode)) { - singleTreeBuilder.build(); + if (_existingStarTrees.containsTree(builderConfig)) { Review Comment: We can directly get index here to avoid one extra lookup ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java: ########## @@ -148,9 +180,54 @@ public void build() StarTreeIndexMapUtils .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME)); FileUtils.forceDelete(starTreeIndexDir); + _existingStarTrees.cleanOutputDirectory(); } - LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime); + LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees); + LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime); + } + + /** + * Extracts the individual star-trees from the combined index file and returns {@link SeparatedStarTreesMetadata} + * which contains the information about the output directory where the extracted star-trees are located and also + * has the builder configs of each of the star-tree. + */ + private SeparatedStarTreesMetadata extractStarTrees() + throws Exception { + SeparatedStarTreesMetadata metadata = new SeparatedStarTreesMetadata(_segmentDirectory); + if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) { Review Comment: (minor) This check is redundant -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org