yashmayya commented on code in PR #15817: URL: https://github.com/apache/pinot/pull/15817#discussion_r2120150368
########## pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java: ########## @@ -33,38 +34,42 @@ public class SegmentUtils { private SegmentUtils() { } - // Returns the partition id of a realtime segment based segment name and segment metadata info retrieved via Helix. - // Important: The method is costly because it may read data from zookeeper. Do not use it in any query execution - // path. + /// Returns the partition id of a segment based on segment name or ZK metadata. + /// Can return `null` if the partition id cannot be determined. + /// Important: The method is costly because it may read data from zookeeper. Do not use it in query execution path. @Nullable - public static Integer getRealtimeSegmentPartitionId(String segmentName, String realtimeTableName, - HelixManager helixManager, @Nullable String partitionColumn) { - Integer partitionId = getPartitionIdFromRealtimeSegmentName(segmentName); + public static Integer getSegmentPartitionId(String segmentName, String tableNameWithType, HelixManager helixManager, Review Comment: This util method is now being used in certain contexts for `OFFLINE` tables as well so we'll first try to get the partition ID from the segment name using the realtime table logic (LLC or uploaded segment). Is that intentional in order to consolidate the logic? What if the offline segment name happens to have 4 or 5 parts separated by `__`? Looks like `UploadedRealtimeSegmentName.of` handles parse exceptions (expected number of parts, but token in `partitionId` position not an `int` for instance) and returns `null`, but `LLCSegmentName.of` does not. ########## pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java: ########## @@ -76,25 +81,61 @@ public static Integer getPartitionIdFromRealtimeSegmentName(String segmentName) return null; } + /// Returns the partition id of a segment based on segment ZK metadata. + /// Can return `null` if the partition id cannot be determined. @Nullable - private static Integer getRealtimeSegmentPartitionId(SegmentZKMetadata segmentZKMetadata, + private static Integer getPartitionIdFromSegmentZKMetadata(SegmentZKMetadata segmentZKMetadata, @Nullable String partitionColumn) { SegmentPartitionMetadata segmentPartitionMetadata = segmentZKMetadata.getPartitionMetadata(); - if (segmentPartitionMetadata != null) { - Map<String, ColumnPartitionMetadata> columnPartitionMap = segmentPartitionMetadata.getColumnPartitionMap(); - ColumnPartitionMetadata columnPartitionMetadata = null; - if (partitionColumn != null) { - columnPartitionMetadata = columnPartitionMap.get(partitionColumn); - } else { - if (columnPartitionMap.size() == 1) { - columnPartitionMetadata = columnPartitionMap.values().iterator().next(); - } - } - if (columnPartitionMetadata != null && columnPartitionMetadata.getPartitions().size() == 1) { - return columnPartitionMetadata.getPartitions().iterator().next(); + return segmentPartitionMetadata != null ? getPartitionIdFromSegmentPartitionMetadata(segmentPartitionMetadata, + partitionColumn) : null; + } + + /// Returns the partition id of a segment based on [SegmentPartitionMetadata]. + /// Can return `null` if the partition id cannot be determined. + @VisibleForTesting + @Nullable + static Integer getPartitionIdFromSegmentPartitionMetadata(SegmentPartitionMetadata segmentPartitionMetadata, + @Nullable String partitionColumn) { + Map<String, ColumnPartitionMetadata> columnPartitionMap = segmentPartitionMetadata.getColumnPartitionMap(); + ColumnPartitionMetadata columnPartitionMetadata = null; + if (partitionColumn != null) { + columnPartitionMetadata = columnPartitionMap.get(partitionColumn); + } else { + if (columnPartitionMap.size() == 1) { + columnPartitionMetadata = columnPartitionMap.values().iterator().next(); } } - return null; + if (columnPartitionMetadata != null && columnPartitionMetadata.getPartitions().size() == 1) { + return columnPartitionMetadata.getPartitions().iterator().next(); + } else { + return null; + } + } + + /// Returns the partition id of a segment based on segment name or ZK metadata, or a default partition id based on the + /// hash of the segment name. + /// Important: The method is costly because it may read data from zookeeper. Do not use it in query execution path. + public static int getSegmentPartitionIdOrDefault(String segmentName, String tableNameWithType, + HelixManager helixManager, @Nullable String partitionColumn) { + Integer partitionId = getSegmentPartitionId(segmentName, tableNameWithType, helixManager, partitionColumn); + return partitionId != null ? partitionId : getDefaultPartitionId(segmentName); + } + + /// Returns the partition id of a segment based on segment name or ZK metadata, or a default partition id based on the + /// hash of the segment name. + public static int getSegmentPartitionIdOrDefault(SegmentZKMetadata segmentZKMetadata, Review Comment: Unused? ########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java: ########## @@ -542,9 +542,19 @@ private RebalanceResult doRebalance(TableConfig tableConfig, RebalanceConfig reb if (segmentsToMoveChanged) { try { // Re-calculate the instance partitions in case the instance configs changed during the rebalance - instancePartitionsMap = - getInstancePartitionsMap(tableConfig, reassignInstances, bootstrap, false, - minimizeDataMovement, tableRebalanceLogger).getLeft(); + Pair<Map<InstancePartitionsType, InstancePartitions>, Boolean> instancePartitionsMapAndUnchanged = + getInstancePartitionsMap(tableConfig, reassignInstances, bootstrap, false, minimizeDataMovement, + tableRebalanceLogger); + instancePartitionsMap = instancePartitionsMapAndUnchanged.getLeft(); + instancePartitionsUnchanged = instancePartitionsMapAndUnchanged.getRight(); + // If the instance partitions have changed, clear the segmentPartitionIdMap as the number of partitions + // may have changed, resulting in a different partitionId calculation. This change will only make a + // difference for the scenario when it was changed from or to 1 partition. The numPartitions is not used + // otherwise. + if (!instancePartitionsUnchanged) { + LOGGER.info("Clear the cached segmentPartitionIdMap as the instance partitions has changed"); + segmentPartitionIdMap.clear(); + } Review Comment: I don't follow this assertion: > This change will only make a difference for the scenario when it was changed from or to 1 partition. The numPartitions is not used otherwise. Isn't `numPartitions` still used to calculate the actual partition ID to assign a segment to within a replica group (using the segment's partition ID)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org