yashmayya commented on code in PR #15817:
URL: https://github.com/apache/pinot/pull/15817#discussion_r2120150368


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java:
##########
@@ -33,38 +34,42 @@ public class SegmentUtils {
   private SegmentUtils() {
   }
 
-  // Returns the partition id of a realtime segment based segment name and 
segment metadata info retrieved via Helix.
-  // Important: The method is costly because it may read data from zookeeper. 
Do not use it in any query execution
-  // path.
+  /// Returns the partition id of a segment based on segment name or ZK 
metadata.
+  /// Can return `null` if the partition id cannot be determined.
+  /// Important: The method is costly because it may read data from zookeeper. 
Do not use it in query execution path.
   @Nullable
-  public static Integer getRealtimeSegmentPartitionId(String segmentName, 
String realtimeTableName,
-      HelixManager helixManager, @Nullable String partitionColumn) {
-    Integer partitionId = getPartitionIdFromRealtimeSegmentName(segmentName);
+  public static Integer getSegmentPartitionId(String segmentName, String 
tableNameWithType, HelixManager helixManager,

Review Comment:
   This util method is now being used in certain contexts for `OFFLINE` tables 
as well so we'll first try to get the partition ID from the segment name using 
the realtime table logic (LLC or uploaded segment). Is that intentional in 
order to consolidate the logic? What if the offline segment name happens to 
have 4 or 5 parts separated by `__`? Looks like 
`UploadedRealtimeSegmentName.of` handles parse exceptions (expected number of 
parts, but token in `partitionId` position not an `int` for instance) and 
returns `null`, but `LLCSegmentName.of` does not.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java:
##########
@@ -76,25 +81,61 @@ public static Integer 
getPartitionIdFromRealtimeSegmentName(String segmentName)
     return null;
   }
 
+  /// Returns the partition id of a segment based on segment ZK metadata.
+  /// Can return `null` if the partition id cannot be determined.
   @Nullable
-  private static Integer getRealtimeSegmentPartitionId(SegmentZKMetadata 
segmentZKMetadata,
+  private static Integer getPartitionIdFromSegmentZKMetadata(SegmentZKMetadata 
segmentZKMetadata,
       @Nullable String partitionColumn) {
     SegmentPartitionMetadata segmentPartitionMetadata = 
segmentZKMetadata.getPartitionMetadata();
-    if (segmentPartitionMetadata != null) {
-      Map<String, ColumnPartitionMetadata> columnPartitionMap = 
segmentPartitionMetadata.getColumnPartitionMap();
-      ColumnPartitionMetadata columnPartitionMetadata = null;
-      if (partitionColumn != null) {
-        columnPartitionMetadata = columnPartitionMap.get(partitionColumn);
-      } else {
-        if (columnPartitionMap.size() == 1) {
-          columnPartitionMetadata = 
columnPartitionMap.values().iterator().next();
-        }
-      }
-      if (columnPartitionMetadata != null && 
columnPartitionMetadata.getPartitions().size() == 1) {
-        return columnPartitionMetadata.getPartitions().iterator().next();
+    return segmentPartitionMetadata != null ? 
getPartitionIdFromSegmentPartitionMetadata(segmentPartitionMetadata,
+        partitionColumn) : null;
+  }
+
+  /// Returns the partition id of a segment based on 
[SegmentPartitionMetadata].
+  /// Can return `null` if the partition id cannot be determined.
+  @VisibleForTesting
+  @Nullable
+  static Integer 
getPartitionIdFromSegmentPartitionMetadata(SegmentPartitionMetadata 
segmentPartitionMetadata,
+      @Nullable String partitionColumn) {
+    Map<String, ColumnPartitionMetadata> columnPartitionMap = 
segmentPartitionMetadata.getColumnPartitionMap();
+    ColumnPartitionMetadata columnPartitionMetadata = null;
+    if (partitionColumn != null) {
+      columnPartitionMetadata = columnPartitionMap.get(partitionColumn);
+    } else {
+      if (columnPartitionMap.size() == 1) {
+        columnPartitionMetadata = 
columnPartitionMap.values().iterator().next();
       }
     }
-    return null;
+    if (columnPartitionMetadata != null && 
columnPartitionMetadata.getPartitions().size() == 1) {
+      return columnPartitionMetadata.getPartitions().iterator().next();
+    } else {
+      return null;
+    }
+  }
+
+  /// Returns the partition id of a segment based on segment name or ZK 
metadata, or a default partition id based on the
+  /// hash of the segment name.
+  /// Important: The method is costly because it may read data from zookeeper. 
Do not use it in query execution path.
+  public static int getSegmentPartitionIdOrDefault(String segmentName, String 
tableNameWithType,
+      HelixManager helixManager, @Nullable String partitionColumn) {
+    Integer partitionId = getSegmentPartitionId(segmentName, 
tableNameWithType, helixManager, partitionColumn);
+    return partitionId != null ? partitionId : 
getDefaultPartitionId(segmentName);
+  }
+
+  /// Returns the partition id of a segment based on segment name or ZK 
metadata, or a default partition id based on the
+  /// hash of the segment name.
+  public static int getSegmentPartitionIdOrDefault(SegmentZKMetadata 
segmentZKMetadata,

Review Comment:
   Unused?



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -542,9 +542,19 @@ private RebalanceResult doRebalance(TableConfig 
tableConfig, RebalanceConfig reb
         if (segmentsToMoveChanged) {
           try {
             // Re-calculate the instance partitions in case the instance 
configs changed during the rebalance
-            instancePartitionsMap =
-                getInstancePartitionsMap(tableConfig, reassignInstances, 
bootstrap, false,
-                    minimizeDataMovement, tableRebalanceLogger).getLeft();
+            Pair<Map<InstancePartitionsType, InstancePartitions>, Boolean> 
instancePartitionsMapAndUnchanged =
+                getInstancePartitionsMap(tableConfig, reassignInstances, 
bootstrap, false, minimizeDataMovement,
+                    tableRebalanceLogger);
+            instancePartitionsMap = 
instancePartitionsMapAndUnchanged.getLeft();
+            instancePartitionsUnchanged = 
instancePartitionsMapAndUnchanged.getRight();
+            // If the instance partitions have changed, clear the 
segmentPartitionIdMap as the number of partitions
+            // may have changed, resulting in a different partitionId 
calculation. This change will only make a
+            // difference for the scenario when it was changed from or to 1 
partition. The numPartitions is not used
+            // otherwise.
+            if (!instancePartitionsUnchanged) {
+              LOGGER.info("Clear the cached segmentPartitionIdMap as the 
instance partitions has changed");
+              segmentPartitionIdMap.clear();
+            }

Review Comment:
   I don't follow this assertion:
   
   > This change will only make a difference for the scenario when it was 
changed from or to 1 partition. The numPartitions is not used otherwise.
   
   Isn't `numPartitions` still used to calculate the actual partition ID to 
assign a segment to within a replica group (using the segment's partition ID)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to