Jackie-Jiang commented on code in PR #15617:
URL: https://github.com/apache/pinot/pull/15617#discussion_r2054996050


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -479,6 +482,16 @@ private RebalanceResult doRebalance(TableConfig 
tableConfig, RebalanceConfig reb
         externalViewStabilizationTimeoutInMs);
     int expectedVersion = currentIdealState.getRecord().getVersion();
 
+    // Cache segment partition id to avoid ZK reads. Similar behavior as cache 
used in StrictReplicaGroupAssignment
+    // NOTE:
+    // 1. This cache is used for table rebalance only, but not segment 
assignment. During rebalance, rebalanceTable()
+    //    can be invoked multiple times when the ideal state changes during 
the rebalance process.
+    // 2. The cache won't be refreshed when an existing segment is replaced 
with a segment from a different partition.
+    //    Replacing a segment with a segment from a different partition should 
not be allowed for upsert table because
+    //    it will cause the segment being served by the wrong servers. If this 
happens during the table rebalance,
+    //    another rebalance might be needed to fix the assignment.
+    Object2IntOpenHashMap<String> segmentPartitionIdMap = new 
Object2IntOpenHashMap<>();

Review Comment:
   For my understanding, is this introduced to reduce the overhead of computing 
the partition id for the same segment multiple times? Does this improvement 
also apply to the current algorithm?



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -598,9 +611,12 @@ private RebalanceResult doRebalance(TableConfig 
tableConfig, RebalanceConfig reb
             "Rebalance has stopped already before updating the IdealState", 
instancePartitionsMap,
             tierToInstancePartitionsMap, targetAssignment, preChecksResult, 
summaryResult);
       }
+      String partitionColumn = 
TableConfigUtils.getPartitionColumn(tableConfig);
       Map<String, Map<String, String>> nextAssignment =
           getNextAssignment(currentAssignment, targetAssignment, 
minAvailableReplicas, enableStrictReplicaGroup,

Review Comment:
   Seems most of the new arguments are used to calculate the partition id, 
consider passing a function (lambda) to simplify this



##########
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java:
##########
@@ -624,6 +624,13 @@ public RebalanceResult rebalance(
           + "more servers.") @DefaultValue("false") @QueryParam("lowDiskMode") 
boolean lowDiskMode,
       @ApiParam(value = "Whether to use best-efforts to rebalance (not fail 
the rebalance when the no-downtime "
           + "contract cannot be achieved)") @DefaultValue("false") 
@QueryParam("bestEfforts") boolean bestEfforts,
+      @ApiParam(value = "How many maximum segment adds per server to update in 
the IdealState in each step. For "
+          + "non-strict replica group based assignment, this number will be 
the closest possible without splitting up "
+          + "a single segment's step's replicas across steps (so some servers 
may get fewer segments). For strict "
+          + "replica group based assignment, this is a per-server best effort 
value since each partition of a replica "
+          + "group must be moved as a whole and at least one partition in a 
replica group should be moved. A value of "
+          + "Integer.MAX_VALUE is used to indicate an unlimited batch size, 
which is the non-batching behavior.")
+      @DefaultValue("2147483647") @QueryParam("batchSizePerServer") int 
batchSizePerServer,

Review Comment:
   Let's make default `0` or `-1` and treat non-positive value as unlimited



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to