mcvsubbu commented on a change in pull request #6031:
URL: https://github.com/apache/incubator-pinot/pull/6031#discussion_r491053143



##########
File path: 
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
##########
@@ -510,6 +511,11 @@ private LLCRealtimeSegmentZKMetadata 
updateCommittingSegmentZKMetadata(String re
     committingSegmentZKMetadata.setIndexVersion(segmentMetadata.getVersion());
     committingSegmentZKMetadata.setTotalDocs(segmentMetadata.getTotalDocs());
 
+    // Update the partition metadata based on the segment metadata
+    // NOTE: When the stream partition changes, or the records are not 
properly partitioned from the stream, the
+    //       partition of the segment can be different from the stream 
partition.

Review comment:
       ```suggestion
       //       partition of the segment is undefined
   ```

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java
##########
@@ -39,6 +39,7 @@
   REALTIME_CONSUMPTION_EXCEPTIONS("exceptions", true),
   REALTIME_OFFSET_COMMITS("commits", true),
   REALTIME_OFFSET_COMMIT_EXCEPTIONS("exceptions", false),
+  REALTIME_PARTITION_MISMATCH("mismatch", false),

Review comment:
       Can we add the metric on the controller instead? If it happens for one 
stream partition, it is highly likely that it will happen to all partitions, so 
might as well reduce noise

##########
File path: 
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
##########
@@ -560,7 +566,8 @@ private void createNewSegmentZKMetadata(TableConfig 
tableConfig, PartitionLevelS
   }
 
   @Nullable
-  private SegmentPartitionMetadata 
getPartitionMetadataFromTableConfig(TableConfig tableConfig, int partitionId) {
+  private SegmentPartitionMetadata 
getPartitionMetadataFromTableConfig(TableConfig tableConfig, int numPartitions,

Review comment:
       ```suggestion
     private SegmentPartitionMetadata 
getPartitionMetadataFromTableConfig(TableConfig tableConfig, int 
numStreamPartitions,
   ```

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java
##########
@@ -1204,13 +1207,20 @@ public 
LLRealtimeSegmentDataManager(RealtimeSegmentZKMetadata segmentZKMetadata,
         String partitionColumn = entry.getKey();
         ColumnPartitionConfig columnPartitionConfig = entry.getValue();
         String partitionFunctionName = columnPartitionConfig.getFunctionName();
+
+        // NOTE: Here we compare the number of partitions from the config and 
the stream, and log a warning and emit a
+        //       metric when they don't match, but use the one from the 
stream. The mismatch could happen when the
+        //       stream partitions are changed, but the table config has not 
been updated to reflect the change. In such
+        //       case, picking the number of partitions from the stream can 
keep the segment properly partitioned as

Review comment:
       We don't recognize new partitions instantly. Everytime the realtime 
validation manager runs, it checks if the number of aprtitions have changed, 
and if so, starts a new consuming partition.
   Let us say at time T1 we checked and the partition number did not change
   At time T1 + 10, the partition numbers changed, but we did not know. The 
stream divided the records into a different partitioning system, thus having 
mismatched rows in (most likely) all partitions . At time T1 + 50, we check 
again, and create the new consuming segment for the new partition we detected.
   In this case, all the segments that have the mismatched rows should be 
marked as not belonging to any partition.
   
   I am not sure this condition is being handled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to