hydrogenlee commented on PR #14080:
URL: https://github.com/apache/iceberg/pull/14080#issuecomment-3342759027

   > Another question: any idea how to test this? It would be good to add a 
test, so we could prevent the issue recurring
   
   I originally added the check below to the `initializeState` function to 
make sure there is only one `GlobalStatistics` entry in state, but removed it 
before submitting the PR because it might break backward compatibility. State 
written by the old version can contain multiple entries in 
`globalStatisticsState`, so after upgrading the Iceberg connector the check 
could throw an exception and prevent the job from starting.
   
   Adding a check like this would help prevent the issue and we could test it 
in unit tests, but I’m not sure if we can change it that way. Do you have any 
suggestions?
   
   ```
   if (context.isRestored()) {
     List<GlobalStatistics> globalStatisticsList = new ArrayList<>();
     IterableUtils.emptyIfNull(globalStatisticsState.get()).forEach(globalStatisticsList::add);

     if (CollectionUtils.isEmpty(globalStatisticsList)) {
       LOG.info(
           "Operator {} subtask {} doesn't have global statistics state to restore",
           operatorName,
           subtaskIndex);
       // If Flink deprecates union state in the future, RequestGlobalStatisticsEvent can be
       // leveraged to request global statistics from coordinator if new subtasks (scale-up case)
       // have nothing to restore from.
     } else {
       if (globalStatisticsList.size() > 1) {
         throw new IllegalStateException(
             "There should be at most one global stats written by the first subtask");
       }

       GlobalStatistics restoredStatistics = globalStatisticsList.get(0);
       LOG.info(
           "Operator {} subtask {} restored global statistics state", operatorName, subtaskIndex);
       this.globalStatistics = restoredStatistics;
     }
   }
   ```
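
To make the unit-test idea concrete, here is a minimal sketch of the at-most-one rule pulled out into a standalone helper that can be exercised without a Flink operator or state backend. The class `RestoredStateCheck` and the method `atMostOne` are hypothetical names used only for illustration; they are not part of Iceberg or Flink, and the sketch does not settle the backward-compatibility question raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Iceberg: isolates the "at most one entry" rule
// so it can be unit-tested on plain collections.
final class RestoredStateCheck {
  private RestoredStateCheck() {}

  // Returns the single restored entry, or empty if nothing was restored.
  // Throws IllegalStateException if the restored state holds more than one entry.
  static <T> Optional<T> atMostOne(Iterable<T> restored) {
    List<T> entries = new ArrayList<>();
    if (restored != null) {
      restored.forEach(entries::add);
    }
    if (entries.size() > 1) {
      throw new IllegalStateException(
          "There should be at most one global stats written by the first subtask, found "
              + entries.size());
    }
    return entries.isEmpty() ? Optional.empty() : Optional.of(entries.get(0));
  }

  public static void main(String[] args) {
    // Nothing to restore (for example, a new subtask after scale-up).
    System.out.println(atMostOne(List.of()).isPresent());
    // Exactly one entry, as expected when only the first subtask writes state.
    System.out.println(atMostOne(List.of("stats")).get());
    // More than one entry, as state written by an older version might contain.
    try {
      atMostOne(List.of("stats-a", "stats-b"));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A unit test could then cover three cases: empty state, a single entry, and multiple entries. The multiple-entries case is the one that would also fail for state written by the old version, so whether to throw there is still the open design question.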
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

