michalantkowicz commented on issue #13752:
URL: https://github.com/apache/iceberg/issues/13752#issuecomment-3162716380

   @nferrario I've checked, and my control topic has 7-day retention (the 
default), yet data was still skipped, even after restarts. More interestingly, 
not all of the data was lost: it was usually ~33% or ~66%, which makes sense 
given that I have maxWorkers set to `3`. It seems that some of the workers were 
corrupted.
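
To make the arithmetic behind those percentages explicit, here is a minimal sketch. It assumes records are spread evenly across the workers and that an affected worker loses its whole share; neither assumption is confirmed, and `lost_fraction` is a made-up helper name.

```python
from fractions import Fraction


def lost_fraction(affected_workers: int, total_workers: int) -> Fraction:
    """Share of records lost, ASSUMING each worker handles an equal share
    of the data and an affected worker drops all of its share."""
    if total_workers <= 0 or not 0 <= affected_workers <= total_workers:
        raise ValueError("need 0 <= affected_workers <= total_workers, total_workers > 0")
    return Fraction(affected_workers, total_workers)


# With 3 workers, one or two affected workers would give roughly the
# one-third and two-thirds losses described above.
for affected in range(4):
    share = lost_fraction(affected, 3)
    print(f"{affected} of 3 workers affected -> {float(share):.0%} of records lost")
```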
   
   @kumarpritam863 - unfortunately, in the environment where I have this set up 
I cannot test changes that are not released yet, but I can describe the scenario:
   1. I have two pods running the `iceberg-kafka-connect` docker image, deployed 
in a kubernetes cluster. Let's say they are working fine, writing messages from 
the source topic to iceberg tables; one of them hosts the coord process.
   2. There's a restart (e.g. deployment of a newer version).
   3. After the rolling restart, `java.lang.IllegalStateException: Connection 
pool shut down` logs sometimes start to appear on the pod with the coord process, 
and it remains unhealthy until the next restart (provided the situation does not 
repeat after that restart).
   
   I thought that it happens when more than one consumer-id is connected to the 
`control-iceberg` topic, but I currently have exactly that situation and there's 
no data loss:
   
   ```
   GROUP                                 TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID        HOST          CLIENT-ID
   connect-mantkowicz-test-iceberg-coord control-iceberg 0          156894          156895          1    adbfe6cb-e30a-...  /10.4.26.137  adbfe6cb-e30a-...
   connect-mantkowicz-test-iceberg-coord control-iceberg 1          166922          166927          5    adbfe6cb-e30a-...  /10.4.26.137  adbfe6cb-e30a-...
   connect-mantkowicz-test-iceberg-coord control-iceberg 2          173379          173386          7    e803e448-6dad-...  /10.4.26.137  e803e448-6dad-...
   ```
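
A quick way to check for the "more than one consumer-id" condition is to group the partitions in a listing like the one above by `CONSUMER-ID`. This is only a sketch: it assumes whitespace-separated columns with no spaces inside a field, and `partitions_by_consumer` is a hypothetical helper, not part of any Kafka or Iceberg tooling.

```python
from collections import defaultdict

# The listing from above, with the truncated IDs kept exactly as shown.
SAMPLE = """\
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
connect-mantkowicz-test-iceberg-coord control-iceberg 0 156894 156895 1 adbfe6cb-e30a-... /10.4.26.137 adbfe6cb-e30a-...
connect-mantkowicz-test-iceberg-coord control-iceberg 1 166922 166927 5 adbfe6cb-e30a-... /10.4.26.137 adbfe6cb-e30a-...
connect-mantkowicz-test-iceberg-coord control-iceberg 2 173379 173386 7 e803e448-6dad-... /10.4.26.137 e803e448-6dad-...
"""


def partitions_by_consumer(describe_output: str) -> dict[str, list[int]]:
    """Map each CONSUMER-ID to the partitions it owns.

    ASSUMPTION: columns are whitespace-separated and no field contains spaces.
    """
    rows = [line.split() for line in describe_output.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    partition_col = header.index("PARTITION")
    consumer_col = header.index("CONSUMER-ID")
    owners: dict[str, list[int]] = defaultdict(list)
    for row in data:
        owners[row[consumer_col]].append(int(row[partition_col]))
    return dict(owners)


owners = partitions_by_consumer(SAMPLE)
for consumer_id, partitions in owners.items():
    print(consumer_id, partitions)
if len(owners) > 1:
    print(f"{len(owners)} distinct consumer-ids are attached to the control topic")
```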
   
   (I've compared counts by hour on kafka and in iceberg and they match)
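
The hourly comparison can be sketched roughly as below. The counts are made-up illustrative numbers, not measurements from this environment, and `mismatched_hours` is a hypothetical helper; how the per-hour counts are obtained from Kafka and from the Iceberg table is left out.

```python
def mismatched_hours(
    kafka_counts: dict[str, int], iceberg_counts: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return the hours whose record counts differ, as hour -> (kafka, iceberg).

    An hour missing on one side is treated as a count of 0.
    """
    hours = sorted(set(kafka_counts) | set(iceberg_counts))
    return {
        hour: (kafka_counts.get(hour, 0), iceberg_counts.get(hour, 0))
        for hour in hours
        if kafka_counts.get(hour, 0) != iceberg_counts.get(hour, 0)
    }


# ILLUSTRATIVE numbers only: the second hour shows a one-third loss.
kafka = {"2025-08-07T10": 900, "2025-08-07T11": 900}
iceberg_ok = {"2025-08-07T10": 900, "2025-08-07T11": 900}
iceberg_lossy = {"2025-08-07T10": 900, "2025-08-07T11": 600}

print(mismatched_hours(kafka, iceberg_ok))     # empty dict means the counts match
print(mismatched_hours(kafka, iceberg_lossy))  # lists the hour that lost records
```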
   
   I've observed that this situation is more likely to happen if the pod 
WITHOUT the coord process is restarted first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

