mjsax commented on PR #16970:
URL: https://github.com/apache/kafka/pull/16970#issuecomment-2313771856

   Seems this test is full of race conditions? -- But it seems there is no easy 
way to control it fully...
   
   Not sure if using more input records by itself is the right way to go 
though? It seems to only reduce the likelihood that the test fails. Should we 
maybe change the test condition instead (e.g., we could also count down 
`onRestoreSuspendedLatch` in the `onRestoreEnd` callback)?
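
   A rough sketch of the latch idea, not the actual test code. The class name, 
the latch count of 1, and the `awaitRestoreStopped` helper are assumptions, and 
the callbacks omit the `TopicPartition` argument that the real 
`StateRestoreListener` methods take, so that the snippet compiles without Kafka 
on the classpath:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the test's restore listener. The real one would
// implement org.apache.kafka.streams.processor.StateRestoreListener.
class RestoreLatchSketch {
    // Same name as the latch in the test; the initial count of 1 is an assumption.
    final CountDownLatch onRestoreSuspendedLatch = new CountDownLatch(1);

    // Restoration was interrupted before it finished.
    void onRestoreSuspended(String storeName, long totalRestored) {
        onRestoreSuspendedLatch.countDown();
    }

    // Restoration ran to completion before it could be suspended. Counting
    // down here as well means the test no longer depends on which callback
    // wins the race. countDown() on a latch already at zero is a no-op.
    void onRestoreEnd(String storeName, long totalRestored) {
        onRestoreSuspendedLatch.countDown();
    }

    // Hypothetical helper: true if either callback fired within the timeout.
    boolean awaitRestoreStopped(long timeoutMs) {
        try {
            return onRestoreSuspendedLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

   With this shape the test would pass whether restoration is suspended or 
completes first, which may or may not be the intended semantics of the test.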
   
   Or we could decrease `MAX_POLL_RECORDS_CONFIG` to `1` to slow down 
restoration even more, and maybe make the `RESTORATION_DELAY` "dynamic" -- i.e., 
keep it at 500ms, but reduce it to zero once we have reached the condition?
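
   The "dynamic" delay could look roughly like this. It is only a sketch: the 
class, field, and method names are hypothetical, and it assumes the delay is 
applied from the listener's per-batch callback, which the comment above does 
not confirm:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper that throttles restoration until the test has seen
// the condition it is waiting for, then lets restoration run at full speed.
class DynamicRestoreDelaySketch {
    // Starts at 500 ms, matching the current RESTORATION_DELAY value.
    private final AtomicLong restorationDelayMs = new AtomicLong(500L);

    // Called by the test thread once the awaited condition has been reached.
    void conditionReached() {
        restorationDelayMs.set(0L);
    }

    long currentDelayMs() {
        return restorationDelayMs.get();
    }

    // Assumed to be invoked from the restore listener after each batch.
    void onBatchRestored() {
        long delay = restorationDelayMs.get();
        if (delay <= 0L) {
            return; // condition already reached, no throttling
        }
        try {
            Thread.sleep(delay);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

   Using an `AtomicLong` keeps the update visible to the stream thread without 
extra locking, since the test thread and the restoring thread are different.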
   
   For the new test failure: given that we now need to restore 1000 records, 
it seems we might just need more time to transition to RUNNING. Note the test 
log line:
   ```
[shouldInvokeUserDefinedGlobalStateRestoreListeners_dZty6RRJSL5X__RAIPGg-ks2-StreamThread-1] task [0_0] Suspended from RESTORING
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
