zhouxiz9 commented on PR #10979:
URL: https://github.com/apache/pinot/pull/10979#issuecomment-1661155624

   > @jtao15 @zhouxiz9 I synced up with @jtao15 yesterday. It looks like we need to address 2 things:
   > 
   > 1. detect the failure issue earlier than 24 hours.
   > 2. optimize the runtime by only running the failed portion.
   > 
   > It looks like this PR potentially improves 2; however, it may not address the issue that @zhouxiz9 and @jtao15 are currently facing. We will revisit the PR once the root cause of the ongoing issue is identified.
   
   Hi @snleee, I synced with @jtao15 today and understand that a more comprehensive design is needed to cover cases such as late events and spill-over. The current fix only solves part of the problem. I'll close this PR and create a new one once that design is ready.
   
   Since we are facing the minion delay issue in production (caused by repeatedly merging already merged segments), a short-term fix is needed. I've created another [PR](https://github.com/apache/pinot/pull/11243) to make `MaxAttemptsPerTask` configurable so that we can increase this value and better handle transient errors. Please let me know if that works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

