Jackie-Jiang opened a new pull request, #16696:
URL: https://github.com/apache/pinot/pull/16696

   ## Summary
   
   This PR fixes a critical issue in Multi-Stage Engine operators where 
earlyTerminate was not being called consistently when operators were 
interrupted, leading to incomplete cleanup and potential resource leaks.
   
   ## Problem
   
   The main issue was that operators were checking for interruption using 
Tracing.ThreadAccountantOps.isInterrupted and throwing 
EarlyTerminationException directly, but NOT calling earlyTerminate before 
throwing the exception. This caused:
   
   - Incomplete operator cleanup when queries were cancelled or timed out
   - Potential resource leaks in operators that needed cleanup
   - Inconsistent termination behavior across different operators
   
   ## Root Cause
   
   Operators were using direct calls that bypassed the earlyTerminate method 
which is crucial for proper operator cleanup.
   
   ## Solution
   
   ### Centralized Early Termination with Proper Cleanup
   
   - Fixed checkInterruption method: Now properly calls earlyTerminate before 
throwing exceptions
   - Consistent interruption handling: All operators now use centralized 
methods that ensure earlyTerminate is called
   - Proper cleanup sequence: Timeout and resource limit exceptions now trigger 
proper cleanup
   
   ### Key Changes
   
   #### MultiStageOperator Base Class
   - checkInterruption: Now calls earlyTerminate before throwing timeout or 
resource limit exceptions
   - sampleAndCheckInterruption: Ensures proper cleanup through 
checkInterruption
   - sampleAndCheckInterruptionPeriodically: Periodic checking with proper 
cleanup
   - Enhanced nextBlock: Uses checkInterruption to ensure cleanup on 
interruption
   
   #### All Operators Updated
   - Replaced direct 
Tracing.ThreadAccountantOps.sampleAndCheckInterruptionPeriodically calls
   - Now use centralized sampleAndCheckInterruptionPeriodically which ensures 
earlyTerminate is called
   - Consistent behavior across join operators, window operators, and mailbox 
operators
   
   ## Impact
   
   This fix ensures that when queries are cancelled, timed out, or hit resource 
limits:
   1. earlyTerminate is always called for proper operator cleanup
   2. Resources are properly released instead of being leaked
   3. Consistent behavior across all MSE operators
   4. Improved system stability under high load or timeout scenarios
   
   ## Files Modified
   
   - MultiStageOperator.java - Fixed interruption checking to call 
earlyTerminate
   - BaseMailboxReceiveOperator.java - Updated to use proper interruption 
methods
   - MailboxSendOperator.java - Updated to use proper interruption methods
   - AsofJoinOperator.java - Uses centralized interruption checking with cleanup
   - HashJoinOperator.java - Uses centralized interruption checking with cleanup
   - NonEquiJoinOperator.java - Uses centralized interruption checking with 
cleanup
   - WindowAggregateOperator.java - Uses centralized interruption checking with 
cleanup
   
   ## Testing
   
   - Code compiles successfully
   - All existing tests pass
   - Manual testing of query cancellation scenarios
   - Verified earlyTerminate is called on timeouts and resource limits
   - Tested resource cleanup under various interruption scenarios
   
   This fix is critical for system stability and proper resource management in 
multi-stage query execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to