Jackie-Jiang opened a new pull request, #16696: URL: https://github.com/apache/pinot/pull/16696
## Summary This PR fixes a critical issue in Multi-Stage Engine operators where earlyTerminate was not being called consistently when operators were interrupted, leading to incomplete cleanup and potential resource leaks. ## Problem The main issue was that operators were checking for interruption using Tracing.ThreadAccountantOps.isInterrupted and throwing EarlyTerminationException directly, but NOT calling earlyTerminate before throwing the exception. This caused: - Incomplete operator cleanup when queries were cancelled or timed out - Potential resource leaks in operators that needed cleanup - Inconsistent termination behavior across different operators ## Root Cause Operators were using direct calls that bypassed the earlyTerminate method which is crucial for proper operator cleanup. ## Solution ### Centralized Early Termination with Proper Cleanup - Fixed checkInterruption method: Now properly calls earlyTerminate before throwing exceptions - Consistent interruption handling: All operators now use centralized methods that ensure earlyTerminate is called - Proper cleanup sequence: Timeout and resource limit exceptions now trigger proper cleanup ### Key Changes #### MultiStageOperator Base Class - checkInterruption: Now calls earlyTerminate before throwing timeout or resource limit exceptions - sampleAndCheckInterruption: Ensures proper cleanup through checkInterruption - sampleAndCheckInterruptionPeriodically: Periodic checking with proper cleanup - Enhanced nextBlock: Uses checkInterruption to ensure cleanup on interruption #### All Operators Updated - Replaced direct Tracing.ThreadAccountantOps.sampleAndCheckInterruptionPeriodically calls - Now use centralized sampleAndCheckInterruptionPeriodically which ensures earlyTerminate is called - Consistent behavior across join operators, window operators, and mailbox operators ## Impact This fix ensures that when queries are cancelled, timed out, or hit resource limits: 1. earlyTerminate is always called for proper operator cleanup 2. Resources are properly released instead of being leaked 3. Consistent behavior across all MSE operators 4. Improved system stability under high load or timeout scenarios ## Files Modified - MultiStageOperator.java - Fixed interruption checking to call earlyTerminate - BaseMailboxReceiveOperator.java - Updated to use proper interruption methods - MailboxSendOperator.java - Updated to use proper interruption methods - AsofJoinOperator.java - Uses centralized interruption checking with cleanup - HashJoinOperator.java - Uses centralized interruption checking with cleanup - NonEquiJoinOperator.java - Uses centralized interruption checking with cleanup - WindowAggregateOperator.java - Uses centralized interruption checking with cleanup ## Testing - Code compiles successfully - All existing tests pass - Manual testing of query cancellation scenarios - Verified earlyTerminate is called on timeouts and resource limits - Tested resource cleanup under various interruption scenarios This fix is critical for system stability and proper resource management in multi-stage query execution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
