asadjan4611 opened a new issue, #17985:
URL: https://github.com/apache/dolphinscheduler/issues/17985

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   The alert server process can terminate immediately during startup when 
plugin metadata table validation fails.
   
   In `AlertPluginManager`, if `checkPluginDefineTableExist()` returns false, 
the code calls `System.exit(1)`.  
   This terminates the process outright instead of handling the failure 
gracefully or retrying, so a transient problem (a brief database outage, a 
connectivity failure, or a migration that has not finished yet) can take the 
alert service down.
   
   
   ### What you expected to happen
   
   The alert server should not hard-exit on this condition.  
   It should fail gracefully: log a clear error, then either retry with backoff 
or fail startup in a controlled way without forcing the process to exit, so 
operators can recover without an abrupt crash.
   
   
   ### How to reproduce
   
   1. Start the DolphinScheduler alert-server while the database is temporarily 
unavailable, or in any other state where the plugin metadata table check fails.
   2. Alert server initializes `AlertPluginManager`.
   3. `checkPluginDefineTableExist()` returns false.
   4. Process executes `System.exit(1)` and terminates.
   
   Relevant code path:
   - 
`dolphinscheduler-alert/dolphinscheduler-alert-server/.../AlertPluginManager.java`
   - `checkAlertPluginExist()` calls `System.exit(1)` when the plugin table is 
not detected.
   
   
   
   ### Anything else
   
   Current behavior introduces an avoidable hard-stop path:
   
   ```
   private void checkAlertPluginExist() {
       if (!pluginDao.checkPluginDefineTableExist()) {
           log.error("Plugin Define Table t_ds_plugin_define Not Exist. Please Create it First!");
           System.exit(1);
       }
   }
   ```
   
   Potential direction:
   - Replace `System.exit(1)` with a thrown exception, and let the server's 
managed lifecycle decide whether to stop or retry.
   - Add startup diagnostics and retry/backoff for transient DB unavailability.
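
A minimal sketch of that direction, not a proposed patch. Everything here is hypothetical: the class `PluginTableStartupCheck`, the exception `PluginTableMissingException`, and the `maxAttempts` / `initialDelay` parameters do not exist in DolphinScheduler. The `BooleanSupplier` stands in for `pluginDao.checkPluginDefineTableExist()`, and the sketch assumes that check returns false (rather than throwing) while the database is unavailable.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: retries the table check with exponential backoff and
// throws instead of calling System.exit(1) when the table never appears.
final class PluginTableStartupCheck {

    // Hypothetical unchecked exception so callers can decide how to shut down.
    static final class PluginTableMissingException extends RuntimeException {
        PluginTableMissingException(String message) {
            super(message);
        }
    }

    private final BooleanSupplier tableExists; // stand-in for the DAO check
    private final int maxAttempts;
    private final Duration initialDelay;

    PluginTableStartupCheck(BooleanSupplier tableExists, int maxAttempts, Duration initialDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.tableExists = tableExists;
        this.maxAttempts = maxAttempts;
        this.initialDelay = initialDelay;
    }

    // Returns the 1-based attempt on which the table was found, or throws
    // PluginTableMissingException once all attempts are used up.
    int awaitPluginTable() {
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tableExists.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new PluginTableMissingException(
                            "Interrupted while waiting for table t_ds_plugin_define");
                }
                delayMillis *= 2; // double the wait before the next attempt
            }
        }
        throw new PluginTableMissingException(
                "Table t_ds_plugin_define not found after " + maxAttempts + " attempts");
    }
}
```

With something like this, `checkAlertPluginExist()` could call `awaitPluginTable()` and let the exception propagate, leaving the decision to stop the server with whatever code owns its startup and shutdown.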
   
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

