Hi everyone:

Regarding the “Flink Cluster Alarm Function”, the following is a summary of the 
discussion:

Requirement:
-  When a Flink cluster hits an exception and can no longer run, alert the 
user and suppress the alarm notifications for the jobs running on that cluster.
Detailed logic:
1.      The Flink cluster implements health detection and status updates, as 
detailed in:  https://github.com/apache/incubator-streampark/pull/2675

2.      When an exception occurs in a job, first determine whether the job's 
deployment mode is remote, yarn session or k8s session:

a)      If not, send the job alarm directly.

b)      If so, look up the Flink cluster status using the Flink cluster ID of 
the job:

                 i.          If the Flink cluster status is STOP or LOST, block 
the job alarm and wait for the Flink cluster alarm.

                ii.          If the Flink cluster status is RUNNING, actively 
trigger a status update request for the Flink cluster. If the latest update 
moves the cluster to STOP or LOST, block the job alarm; if the cluster status 
is still RUNNING, send the alarm notification for the job.
3.      The Flink cluster alarm template reuses the job alarm template and adds 
one field: the number of affected jobs.
4.      Abstract the shared alarm template code to avoid duplication.
Issue details can be found at: 
https://github.com/apache/incubator-streampark/issues/2423
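
To make the decision in step 2 concrete, here is a minimal sketch in Java. All 
names (DeployMode, ClusterState, JobAlarmGate, shouldSendJobAlarm) are 
hypothetical and are not StreamPark's actual classes; the refresh supplier 
stands in for the "actively trigger a status update" request, and OTHER stands 
in for any deployment mode that is not remote, yarn session or k8s session.

```java
import java.util.function.Supplier;

// Hypothetical deployment modes; OTHER covers every mode that is not
// remote, yarn session or k8s session.
enum DeployMode {
    REMOTE, YARN_SESSION, K8S_SESSION, OTHER;

    // True for the three modes named in step 2.
    boolean usesSharedCluster() {
        return this == REMOTE || this == YARN_SESSION || this == K8S_SESSION;
    }
}

// Only the three cluster states mentioned in the summary.
enum ClusterState {
    RUNNING, STOP, LOST;

    boolean isDown() {
        return this == STOP || this == LOST;
    }
}

final class JobAlarmGate {
    private JobAlarmGate() {}

    /**
     * Returns true if the job alarm should be sent, false if it should be
     * blocked in favour of the cluster alarm.
     *
     * @param mode    deployment mode of the failed job
     * @param cached  last known status of the job's Flink cluster
     * @param refresh assumed hook that triggers a status update and
     *                returns the refreshed cluster status
     */
    static boolean shouldSendJobAlarm(DeployMode mode,
                                      ClusterState cached,
                                      Supplier<ClusterState> refresh) {
        // 2a: not a remote/session deployment, send the job alarm directly.
        if (!mode.usesSharedCluster()) {
            return true;
        }
        // 2b-i: cluster already known to be down, block the job alarm.
        if (cached.isDown()) {
            return false;
        }
        // 2b-ii: cluster looks RUNNING, so force a refresh and re-check.
        ClusterState latest = refresh.get();
        return !latest.isDown();
    }
}
```

The refresh hook is only called in case 2b-ii, so a job in another deployment 
mode, or one whose cluster is already marked STOP or LOST, does not trigger an 
extra status request.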





Best,
JiangFeng Xu
