sollhui opened a new pull request, #39790:
URL: https://github.com/apache/doris/pull/39790

   `updateCloudProgress()` is called whenever the job needs to be scheduled. It then:
   1. updates the job statistics by accumulation
   2. performs a data quality check
   
   
   Updating the job statistics by accumulation makes them incorrect: pausing and
   then resuming a job doubles the job statistics.
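The double counting can be illustrated with a minimal sketch. It assumes the persisted progress carries a cumulative row total and that the update is applied once per scheduling round; `StatSketch`, `AccumulationSketch`, and their methods are hypothetical stand-ins, not the actual Doris classes.

```java
// Hypothetical, simplified stand-in for the job statistic; not the Doris class.
class StatSketch {
    long totalRows;

    // Accumulating: adds the persisted total on top of the in-memory value.
    void applyByAccumulation(long persistedTotalRows) {
        totalRows += persistedTotalRows;
    }

    // Overwriting: idempotent, repeated calls leave the same value.
    void applyByOverwrite(long persistedTotalRows) {
        totalRows = persistedTotalRows;
    }
}

class AccumulationSketch {
    // Applies the same persisted total `times` times, as would happen when the
    // job is scheduled, paused, resumed, and scheduled again.
    static long replay(boolean accumulate, long persistedTotalRows, int times) {
        StatSketch stat = new StatSketch();
        for (int i = 0; i < times; i++) {
            if (accumulate) {
                stat.applyByAccumulation(persistedTotalRows);
            } else {
                stat.applyByOverwrite(persistedTotalRows);
            }
        }
        return stat.totalRows;
    }

    public static void main(String[] args) {
        System.out.println(replay(true, 100L, 2));  // accumulated total
        System.out.println(replay(false, 100L, 2)); // overwritten total
    }
}
```

With a persisted total of 100 rows applied twice, accumulation reports 200 rows while overwriting still reports 100.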
   
   The data quality check is not needed, because `updateCloudProgress()` is
   similar to a replay operation. The check in question is:
   ```
   if (this.jobStatistic.currentErrorRows > maxErrorNum
           || (this.jobStatistic.currentTotalRows > 0
               && ((double) this.jobStatistic.currentErrorRows
                       / this.jobStatistic.currentTotalRows) > maxFilterRatio)) {
       LOG.info(new LogBuilder(LogKey.ROUTINE_LOAD_JOB, id)
               .add("current_total_rows", this.jobStatistic.currentTotalRows)
               .add("current_error_rows", this.jobStatistic.currentErrorRows)
               .add("max_error_num", maxErrorNum)
               .add("max_filter_ratio", maxFilterRatio)
               .add("msg", "current error rows is more than max_error_number "
                       + "or the max_filter_ratio is more than the max, begin to pause job")
               .build());
       if (!isReplay) {
           // remove all of task in jobs and change job state to paused
           updateState(JobState.PAUSED, new ErrorReason(InternalErrorCode.TOO_MANY_FAILURE_ROWS_ERR,
                   "current error rows is more than max_error_number "
                       + "or the max_filter_ratio is more than the value set"), isReplay);
       }
       // reset currentTotalNum and currentErrorNum
       this.jobStatistic.currentErrorRows = 0;
       this.jobStatistic.currentTotalRows = 0;
   }
   ```
   If the quality check is performed here, the job cannot be resumed with
   `resume`.
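For reference, the condition in the snippet above can be isolated as a pure predicate. This is only a sketch: `QualityCheckSketch` and `exceedsLimits` are hypothetical names, and the parameters mirror `currentErrorRows`, `currentTotalRows`, `maxErrorNum`, and `maxFilterRatio` from the code.

```java
// Hypothetical helper that mirrors the condition in the snippet above.
class QualityCheckSketch {
    // True when the error count exceeds maxErrorNum, or when the error
    // ratio exceeds maxFilterRatio (only evaluated if totalRows > 0).
    static boolean exceedsLimits(long errorRows, long totalRows,
                                 long maxErrorNum, double maxFilterRatio) {
        return errorRows > maxErrorNum
                || (totalRows > 0
                    && ((double) errorRows / totalRows) > maxFilterRatio);
    }

    public static void main(String[] args) {
        // 5 errors in 100 rows, limits: 10 errors or 10% ratio
        System.out.println(exceedsLimits(5, 100, 10, 0.1));
        // 11 errors in 1000 rows, limits: 10 errors or 50% ratio
        System.out.println(exceedsLimits(11, 1000, 10, 0.5));
    }
}
```

If counters that already satisfy this predicate are evaluated again while progress is being synchronized, the job is moved back to `PAUSED`, which matches the behaviour described above.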
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

