platoneko opened a new issue, #11874:
URL: https://github.com/apache/doris/issues/11874

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   The current log output makes it hard to locate problems, and the error 
message that FE receives carries no useful information either. Much of the 
error handling under the `be/src/olap` directory first prints a warning log and 
then constructs an `OLAPInternalError` with a `precise_code`. The caller then 
translates the returned `OLAPInternalError` into some other `Status`, so the 
original error never reaches the top of the call stack. To find the source of 
an error, we often have to trace several non-contiguous log lines. I think a 
better error-handling pattern would be:
   1. A function that returns `Status` should return an error `Status` from 
its callee directly, without translating it. The error message, together with 
the data involved (e.g. `table_id`, `rowset_id`, `txn_id`, `signature`), should 
be written to the warning log in these situations:
       1. The function does not return a `Status`, e.g.
       ```c++
       void Caller() {
           ...
           Status s = callee();
           if (!s.ok()) {
               LOG(WARNING) << "failed to xxx. reason: " << s;
           }
           ...
       }
       ```
       2. A non-OK `Status` should sometimes not be treated as an error, e.g.
       ```c++
       Status Caller() {
           ...
           Status s = callee();
           if (s.is_already_exist()) {
               LOG(WARNING) << "failed to xxx. reason: " << s;
               s = Status::OK();
           }
           ...
       }
       ```
       3. A batch of operations can tolerate some failures, e.g.
       ```c++
       Status Caller() {
           ...
           for (auto& arg : args) {
               Status s = callee(arg);
               if (!s.ok()) {
                   LOG(WARNING) << "failed to xxx. arg=" << arg << ", reason: " << s;
               }
           }
           ...
       }
       ```
       4. Retry operations, e.g.
       ```c++
       Status Caller() {
           ...
           Status s;
           while (retry > 0) {
               s = callee();
               --retry;
               if (s.ok()) {
                   break;
               } else {
                   LOG(WARNING) << "failed to xxx. retry=" << retry << ", reason: " << s;
               }
           }
           if (!s.ok()) {
               return s;
           }
           ...
       }
       ```
   2. The `precise_code` should let the caller determine the type of error 
returned by the callee; it should not serve as the description of the error. 
The cause of the error can be described in the more detailed and flexible 
`err_msg`.
   3. A function that is processed asynchronously should carry a `Status` in 
its context, so that the error reason can be passed to the join point.
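
The three points above can be sketched together in one small example. Everything in it is hypothetical and for illustration only: the `Status` class is a minimal stand-in and not Doris's real `Status` API, and `Code`, `write_rowset`, `publish`, `TaskContext`, `run_task` and `join_all` are made-up names. The thread pool that would actually run the tasks is left out, so `run_task` is simply called by whoever executes the task.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; NOT Doris's actual `Status` class.
enum class Code { kOk, kAlreadyExist, kIoError };

class Status {
public:
    Status() = default;
    Status(Code code, std::string msg) : code_(code), msg_(std::move(msg)) {}
    static Status OK() { return Status(); }
    bool ok() const { return code_ == Code::kOk; }
    Code code() const { return code_; }            // used to classify the error
    const std::string& msg() const { return msg_; }  // used to describe it
private:
    Code code_ = Code::kOk;
    std::string msg_;
};

// Leaf function: puts the involved data into the message at the error site.
Status write_rowset(int rowset_id) {
    if (rowset_id < 0) {
        return Status(Code::kIoError,
                      "failed to write rowset, rowset_id=" + std::to_string(rowset_id));
    }
    if (rowset_id == 0) {
        return Status(Code::kAlreadyExist, "rowset already exists, rowset_id=0");
    }
    return Status::OK();
}

// Intermediate function: branches on the code only, and returns any other
// error unchanged instead of translating it into a new status.
Status publish(int rowset_id) {
    Status s = write_rowset(rowset_id);
    if (s.code() == Code::kAlreadyExist) {
        return Status::OK();  // tolerated here, so it is not an error
    }
    return s;  // OK, or the original error with its message intact
}

// Context of an asynchronous task: the status travels with the task.
struct TaskContext {
    int rowset_id = 0;
    Status status;  // written by the task, read at the join point
};

// Body of the task; an executor (omitted) would call this.
void run_task(TaskContext* ctx) {
    ctx->status = publish(ctx->rowset_id);
}

// Join point: returns the first recorded error, unchanged.
Status join_all(const std::vector<TaskContext>& tasks) {
    for (const auto& task : tasks) {
        if (!task.status.ok()) {
            return task.status;
        }
    }
    return Status::OK();
}
```

With this shape, the message built in `write_rowset` is the same one the join point sees, so a single log line at the top of the call stack is enough to identify the failing rowset.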
   
   ### Solution
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

