platoneko opened a new issue, #11874: URL: https://github.com/apache/doris/issues/11874
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description

The current log output makes it hard to locate problems, and the error message received by FE carries no useful information either.

Much of the code in the `be/src/olap` directory handles an error by first printing a warning log and then constructing an `OLAPInternalError` with a `precise_code`. The caller then translates the returned `OLAPInternalError` into some other `Status`, so the original error is never passed to the top of the call stack. To locate the source of an error, we often have to trace several discontinuous log lines.

So I think a better error handling mode would be:

1. A function whose return value is `Status` should return an error `Status` from its callee directly. The involved data (e.g. `table_id`, `rowset_id`, `txn_id`, `signature`, etc.) together with the error message should be written to the warning log only in these situations:
   1. The function returns a non-`Status` value, e.g.
      ```c++
      void Caller() {
          ...
          Status s = callee();
          if (!s.ok()) {
              // The error cannot be returned, so log it here.
              LOG(WARNING) << "failed to xxx. reason: " << s;
          }
          ...
      }
      ```
   2. A non-OK `Status` should sometimes not be treated as an error, e.g.
      ```c++
      Status Caller() {
          ...
          Status s = callee();
          if (s.is_already_exist()) {
              // Tolerated case: log it, then continue as if it succeeded.
              LOG(WARNING) << "failed to xxx. reason: " << s;
              s = Status::OK();
          }
          ...
      }
      ```
   3. A batch of operations can tolerate some failures, e.g.
      ```c++
      Status Caller() {
          ...
          for (auto& arg : args) {
              Status s = callee(arg);
              if (!s.ok()) {
                  // Log the failing argument and move on to the next one.
                  LOG(WARNING) << "failed to xxx. arg=" << arg << ", reason: " << s;
              }
          }
          ...
      }
      ```
   4. The operation is retried, e.g.
      ```c++
      Status Caller() {
          ...
          Status s;
          while (retry > 0) {
              s = callee();
              --retry;
              if (s.ok()) {
                  break;
              } else {
                  LOG(WARNING) << "failed to xxx. retry=" << retry << ", reason: " << s;
              }
          }
          // All retries failed: return the last error unchanged.
          if (!s.ok()) {
              return s;
          }
          ...
      }
      ```
2. The `precise_code` should serve to let the caller judge which type of error the callee returned, not to describe the error. The reason for the error can be described in a more detailed and flexible `err_msg`.
3. A function that is processed asynchronously should carry a `Status` in its context, so that the error reason can be passed to the join point.

### Solution

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
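
To make the three proposals in the Description concrete, here is a minimal, self-contained sketch. It does not use the real Doris classes: `Status`, `ErrCode`, `load_rowset`, `open_tablet`, `TaskContext`, `run_task`, and `run_batch` are all hypothetical stand-ins invented for illustration, and `std::thread` stands in for whatever asynchronous mechanism the real code uses.

```c++
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; NOT the actual Doris Status.
enum class ErrCode { OK, NOT_FOUND, IO_ERROR };

class Status {
public:
    Status() = default;  // a default-constructed Status means OK
    static Status OK() { return Status(); }
    static Status Error(ErrCode code, std::string msg) {
        return Status(code, std::move(msg));
    }
    bool ok() const { return _code == ErrCode::OK; }
    ErrCode code() const { return _code; }          // used to judge the error type
    const std::string& msg() const { return _msg; } // used to describe the error

private:
    Status(ErrCode code, std::string msg) : _code(code), _msg(std::move(msg)) {}
    ErrCode _code = ErrCode::OK;
    std::string _msg;
};

// Callee: the code classifies the error, the message carries the involved data.
Status load_rowset(int rowset_id) {
    if (rowset_id < 0) {
        return Status::Error(ErrCode::NOT_FOUND,
                             "rowset not found. rowset_id=" + std::to_string(rowset_id));
    }
    return Status::OK();
}

// Proposal 1: a Status-returning function returns the callee's error
// unchanged, without logging it or translating it into another Status.
Status open_tablet(int rowset_id) {
    Status s = load_rowset(rowset_id);
    if (!s.ok()) {
        return s;
    }
    return Status::OK();
}

// Proposal 3: the context of an asynchronous task carries a Status.
struct TaskContext {
    explicit TaskContext(int id) : rowset_id(id) {}
    int rowset_id;
    Status status;
};

void run_task(TaskContext* ctx) {
    ctx->status = open_tablet(ctx->rowset_id);
}

// Runs one task per rowset id on its own thread, joins them all, and
// returns the contexts so the join point can inspect each Status.
std::vector<TaskContext> run_batch(const std::vector<int>& rowset_ids) {
    std::vector<TaskContext> ctxs;
    ctxs.reserve(rowset_ids.size());
    for (int id : rowset_ids) {
        ctxs.emplace_back(id);
    }
    // ctxs is fully built before any pointer into it is handed out.
    std::vector<std::thread> workers;
    for (auto& ctx : ctxs) {
        workers.emplace_back(run_task, &ctx);
    }
    for (auto& w : workers) {
        w.join();
    }
    return ctxs;
}
```

At the join point, a caller of `run_batch` can branch on `status.code()` (proposal 2) and report `status.msg()`, which still holds the message produced where the error originated.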