zclllyybb commented on issue #63921:
URL: https://github.com/apache/doris/issues/63921#issuecomment-4587215224

   Thanks for the detailed report. I checked the current `apache/doris` 
upstream `master` code at `487f783334648c60d982c49bdb315b01c29cdcc7`, and this 
looks like a real ORC reader lifecycle bug rather than ASAN noise.
   
   Confirmed code path:
   
   - `ORCFileInputStream::read()` and `StripeStreamInputStream::read()` throw 
`orc::ParseError("stop")` when `IOContext::should_stop` is set.
   - `_create_file_reader()` already converts that same stop condition to 
`Status::EndOfFile("stop")`.
   - `_init_orc_row_reader()` catches exceptions from 
`_reader->createRowReader(...)`, but for `should_stop && e.what() == "stop"` it 
suppresses the exception and then returns `Status::OK()` without constructing 
`_row_reader` or `_batch`.
   - After a successful `init_reader()`, the read-by-row path assumes 
`_row_reader` exists. `_seek_to_read_one_line()` dereferences it via 
`_row_reader->seekToRow(...)`, so an OK-with-null-reader state can crash 
exactly as reported. The same invalid state would also be unsafe for later 
`nextBatch()` callers.
   
   Judgement:
   
   - I agree with the reported root cause: successful ORC reader initialization 
must not leave `_row_reader == nullptr`.
   - The x86 versus ARM difference should not be treated as a platform-specific 
ORC behavior. Once `_row_reader` is dereferenced while null, the code is 
already in undefined behavior.
   - One nuance from the current code is that `_get_next_block_impl()` has an 
early `should_stop` check. If the stop flag remains set until read time, it can 
return EOF before `_seek_to_read_one_line()`. That does not remove the bug: the 
invalid state is introduced earlier by returning OK from 
`_init_orc_row_reader()` after row-reader creation failed.
   
   Suggested fix:
   
   1. In `_init_orc_row_reader()`, handle `should_stop && "stop"` consistently 
with `_create_file_reader()` and return `Status::EndOfFile("stop")` instead of 
`Status::OK()`.
   2. Add a defensive invariant before returning OK from 
`_init_orc_row_reader()`, for example returning an internal error if 
`_row_reader` or `_batch` is still null. The important contract should be: OK 
means the ORC row reader is usable.
   3. Add or adjust a BE UT that forces the stop path during ORC row-reader 
creation and verifies no SEGV and no OK-with-null-reader state. 
`OrcReadLinesTest.test0` is a useful regression surface, but a narrower 
invariant test around row-reader initialization would make the failure mode 
clearer.
   
   Missing information that would help the PR but does not block the code fix:
   
   - Exact Doris commit SHA used by the failing ASAN run.
   - Full ASAN log and BE unit-test command output.
   - Confirmation of what set `IOContext::should_stop` during row-reader 
creation in the failing run, if known.
   
   Breakwater-GitHub-Analysis-Slot: slot_1cd2a3e1dfdc
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to