zclllyybb commented on issue #63921:
URL: https://github.com/apache/doris/issues/63921#issuecomment-4587215224
Thanks for the detailed report. I checked the current `apache/doris`
upstream `master` code at `487f783334648c60d982c49bdb315b01c29cdcc7`, and this
looks like a real ORC reader lifecycle bug rather than ASAN noise.
Confirmed code path:
- `ORCFileInputStream::read()` and `StripeStreamInputStream::read()` throw
`orc::ParseError("stop")` when `IOContext::should_stop` is set.
- `_create_file_reader()` already converts that same stop condition to
`Status::EndOfFile("stop")`.
- `_init_orc_row_reader()` catches exceptions from
`_reader->createRowReader(...)`, but for `should_stop && e.what() == "stop"` it
suppresses the exception and then returns `Status::OK()` without constructing
`_row_reader` or `_batch`.
- After a successful `init_reader()`, the read-by-row path assumes
`_row_reader` exists. `_seek_to_read_one_line()` dereferences it via
`_row_reader->seekToRow(...)`, so an OK-with-null-reader state can crash
exactly as reported. The same invalid state would also be unsafe for later
`nextBatch()` callers.
Judgement:
- I agree with the reported root cause: successful ORC reader initialization
must not leave `_row_reader == nullptr`.
- The x86 versus ARM difference should not be treated as a platform-specific
ORC behavior. Once `_row_reader` is dereferenced while null, the code is
already in undefined behavior.
- One nuance from the current code is that `_get_next_block_impl()` has an
early `should_stop` check. If the stop flag remains set until read time, it can
return EOF before `_seek_to_read_one_line()`. That does not remove the bug: the
invalid state is introduced earlier by returning OK from
`_init_orc_row_reader()` after row-reader creation failed.
Suggested fix:
1. In `_init_orc_row_reader()`, handle `should_stop && "stop"` consistently
with `_create_file_reader()` and return `Status::EndOfFile("stop")` instead of
`Status::OK()`.
2. Add a defensive invariant before returning OK from
`_init_orc_row_reader()`, for example returning an internal error if
`_row_reader` or `_batch` is still null. The important contract should be: OK
means the ORC row reader is usable.
3. Add or adjust a BE UT that forces the stop path during ORC row-reader
creation and verifies no SEGV and no OK-with-null-reader state.
`OrcReadLinesTest.test0` is a useful regression surface, but a narrower
invariant test around row-reader initialization would make the failure mode
clearer.
Missing information that would help the PR but does not block the code fix:
- Exact Doris commit SHA used by the failing ASAN run.
- Full ASAN log and BE unit-test command output.
- Confirmation of what set `IOContext::should_stop` during row-reader
creation in the failing run, if known.
Breakwater-GitHub-Analysis-Slot: slot_1cd2a3e1dfdc
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]