cambyzju opened a new pull request, #30738: URL: https://github.com/apache/doris/pull/30738
## Proposed changes

While reading ORC data through multi-catalog, if the column type in the metadata is string but the real data type is INT, Timestamp, or something else, the BE may crash:

```
(gdb) bt
#0  0x00007f73c30c2387 in raise () from /lib64/libc.so.6
#1  0x00007f73c30c3a78 in abort () from /lib64/libc.so.6
#2  0x000056552fad78b9 in ?? ()
#3  0x000056552faccecd in google::LogMessage::Fail() ()
#4  0x000056552facf409 in google::LogMessage::SendToLog() ()
#5  0x000056552facca36 in google::LogMessage::Flush() ()
#6  0x000056552facfa79 in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x000056552b76b115 in doris::vectorized::ColumnString::check_chars_length (element_number=<optimized out>, total_length=<optimized out>, this=<optimized out>) at /data/doris-1.x/be/src/vec/columns/column_string.h:66
#8  doris::vectorized::ColumnString::insert_many_strings (this=<optimized out>, strings=<optimized out>, num=4064) at /data/doris-1.x/be/src/vec/columns/column_string.h:270
#9  0x000056552f66548c in doris::vectorized::OrcReader::_decode_string_column (this=this@entry=0x7f726d3b6a00, col_name=..., data_column=..., type_kind=@0x7f67b47bc0e0: orc::TIMESTAMP, cvb=cvb@entry=0x7f6d98aadd60, num_values=4064) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:649
#10 0x000056552f670300 in doris::vectorized::OrcReader::_orc_column_to_doris_column (this=this@entry=0x7f726d3b6a00, col_name=..., doris_column=..., data_type=..., orc_column_type=0x7f6d96907440, cvb=0x7f6d98aadd60, num_values=4064) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:738
#11 0x000056552f672d50 in doris::vectorized::OrcReader::get_next_block (this=0x7f726d3b6a00, block=0x7f6d7fe59b20, read_rows=0x7f67b47bc318, eof=<optimized out>) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:788
#12 0x000056552f633ad8 in doris::vectorized::VFileScanner::_get_block_impl (this=0x7f6f051b3a00, state=<optimized out>, block=0x7f6d7fe59b20, eof=0x7f67b47bc539) at /data/doris-1.x/be/src/vec/exec/scan/vfile_scanner.cpp:155
#13 0x000056552f5ff329 in doris::vectorized::VScanner::get_block (this=this@entry=0x7f6f051b3a00, state=state@entry=0x7f6d70a61900, block=block@entry=0x7f6d7fe59b20, eof=eof@entry=0x7f67b47bc539) at /data/doris-1.x/be/src/vec/exec/scan/vscanner.cpp:54
#14 0x000056552f5fc682 in doris::vectorized::ScannerScheduler::_scanner_scan (this=<optimized out>, scheduler=<optimized out>, ctx=0x7f6f007bdc00, scanner=0x7f6f051b3a00) at /data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:247
#15 0x000056552a903b15 in std::function<void ()>::operator()() const (this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#16 doris::FunctionRunnable::run (this=<optimized out>) at /data/doris-1.x/be/src/util/threadpool.cpp:46
#17 doris::ThreadPool::dispatch_thread (this=0x7f739ad6b180) at /data/doris-1.x/be/src/util/threadpool.cpp:535
#18 0x000056552a8f8f6f in std::function<void ()>::operator()() const (this=0x7f7267d7da38) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#19 doris::Thread::supervise_thread (arg=0x7f7267d7da20) at /data/doris-1.x/be/src/util/thread.cpp:454
#20 0x00007f73c2e77ea5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f73c318a9fd in ioperm () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()
```

Here we expect `orc::StringVectorBatch*`, but the real type is `orc::TimestampVectorBatch*`:

<img width="1167" alt="image" src="https://github.com/apache/doris/assets/10771715/ffbfc127-f1d8-4a42-ada3-18755d8a8aad">

In the code, we already try to check the cast result:

```
auto* data = down_cast<orc::StringVectorBatch*>(cvb);
if (data == nullptr) {
    return Status::InternalError("Wrong data type for colum '{}'", col_name);
}
```

Unfortunately, `down_cast` is equivalent to `static_cast` in a release build, because `assert` is disabled there. As a result, `data` is never `nullptr` for a non-null `cvb`, and the check above never fires.
```
template <typename To, typename From> // use like this: down_cast<T*>(foo);
inline To down_cast(From* f) {        // so we only accept pointers
    // Ensures that To is a sub-type of From *.  This test is here only
    // for compile-time type checking, and has no overhead in an
    // optimized build at run-time, as it will be optimized away
    // completely.

    // TODO(user): This should use COMPILE_ASSERT.
    if (false) {
        ::implicit_cast<From*, To>(NULL);
    }

    // uses RTTI in dbg and fastbuild. asserts are disabled in opt builds.
    assert(f == NULL || dynamic_cast<To>(f) != NULL);
    return static_cast<To>(f);
}
```

When we build the release version, we define `NDEBUG`, which turns off `assert`:

```
# For CMAKE_BUILD_TYPE=Release
#   -O3: Enable all compiler optimizations
#   -DNDEBUG: Turn off dchecks/asserts/debug only code.
set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -DNDEBUG")
```
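For the null check on the cast result to be meaningful in every build type, the cast itself has to perform a run-time type check. A minimal sketch of that idea, using `dynamic_cast` directly, is below. The types `ColumnVectorBatch`, `StringVectorBatch`, `TimestampVectorBatch`, and `DecodeStatus`, and the function `decode_string_column`, are hypothetical stand-ins written for this illustration. They are not the real `orc::` or `doris::` classes, and this is not necessarily the change made in this pull request.

```cpp
#include <string>

// Hypothetical stand-ins for the ORC batch hierarchy. Only the inheritance
// shape matters here; these are not the real orc:: classes.
struct ColumnVectorBatch {
    // A virtual member makes the base polymorphic, which dynamic_cast requires.
    virtual ~ColumnVectorBatch() = default;
};
struct StringVectorBatch : ColumnVectorBatch {};
struct TimestampVectorBatch : ColumnVectorBatch {};

// Hypothetical minimal status type, standing in for doris::Status.
struct DecodeStatus {
    bool ok;
    std::string message;
};

// dynamic_cast checks the dynamic type at run time regardless of NDEBUG,
// so the nullptr branch is reachable in release builds as well.
// A null `cvb` also yields nullptr and is reported as an error.
DecodeStatus decode_string_column(ColumnVectorBatch* cvb, const std::string& col_name) {
    auto* data = dynamic_cast<StringVectorBatch*>(cvb);
    if (data == nullptr) {
        return {false, "Wrong data type for column '" + col_name + "'"};
    }
    // ... decoding of `data` would go here ...
    return {true, ""};
}
```

With this shape, passing a `TimestampVectorBatch` to `decode_string_column` returns an error status instead of reinterpreting the batch as strings. The trade-off is the RTTI lookup that `dynamic_cast` performs on each call, which is the cost `down_cast` was designed to avoid in optimized builds.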