cambyzju opened a new pull request, #30738:
URL: https://github.com/apache/doris/pull/30738

   ## Proposed changes
   
   When reading ORC data through multi-catalog, if the column type in the 
metadata is string but the actual data type is INT, Timestamp, or another 
type, the BE may crash:
   ```
   (gdb) bt
   #0  0x00007f73c30c2387 in raise () from /lib64/libc.so.6
   #1  0x00007f73c30c3a78 in abort () from /lib64/libc.so.6
   #2  0x000056552fad78b9 in ?? ()
   #3  0x000056552faccecd in google::LogMessage::Fail() ()
   #4  0x000056552facf409 in google::LogMessage::SendToLog() ()
   #5  0x000056552facca36 in google::LogMessage::Flush() ()
   #6  0x000056552facfa79 in google::LogMessageFatal::~LogMessageFatal() ()
   #7  0x000056552b76b115 in 
doris::vectorized::ColumnString::check_chars_length (element_number=<optimized 
out>, total_length=<optimized out>, 
       this=<optimized out>) at 
/data/doris-1.x/be/src/vec/columns/column_string.h:66
   #8  doris::vectorized::ColumnString::insert_many_strings (this=<optimized 
out>, strings=<optimized out>, num=4064)
       at /data/doris-1.x/be/src/vec/columns/column_string.h:270
   #9  0x000056552f66548c in 
doris::vectorized::OrcReader::_decode_string_column 
(this=this@entry=0x7f726d3b6a00, col_name=..., data_column=..., 
       type_kind=@0x7f67b47bc0e0: orc::TIMESTAMP, cvb=cvb@entry=0x7f6d98aadd60, 
num_values=4064)
       at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:649
   #10 0x000056552f670300 in 
doris::vectorized::OrcReader::_orc_column_to_doris_column 
(this=this@entry=0x7f726d3b6a00, col_name=..., doris_column=..., 
       data_type=..., orc_column_type=0x7f6d96907440, cvb=0x7f6d98aadd60, 
num_values=4064) at 
/data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:738
   #11 0x000056552f672d50 in doris::vectorized::OrcReader::get_next_block 
(this=0x7f726d3b6a00, block=0x7f6d7fe59b20, read_rows=0x7f67b47bc318, 
       eof=<optimized out>) at 
/data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:788
   #12 0x000056552f633ad8 in doris::vectorized::VFileScanner::_get_block_impl 
(this=0x7f6f051b3a00, state=<optimized out>, block=0x7f6d7fe59b20, 
       eof=0x7f67b47bc539) at 
/data/doris-1.x/be/src/vec/exec/scan/vfile_scanner.cpp:155
   #13 0x000056552f5ff329 in doris::vectorized::VScanner::get_block 
(this=this@entry=0x7f6f051b3a00, state=state@entry=0x7f6d70a61900, 
       block=block@entry=0x7f6d7fe59b20, eof=eof@entry=0x7f67b47bc539) at 
/data/doris-1.x/be/src/vec/exec/scan/vscanner.cpp:54
   #14 0x000056552f5fc682 in doris::vectorized::ScannerScheduler::_scanner_scan 
(this=<optimized out>, scheduler=<optimized out>, ctx=0x7f6f007bdc00, 
       scanner=0x7f6f051b3a00) at 
/data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:247
   #15 0x000056552a903b15 in std::function<void ()>::operator()() const 
(this=<optimized out>) at 
/var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
   #16 doris::FunctionRunnable::run (this=<optimized out>) at 
/data/doris-1.x/be/src/util/threadpool.cpp:46
   #17 doris::ThreadPool::dispatch_thread (this=0x7f739ad6b180) at 
/data/doris-1.x/be/src/util/threadpool.cpp:535
   #18 0x000056552a8f8f6f in std::function<void ()>::operator()() const 
(this=0x7f7267d7da38) at 
/var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
   #19 doris::Thread::supervise_thread (arg=0x7f7267d7da20) at 
/data/doris-1.x/be/src/util/thread.cpp:454
   #20 0x00007f73c2e77ea5 in start_thread () from /lib64/libpthread.so.0
   #21 0x00007f73c318a9fd in ioperm () from /lib64/libc.so.6
   #22 0x0000000000000000 in ?? ()
   ```
   
   Here we expect `orc::StringVectorBatch*`, but the real type is 
`orc::TimestampVectorBatch*`:
   <img width="1167" alt="image" src="https://github.com/apache/doris/assets/10771715/ffbfc127-f1d8-4a42-ada3-18755d8a8aad">
   
   In the code, we already try to check the cast result:
   ```
       auto* data = down_cast<orc::StringVectorBatch*>(cvb);
       if (data == nullptr) {
           return Status::InternalError("Wrong data type for colum '{}'", 
col_name);
       }
   ```
   
   Unfortunately, `down_cast` is equivalent to `static_cast` in release builds, 
because the `assert` inside it is compiled out, so the result is never 
`nullptr` and the check above never fires:
   ```
   template <typename To, typename From> // use like this: down_cast<T*>(foo);
   inline To down_cast(From* f) {        // so we only accept pointers
       // Ensures that To is a sub-type of From *.  This test is here only
       // for compile-time type checking, and has no overhead in an
       // optimized build at run-time, as it will be optimized away
       // completely.
   
       // TODO(user): This should use COMPILE_ASSERT.
       if (false) {
           ::implicit_cast<From*, To>(NULL);
       }
   
       // uses RTTI in dbg and fastbuild. asserts are disabled in opt builds.
       assert(f == NULL || dynamic_cast<To>(f) != NULL);
       return static_cast<To>(f);
   }
   ```
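
As a rough illustration of how the null check could be made effective, here is a minimal sketch that uses `dynamic_cast`, which returns `nullptr` when the runtime type does not match. The batch structs, `checked_down_cast`, and `decode_string_column` below are hypothetical stand-ins, not the real `orc::` classes or Doris code, and the sketch assumes RTTI is enabled. It is not necessarily the change made in this PR.

```cpp
#include <string>

// Hypothetical stand-ins for the ORC batch hierarchy (NOT the real orc:: types).
struct ColumnVectorBatch {
    virtual ~ColumnVectorBatch() = default; // polymorphic base, required by dynamic_cast
};
struct StringVectorBatch : ColumnVectorBatch {};
struct TimestampVectorBatch : ColumnVectorBatch {};

// Hypothetical helper: unlike down_cast, the runtime check survives in release
// builds because it does not depend on assert.
template <typename To, typename From>
To checked_down_cast(From* f) {
    return dynamic_cast<To>(f);
}

// Hypothetical decoder: returns an empty string on success, or an error
// message when the batch is not a StringVectorBatch.
std::string decode_string_column(ColumnVectorBatch* cvb, const std::string& col_name) {
    auto* data = checked_down_cast<StringVectorBatch*>(cvb);
    if (data == nullptr) {
        return "Wrong data type for column '" + col_name + "'";
    }
    return "";
}
```

With this shape, passing a `TimestampVectorBatch` yields an error message instead of reinterpreting the batch as strings. The trade-off is the runtime cost of `dynamic_cast` on each call.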
   
   When we build the release version, we define `NDEBUG`, which turns off `assert`:
   ```
   # For CMAKE_BUILD_TYPE=Release
   #   -O3: Enable all compiler optimizations
   #   -DNDEBUG: Turn off dchecks/asserts/debug only code.
   set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -DNDEBUG")
   ```
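
The effect of `NDEBUG` can be sketched in isolation. The snippet below is only an illustration with hypothetical function names: it defines `NDEBUG` by hand to simulate the release flags, then re-includes `<cassert>` without it to restore the checking behaviour for comparison.

```cpp
// Simulate a release build: NDEBUG must be defined before <cassert> is included.
#define NDEBUG
#include <cassert>

// With NDEBUG defined, assert(expr) expands to a no-op, so expr is never
// evaluated and the function returns false.
inline bool release_assert_evaluates() {
    bool evaluated = false;
    assert((evaluated = true));
    return evaluated;
}

// <cassert> redefines assert on every inclusion according to the current
// state of NDEBUG, so this restores the checking version.
#undef NDEBUG
#include <cassert>

// Without NDEBUG, the expression inside assert is evaluated (and passes),
// so the function returns true.
inline bool debug_assert_evaluates() {
    bool evaluated = false;
    assert((evaluated = true));
    return evaluated;
}
```

This is why a check that lives only inside `assert`, such as the `dynamic_cast` in `down_cast`, offers no protection in a build compiled with `-DNDEBUG`.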
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

