xujunwei916 opened a new issue, #33819:
URL: https://github.com/apache/doris/issues/33819

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   2.1.2
   
   ### What's Wrong?
   
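   When Hive data is queried through an external catalog and inserted into Doris, the BE sometimes crashes. The larger the data volume, the more likely the crash.
   ## Environment
   The Doris cluster runs on three physical machines, each with 64 cores and 512 GB of memory.
   Hive is the CDH 6.3.2 distribution (Hive 2.1.1-cdh6.3.2) with Kerberos authentication enabled.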
   
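   ## Data volume
   One date partition of the Hive table holds about 160 million rows spread over roughly 1,300 sub-partitions, about 25 GB in storage.
   With the Doris table set to 6 buckets, the BE crashed easily. After changing it to 24 buckets it still crashes occasionally, but much less often.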
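The bucket observation is easier to compare in numbers. The Python sketch below only divides the reported partition size by the bucket count; it assumes `HASH(user_id)` spreads rows evenly (real data may be skewed), and the ~25 GB figure is the Hive-side storage size, not the size Doris itself writes.

```python
# Approximate figures reported for one date partition of the Hive table.
TOTAL_ROWS = 160_000_000  # "about 160 million rows"
HIVE_SIZE_GB = 25.0       # "about 25 GB", as stored on the Hive side


def per_bucket(buckets: int) -> tuple[float, float]:
    """Average rows and Hive-side GB that land in each bucket of one partition.

    Assumes HASH(`user_id`) spreads rows evenly; skewed user_id values would
    make some buckets larger than this average.
    """
    return TOTAL_ROWS / buckets, HIVE_SIZE_GB / buckets


if __name__ == "__main__":
    for b in (6, 24):
        rows, size_gb = per_bucket(b)
        print(f"{b:>2} buckets: ~{rows:,.0f} rows, ~{size_gb:.2f} GB per bucket")
```

Under that even-spread assumption, each bucket receives about 26.7 million rows at 6 buckets versus about 6.7 million at 24, a fourfold reduction per bucket.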
   
   ## Steps
   1. Create the catalog
   ```sql
   -- Catalog name `hive`, as referenced by FROM hive.${DB}.`xx_user` in step 3
   CREATE CATALOG hive PROPERTIES (
       'type'='hms',
       'hive.metastore.uris' = 'thrift://**:9083',
       'hive.metastore.sasl.enabled' = 'true',
       'hive.metastore.kerberos.principal' = 'hive/_HOST@**',
       'dfs.nameservices'='nameservice1',
       'dfs.ha.namenodes.nameservice1'='namenode152,namenode154',
       'dfs.namenode.rpc-address.nameservice1.namenode152'='**:8020',
       'dfs.namenode.rpc-address.nameservice1.namenode154'='**:8020',
       'dfs.client.failover.proxy.provider.nameservice1'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
       'hadoop.security.authentication' = 'kerberos',
       'hadoop.kerberos.keytab' = '/opt/apache-doris/keytab/**.keytab',
       'hadoop.kerberos.principal' = '**@**',
       'yarn.resourcemanager.principal' = 'yarn/_HOST@**',
       'file.meta.cache.ttl-second' = '0',
       'hive.version' = '2.1.1'
   );
   ```
   
   2. Create the Doris table. Hive has a matching table with the same columns, only in a different order:
   ```sql
   create table if not exists ${DB}.`xx_user`
   (
       `dt`                                int             NOT NULL
      ,`tenant_code`                       VARCHAR(32)     NOT NULL
      ,`project_id`                        VARCHAR(50)     NOT NULL
      ,`user_id`                           VARCHAR(512)    NOT NULL
      ,`event_scene`                       VARCHAR(10)     NOT NULL
      -- ... about 180 columns here
      ,`xx`                     STRING
   )
   UNIQUE KEY(
        `dt`
       ,`tenant_code`
       ,`project_id`
       ,`user_id`
       ,`event_scene`
    )
   PARTITION BY LIST(`dt`) (
      PARTITION `p19700101` VALUES IN (19700101)
   )
   DISTRIBUTED BY HASH(`user_id`) BUCKETS 24
   PROPERTIES (
      "replication_allocation" = "tag.location.default: 2"
     ,"enable_unique_key_merge_on_write" = "true"
     ,"colocate_with" = "colocate_with_group_24"
   );
   ```
   
   3. Run the SQL to insert into Doris (partition p20230703 was added beforehand):
   ```sql
   INSERT INTO ${DB}.`xx_user` partition (p20230703)
   SELECT
       `dt`,
       `tenant_code`,
       `project_id`,
       `user_id`,
       `event_scene`,
       ...
   FROM hive.${DB}.`xx_user`
   where dt='20230703'
   ```
   ## Error logs
   
   be.out
   ```text
   *** Query id: c3b5c1bdf7e34938-882f8be86b1f367c ***
   *** is nereids: 1 ***
   *** tablet id: 0 ***
   *** Aborted at 1713409234 (unix time) try "date -d @1713409234" if you are using GNU date ***
   *** Current BE git commitID: b130df2488 ***
   *** SIGSEGV unknown detail explain (@0x0) received by PID 60981 (TID 57603 OR 0x7f74a87f6700) from PID 0; stack trace: ***
    0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
    1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-1.8.0-openjdk/jre/lib/amd64/server/libjvm.so
    2# JVM_handle_linux_signal in /usr/lib/jvm/java-1.8.0-openjdk/jre/lib/amd64/server/libjvm.so
    3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-1.8.0-openjdk/jre/lib/amd64/server/libjvm.so
    4# 0x00007F95F7803400 in /lib64/libc.so.6
    5# doris::RuntimeState::is_cancelled() const at /home/zcp/repo_center/doris_release/doris/be/src/runtime/runtime_state.cpp:360
    6# doris::DeltaWriterV2::write(doris::vectorized::Block const*, std::vector<unsigned int, std::allocator<unsigned int> > const&, bool) at /home/zcp/repo_center/doris_release/doris/be/src/olap/delta_writer_v2.cpp:164
    7# doris::vectorized::VTabletWriterV2::_write_memtable(std::shared_ptr<doris::vectorized::Block>, long, doris::vectorized::Rows const&, std::vector<std::shared_ptr<doris::LoadStreamStub>, std::allocator<std::shared_ptr<doris::LoadStreamStub> > > const&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/sink/writer/vtablet_writer_v2.cpp:461
    8# doris::vectorized::VTabletWriterV2::write(doris::vectorized::Block&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/sink/writer/vtablet_writer_v2.cpp:413
    9# _ZN5doris10vectorized15AsyncWriterSinkINS0_15VTabletWriterV2EXadsoKcL_ZNS0_19VOLAP_TABLE_SINK_V2EEEEE4sendEPNS_12RuntimeStateEPNS0_5BlockEb at /home/zcp/repo_center/doris_release/doris/be/src/vec/sink/async_writer_sink.h:89
   10# doris::PlanFragmentExecutor::open_vectorized_internal() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/plan_fragment_executor.cpp:342
   11# doris::PlanFragmentExecutor::open() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/plan_fragment_executor.cpp:274
   12# doris::PlanFragmentExecutor::execute() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/plan_fragment_executor.cpp:405
   13# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::PlanFragmentExecutor>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/fragment_mgr.cpp:459
   14# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
   15# doris::ThreadPool::dispatch_thread() in /opt/apache-doris/be/lib/doris_be
   16# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
   17# start_thread in /lib64/libpthread.so.0
   18# clone in /lib64/libc.so.6
   ```
   be.WARNING
   
   ```text
   W20240418 10:49:42.835407 64553 status.h:380] meet error status: [INTERNAL_ERROR]Couldn't deserialize thrift msg:
   No more data to read.
   
           0#  doris::Status doris::deserialize_thrift_msg<tparquet::PageHeader>(unsigned char const*, unsigned int*, bool, tparquet::PageHeader*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thrift_util.h:151
           1#  doris::vectorized::PageReader::next_page_header() at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
           2#  doris::vectorized::ColumnChunkReader::next_page() at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
           3#  doris::vectorized::ScalarColumnReader::read_column_data(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const>&, doris::vectorized::ColumnSelectVector&, unsigned long, unsigned long*, bool*, bool) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
           4#  doris::vectorized::RowGroupReader::_read_column_data(doris::vectorized::Block*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, unsigned long, unsigned long*, bool*, doris::vectorized::ColumnSelectVector&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:414
           5#  doris::vectorized::RowGroupReader::next_batch(doris::vectorized::Block*, unsigned long, unsigned long*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:314
           6#  doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:444
           7#  doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
           8#  doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/vscanner.cpp:0
           9#  doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
           10# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:345
           11# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_3>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
           12# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
           13# doris::Thread::supervise_thread(void*) at /var/local/ldb_toolchain/bin/../usr/include/pthread.h:562
           14# start_thread
           15# clone
   ```
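To line the crash up with be.WARNING, the epoch in the be.out header (`Aborted at 1713409234`) can be converted the way `date -d @1713409234` would. A minimal Python sketch; the UTC+8 offset is an assumption, since the report does not state the BE hosts' time zone.

```python
from datetime import datetime, timedelta, timezone

# Epoch taken from "Aborted at 1713409234 (unix time)" in be.out.
CRASH_EPOCH = 1713409234


def crash_time(tz_offset_hours: int = 0) -> datetime:
    """Return the crash time as an aware datetime at the given UTC offset."""
    tz = timezone(timedelta(hours=tz_offset_hours))
    return datetime.fromtimestamp(CRASH_EPOCH, tz=tz)


if __name__ == "__main__":
    print(crash_time().isoformat())   # UTC
    print(crash_time(8).isoformat())  # assumed UTC+8 host clock
```

This gives 2024-04-18 03:00:34 UTC. If the hosts do run on UTC+8, that is 11:00:34 local time, roughly 11 minutes after the 10:49:42 Parquet deserialization warning, so these excerpts alone do not show that both entries come from the same query.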
   
   
   ### What You Expected?
   
   If the problem is caused by the data volume, the SQL statement should fail; the Doris BE should not crash.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   