zicat opened a new issue, #51629: URL: https://github.com/apache/doris/issues/51629
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

3.0.5

### What's Wrong?

**Table properties**

A table is created with the following properties:

    PROPERTIES (
        "replication_allocation" = "tag.location.default: 3",
        "min_load_replica_num" = "-1",
        "is_being_synced" = "false",
        "storage_medium" = "hdd",
        "storage_format" = "V2",
        "inverted_index_storage_format" = "V2",
        "light_schema_change" = "true",
        "disable_auto_compaction" = "false",
        "enable_single_replica_compaction" = "true",
        "group_commit_interval_ms" = "3000",
        "group_commit_data_bytes" = "134217728"
    );

**Client load logic**

The client first creates a temporary partition, loads data into it via stream load, and, once the load has finished, uses replace to switch the partition online.

**BE status**

There are 5 BEs in total. One of them makes the client time out whenever it serves as the primary tablet. Its log is as follows:

    I20250611 03:16:02.752750 27040 tablets_channel.cpp:136] open tablets channel (load_id=a34d68fa8faf8fbb-f1967b76bf5a23b1, index_id=1748587725613), tablets num: 1 timeout(s): 259200
    I20250611 03:16:02.753113 27040 tablets_channel.cpp:165] txn 105113269: TabletsChannel of index 1748587725613 init senders 1 with incremental off
    I20250611 03:16:03.517274 26904 tablets_channel.cpp:284] txn 105113269: close tablets channel of index 1748587725613 , sender id: 0, backend 10006, remain senders: 0
    I20250611 03:16:03.817480 32412 segment_creator.cpp:255] tablet_id:1748587730932, flushing rowset_dir: /data/11/doris/data/20/1748587730932/1153520585, rowset_id:02000000000c56b88d46611643fd7ac0a772eea7bd2bd196, data size:4856612, index size:0

If a different BE is selected and the problematic BE only acts as a replica, the client returns normally. The log on the other BE is as follows:

    I20250611 03:32:11.085319 35967 tablets_channel.cpp:136] open tablets channel (load_id=06447c6bc8f0aff5-862487659a6a21b4, index_id=1748587725613), tablets num: 1 timeout(s): 259200
    I20250611 03:32:11.085731 35967 tablets_channel.cpp:165] txn 105121670: TabletsChannel of index 1748587725613 init senders 1 with incremental off
    I20250611 03:32:11.797734 35762 tablets_channel.cpp:284] txn 105121670: close tablets channel of index 1748587725613 , sender id: 0, backend 1747300745514, remain senders: 0
    I20250611 03:32:12.250734 37529 segment_creator.cpp:255] tablet_id:1748587739401, flushing rowset_dir: /data/2/doris/data/64/1748587739401/1153520585, rowset_id:02000000001b8c050e437662a982774728933275c8445080, data size:4853112, index size:0
    I20250611 03:32:12.453536 35762 load_channel.cpp:244] txn 105121670 closed tablets_channel 1748587725613
    I20250611 03:32:12.453783 35762 load_channel.cpp:86] load channel removed load_id=06447c6bc8f0aff5-862487659a6a21b4, is high priority=0, sender_ip=10.11.15.152, index id: 1748587725613, total_received_rows: 61843, num_rows_filtered: 0
    I20250611 03:32:12.460809 35589 engine_publish_version_task.cpp:440] publish version successfully on tablet, table_id=1748587725612, tablet=1748587739401, transaction_id=105121670, version=2, num_rows=61843, res=[OK], cost: 298(us)

**Difference between the two**

    I20250611 03:32:12.453536 35762 load_channel.cpp:244] txn xxxxx closed tablets_channel xxxxxxxxxx

The log line above is never printed on the problematic node, so something presumably fails before the code reaches that point. However, the warning log shows nothing.

**Differences between BEs**

The configuration and the running code are identical on all BEs.

**Other**

The cluster originally ran the official binary package. After we found a bug, we modified the source and compiled our own build, but that build crashed, so we rolled back to the official version. The behavior described above has appeared since then. It also occurs on newly created tables.

### What You Expected?

Help with how to investigate this problem further.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
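One way to make the comparison between the two BE logs in this report repeatable is to reduce each excerpt to its `file:line` log sites and list the sites that the healthy BE printed but the problematic BE did not. The following is a minimal sketch, assuming glog-style lines as shown in the issue; the names `call_sites` and `missing_sites` are hypothetical helpers introduced for illustration and are not part of Doris.

```python
import re

# Assumed glog layout: <level><yyyymmdd> <hh:mm:ss.micros> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<site>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def call_sites(log_text):
    """Return the distinct 'file:line' sites of a log excerpt, in first-seen order."""
    sites = []
    for line in log_text.splitlines():
        match = GLOG_LINE.match(line.strip())
        if match and match.group("site") not in sites:
            sites.append(match.group("site"))
    return sites


def missing_sites(healthy_log, suspect_log):
    """Return sites logged by the healthy BE that never appear in the suspect BE log."""
    seen = set(call_sites(suspect_log))
    return [site for site in call_sites(healthy_log) if site not in seen]


# Log excerpt from the problematic BE, copied from the issue.
SUSPECT = """
I20250611 03:16:02.752750 27040 tablets_channel.cpp:136] open tablets channel (load_id=a34d68fa8faf8fbb-f1967b76bf5a23b1, index_id=1748587725613), tablets num: 1 timeout(s): 259200
I20250611 03:16:02.753113 27040 tablets_channel.cpp:165] txn 105113269: TabletsChannel of index 1748587725613 init senders 1 with incremental off
I20250611 03:16:03.517274 26904 tablets_channel.cpp:284] txn 105113269: close tablets channel of index 1748587725613 , sender id: 0, backend 10006, remain senders: 0
I20250611 03:16:03.817480 32412 segment_creator.cpp:255] tablet_id:1748587730932, flushing rowset_dir: /data/11/doris/data/20/1748587730932/1153520585, rowset_id:02000000000c56b88d46611643fd7ac0a772eea7bd2bd196, data size:4856612, index size:0
"""

# Log excerpt from a healthy BE, copied from the issue.
HEALTHY = """
I20250611 03:32:11.085319 35967 tablets_channel.cpp:136] open tablets channel (load_id=06447c6bc8f0aff5-862487659a6a21b4, index_id=1748587725613), tablets num: 1 timeout(s): 259200
I20250611 03:32:11.085731 35967 tablets_channel.cpp:165] txn 105121670: TabletsChannel of index 1748587725613 init senders 1 with incremental off
I20250611 03:32:11.797734 35762 tablets_channel.cpp:284] txn 105121670: close tablets channel of index 1748587725613 , sender id: 0, backend 1747300745514, remain senders: 0
I20250611 03:32:12.250734 37529 segment_creator.cpp:255] tablet_id:1748587739401, flushing rowset_dir: /data/2/doris/data/64/1748587739401/1153520585, rowset_id:02000000001b8c050e437662a982774728933275c8445080, data size:4853112, index size:0
I20250611 03:32:12.453536 35762 load_channel.cpp:244] txn 105121670 closed tablets_channel 1748587725613
I20250611 03:32:12.453783 35762 load_channel.cpp:86] load channel removed load_id=06447c6bc8f0aff5-862487659a6a21b4, is high priority=0, sender_ip=10.11.15.152, index id: 1748587725613, total_received_rows: 61843, num_rows_filtered: 0
I20250611 03:32:12.460809 35589 engine_publish_version_task.cpp:440] publish version successfully on tablet, table_id=1748587725612, tablet=1748587739401, transaction_id=105121670, version=2, num_rows=61843, res=[OK], cost: 298(us)
"""

if __name__ == "__main__":
    for site in missing_sites(HEALTHY, SUSPECT):
        print(site)
```

For these two excerpts the script prints `load_channel.cpp:244`, `load_channel.cpp:86`, and `engine_publish_version_task.cpp:440`. The first missing site is the one called out in the report. The later two may be absent simply because the problematic BE never got past that point, or because the excerpt was cut short, so they should be read as candidates rather than as confirmed failures.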
