mongo360 opened a new pull request, #23679: URL: https://github.com/apache/doris/pull/23679
Problem: the record returned by the _show stream load_ command is sometimes wrong when a stream load transaction reuses the label of a previously aborted transaction.

Example:

1. Create the test table:

> CREATE TABLE `test_table` (
>   `id` bigint(20) NOT NULL,
>   `create_day` date NOT NULL,
>   `line` bigint(20) SUM NULL DEFAULT "0"
> ) ENGINE = OLAP
> AGGREGATE KEY(`id`, `create_day`)
> COMMENT 'olap'
> PARTITION BY RANGE(`create_day`) (
>   PARTITION pbefroe202308 VALUES [('0000-01-01'), ('2023-08-01')),
>   PARTITION p202308 VALUES [('2023-08-01'), ('2023-09-01')),
>   PARTITION p202309 VALUES [('2023-09-01'), ('2023-10-01')),
>   PARTITION p202310 VALUES [('2023-10-01'), ('2023-11-01'))
> )
> DISTRIBUTED BY HASH(`id`) BUCKETS 16
> PROPERTIES (
>   "replication_allocation" = "tag.location.default: 2",
>   "dynamic_partition.enable" = "true",
>   "dynamic_partition.time_unit" = "month",
>   "dynamic_partition.time_zone" = "Asia/Shanghai",
>   "dynamic_partition.start" = "-2147483648",
>   "dynamic_partition.end" = "5",
>   "dynamic_partition.prefix" = "p",
>   "dynamic_partition.replication_allocation" = "tag.location.default: 2",
>   "dynamic_partition.buckets" = "64",
>   "dynamic_partition.create_history_partition" = "false",
>   "dynamic_partition.history_partition_num" = "-1",
>   "dynamic_partition.hot_partition_num" = "0",
>   "dynamic_partition.reserved_history_periods" = "NULL",
>   "dynamic_partition.storage_policy" = "",
>   "dynamic_partition.storage_medium" = "HDD",
>   "dynamic_partition.start_day_of_month" = "1",
>   "in_memory" = "false",
>   "storage_format" = "V2",
>   "disable_auto_compaction" = "false"
> );

2. Create an empty CSV file for the transaction that will be aborted:

> /tmp/abort.txt

3. Create a CSV file for the transaction that will be committed:

> /tmp/commit.txt
> 3000,2023-08-10,1
> 3001,2023-08-10,2
> 3002,2023-08-10,3

4. Create the first 2PC stream load transaction and abort it:

> curl --location-trusted -u admin:pwd -H "label:test_label_01" -H "column_separator:," -H "format:csv" -H "two_phase_commit:true" -T /tmp/abort.txt http://127.0.0.1:8030/api/db/test_table/_stream_load
> curl -X PUT --location-trusted -u admin:pwd -H "txn_id:1001" -H "txn_operation:abort" http://127.0.0.1:8030/api/db/_stream_load_2pc

5. Create a second 2PC stream load transaction with the same label as the first one and commit it:

> curl --location-trusted -u admin:pwd -H "label:test_label_01" -H "column_separator:," -H "format:csv" -H "two_phase_commit:true" -T /tmp/commit.txt http://127.0.0.1:8030/api/db/test_table/_stream_load
> curl -X PUT --location-trusted -u admin:pwd -H "txn_id:1002" -H "txn_operation:commit" http://127.0.0.1:8030/api/db/_stream_load_2pc

6. Run show stream load:

> +---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
> | Label         | Db | Table      | User  | ClientIp  | Status  | Message | Url | TotalRows | LoadedRows | FilteredRows | UnselectedRows | LoadBytes | StartTime               | FinishTime              |
> +---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
> | test_label_01 | db | test_table | admin | 11.10.0.1 | Success | OK      | N/A | 0         | 0          | 0            | 0              | 0         | 2023-08-29 10:00:00.617 | 2023-08-29 10:00:00.817 |
> +---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+

The row shown is the record of the first, aborted transaction; the expected result is the record of the second, committed transaction.

Reason: DatabaseTransactionMgr::beginTransaction allows the label of an aborted transaction to be reused. When StreamLoadRecordMgr::runAfterCatalogReady fetches stream load records from the BE, it therefore receives two records with the same label, one for each transaction. StreamLoadRecordMgr::addStreamLoadRecord, however, only checks whether the label already exists before adding a record:

> if (!labelToStreamLoadRecord.containsKey(label)) {
>     labelToStreamLoadRecord.put(label, streamLoadRecord);
> }

So if the record of the first, aborted transaction is added to the map first, the record of the second transaction is discarded.

Solution: when the label already exists, compare the finish time of the current record with that of the record already in the map. If the current record finished later, replace the stored record with it.
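
The replace-if-newer rule described above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `RecordSketch`, `RecordStore`, and the fields `finishTimeMs` and `loadedRows` are hypothetical stand-ins for the FE classes, and the finish time is assumed to be available as epoch milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a stream load record.
class RecordSketch {
    final String label;
    final long loadedRows;
    final long finishTimeMs; // assumed: finish time as epoch milliseconds

    RecordSketch(String label, long loadedRows, long finishTimeMs) {
        this.label = label;
        this.loadedRows = loadedRows;
        this.finishTimeMs = finishTimeMs;
    }
}

// Hypothetical store keyed by label, mirroring the label-to-record map.
class RecordStore {
    private final Map<String, RecordSketch> labelToRecord = new HashMap<>();

    // Keep the incoming record if the label is new, or if it finished
    // later than the record currently stored under the same label.
    void add(RecordSketch incoming) {
        RecordSketch existing = labelToRecord.get(incoming.label);
        if (existing == null || incoming.finishTimeMs > existing.finishTimeMs) {
            labelToRecord.put(incoming.label, incoming);
        }
    }

    RecordSketch get(String label) {
        return labelToRecord.get(label);
    }
}
```

With this rule, adding the record of the aborted load (0 rows) and then the record of the committed load (3 rows) under `test_label_01` leaves the committed record in the map, and the outcome is the same if the two records arrive in the opposite order.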