mongo360 opened a new pull request, #23679:
URL: https://github.com/apache/doris/pull/23679

   Problem:
   The record returned by the _show stream load_ command is sometimes wrong 
when a stream load transaction reuses the label of a previously aborted 
transaction.
   
   Example:
   1. create test table
   > CREATE TABLE `test_table` ( `id` bigint(20) NOT NULL, `create_day` date 
NOT NULL, `line` bigint(20) SUM NULL DEFAULT "0" ) ENGINE = OLAP AGGREGATE 
KEY(`id`, `create_day`) COMMENT 'olap' PARTITION BY RANGE(`create_day`) ( 
PARTITION pbefroe202308 VALUES [('0000-01-01'), ('2023-08-01')), PARTITION 
p202308 VALUES [('2023-08-01'), ('2023-09-01')), PARTITION p202309 VALUES 
[('2023-09-01'), ('2023-10-01')), PARTITION p202310 VALUES [('2023-10-01'), 
('2023-11-01'))) DISTRIBUTED BY HASH(`id`) BUCKETS 16 PROPERTIES ( 
"replication_allocation" = "tag.location.default: 2", 
"dynamic_partition.enable" = "true", "dynamic_partition.time_unit" = "month", 
"dynamic_partition.time_zone" = "Asia/Shanghai", "dynamic_partition.start" = 
"-2147483648", "dynamic_partition.end" = "5", "dynamic_partition.prefix" = "p", 
"dynamic_partition.replication_allocation" = "tag.location.default: 2", 
"dynamic_partition.buckets" = "64", 
"dynamic_partition.create_history_partition" = "false", 
"dynamic_partition.history_partition_num" = "-1", "dynamic_partition.hot_partition_num" = "0", 
"dynamic_partition.reserved_history_periods" = "NULL", 
"dynamic_partition.storage_policy" = "", "dynamic_partition.storage_medium" = 
"HDD", "dynamic_partition.start_day_of_month" = "1", "in_memory" = "false", 
"storage_format" = "V2", "disable_auto_compaction" = "false" );
   2. create empty csv file for abort transaction
   > /tmp/abort.txt
   3. create csv file for commit transaction
   > /tmp/commit.txt
   3000,2023-08-10,1
   3001,2023-08-10,2
   3002,2023-08-10,3
   4. create the first 2pc stream load transaction and abort it
   > curl --location-trusted -u admin:pwd -H "label:test_label_01" -H 
"column_separator:," -H "format:csv" -H "two_phase_commit:true" -T 
/tmp/abort.txt http://127.0.0.1:8030/api/db/test_table/_stream_load
   curl -X PUT --location-trusted -u admin:pwd -H "txn_id:1001" -H 
"txn_operation:abort" http://127.0.0.1:8030/api/db/_stream_load_2pc
   5. create a second 2pc stream load transaction with the same label as the 
first one and commit it
   > curl --location-trusted -u admin:pwd -H "label:test_label_01" -H 
"column_separator:," -H "format:csv" -H "two_phase_commit:true" -T 
/tmp/commit.txt http://127.0.0.1:8030/api/db/test_table/_stream_load
   curl -X PUT --location-trusted -u admin:pwd -H "txn_id:1002" -H 
"txn_operation:commit" http://127.0.0.1:8030/api/db/_stream_load_2pc
   6. show stream load 
   > 
   +---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
   | Label         | Db | Table      | User  | ClientIp  | Status  | Message | Url | TotalRows | LoadedRows | FilteredRows | UnselectedRows | LoadBytes | StartTime               | FinishTime              |
   +---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
   | test_label_01 | db | test_table | admin | 11.10.0.1 | Success | OK      | N/A | 0         | 0          | 0            | 0              | 0         | 2023-08-29 10:00:00.617 | 2023-08-29 10:00:00.817 |
   +---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
   
   The record shown belongs to the first, aborted transaction. The expected 
result is the record of the second, valid transaction.
   
   Reason:
   An aborted label can be reused; DatabaseTransactionMgr::beginTransaction 
allows this. However, when StreamLoadRecordMgr::runAfterCatalogReady fetches 
stream load records from the BE, it gets two records, one for each 
transaction that used the label. StreamLoadRecordMgr::addStreamLoadRecord 
then only checks whether the label already exists:
   > if (!labelToStreamLoadRecord.containsKey(label)) {
               labelToStreamLoadRecord.put(label, streamLoadRecord);
           }
   So if the record of the first, aborted transaction is added to the map 
first, the record of the second transaction is discarded.
   
   Solution:
   When the label already exists, check whether the finish time of the 
current transaction is later than that of the transaction already in the map. 
If it is, replace the existing record with the current transaction's record.
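
   The replacement rule above can be sketched roughly as follows. This is a 
minimal, standalone illustration and not the actual patch: the class 
StreamLoadRecordSketch, the LoadRecord type and its fields are hypothetical 
stand-ins, and only the map name labelToStreamLoadRecord is taken from the 
snippet quoted in the Reason section.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "keep the record with the later finish time".
// None of these types are the real Doris classes.
public class StreamLoadRecordSketch {

    // Simplified stand-in for a stream load record (assumed fields).
    static final class LoadRecord {
        final String label;
        final String note;        // free-form marker used only in this sketch
        final long finishTimeMs;  // finish time as epoch milliseconds

        LoadRecord(String label, String note, long finishTimeMs) {
            this.label = label;
            this.note = note;
            this.finishTimeMs = finishTimeMs;
        }
    }

    private final Map<String, LoadRecord> labelToStreamLoadRecord = new HashMap<>();

    // Insert the record if the label is new; otherwise keep whichever record
    // finished later. On equal finish times the existing record is kept.
    void addStreamLoadRecord(LoadRecord incoming) {
        labelToStreamLoadRecord.merge(incoming.label, incoming,
                (existing, candidate) ->
                        candidate.finishTimeMs > existing.finishTimeMs ? candidate : existing);
    }

    LoadRecord get(String label) {
        return labelToStreamLoadRecord.get(label);
    }

    public static void main(String[] args) {
        StreamLoadRecordSketch mgr = new StreamLoadRecordSketch();
        // The record of the aborted transaction arrives first.
        mgr.addStreamLoadRecord(new LoadRecord("test_label_01", "first (aborted)", 1000L));
        // The record of the committed transaction has a later finish time.
        mgr.addStreamLoadRecord(new LoadRecord("test_label_01", "second (committed)", 2000L));
        System.out.println(mgr.get("test_label_01").note);
    }
}
```

   With this rule the result does not depend on the order in which the two 
records are fetched, as long as the second transaction finishes later than 
the first.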


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

