BsoBird opened a new issue, #7431:
URL: https://github.com/apache/iceberg/issues/7431

   ### Query engine
   
   iceberg 1.2.1
   spark 3.3.2
   
   
   
   ### Question
   
   I created two tables with the same amount of data, one MOR table and one COW 
table, both with nearly 400 million data and a data size of 200G.
   I MERGE INTO 100 million data to each table. 
   COW table took 9 minutes to complete the execution. 
   MOR table is currently executing close to 30 minutes has not yet finished.
   
   Why is the MOR table so slow to execute MERGE?
   
   Here is my execution SQL:
   ```
   CREATE TABLE IF NOT EXISTS local.test.b_std_trade_4_iceberg_orc_zstd ( 
`uni_order_id` string, `data_from` bigint, `partner` string, `plat_code` 
string, `order_id` string, `uni_shop_id` string, `uni_id` string, `guide_id` 
string, `shop_id` string, `plat_account` string, `total_fee` double, 
`item_discount_fee` double, `trade_discount_fee` double, `adjust_fee` double, 
`post_fee` double, `discount_rate` double, `payment_no_postfee` double, 
`payment` double, `pay_time` string, `product_num` bigint, `order_status` 
string, `is_refund` string, `refund_fee` double, `insert_time` string, 
`created` string, `endtime` string, `modified` string, `trade_type` string, 
`receiver_name` string, `receiver_country` string, `receiver_state` string, 
`receiver_city` string, `receiver_district` string, `receiver_town` string, 
`receiver_address` string, `receiver_mobile` string, `trade_source` string, 
`delivery_type` string, `consign_time` string, `orders_num` bigint, 
`is_presale` bigint, `presale_status` 
 string, `first_fee_paytime` string, `last_fee_paytime` string, 
`first_paid_fee` double, `tenant` string, `tidb_modified` string, 
`step_paid_fee` double, `seller_flag` string, `is_used_store_card` BIGINT, 
`store_card_used` DOUBLE, `store_card_basic_used` DOUBLE, 
`store_card_expand_used` DOUBLE, `order_promotion_num` BIGINT, 
`item_promotion_num` BIGINT, `buyer_remark` string, `seller_remark` string, 
trade_business_type string )TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','kyuubi.zorder.enabled'='true','kyuubi.zorder.cols'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','write.orc.compression-codec'='zstd')
 STORED AS iceberg;
   
   merge into local.test.b_std_trade_4_iceberg_orc_zstd t using( select 
`uni_order_id`, `data_from`, `partner`, `plat_code`, `order_id`, `uni_shop_id`, 
`uni_id`, `guide_id`, `shop_id`, `plat_account`, `total_fee`, 
`item_discount_fee`, `trade_discount_fee`, `adjust_fee`, `post_fee`, 
`discount_rate`, `payment_no_postfee`, `payment`, `pay_time`, `product_num`, 
`order_status`, `is_refund`, `refund_fee`, `insert_time`, `created`, `endtime`, 
`modified`, `trade_type`, `receiver_name`, `receiver_country`, 
`receiver_state`, `receiver_city`, `receiver_district`, `receiver_town`, 
`receiver_address`, `receiver_mobile`, `trade_source`, `delivery_type`, 
`consign_time`, `orders_num`, `is_presale`, `presale_status`, 
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`, 
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`, 
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`, 
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remark`, 
 trade_business_type from ( select `uni_order_id`, `data_from`, `partner`, 
`plat_code`, `order_id`, `uni_shop_id`, `uni_id`, `guide_id`, `shop_id`, 
`plat_account`, `total_fee`, `item_discount_fee`, `trade_discount_fee`, 
`adjust_fee`, `post_fee`, `discount_rate`, `payment_no_postfee`, `payment`, 
`pay_time`, `product_num`, `order_status`, `is_refund`, `refund_fee`, 
`insert_time`, `created`, `endtime`, `modified`, `trade_type`, `receiver_name`, 
`receiver_country`, `receiver_state`, `receiver_city`, `receiver_district`, 
`receiver_town`, `receiver_address`, `receiver_mobile`, `trade_source`, 
`delivery_type`, `consign_time`, `orders_num`, `is_presale`, `presale_status`, 
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`, 
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`, 
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`, 
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remark`, 
trade_business_type,row_number() over
 (partition by uni_order_id order by modified desc,dt desc) as rank from 
spark_catalog.dw_source.s_std_trade_tidb where dt>='2023-04-20' ) small where 
rank=1) s ON t.uni_order_id = s.uni_order_id and t.tenant = s.tenant and 
t.partner = s.partner  WHEN MATCHED AND s.modified>=t.modified then UPDATE SET 
t.uni_order_id = s.uni_order_id, t.data_from = s.data_from, t.partner = 
s.partner, t.plat_code = s.plat_code, t.order_id = s.order_id, t.uni_shop_id = 
s.uni_shop_id, t.uni_id = s.uni_id, t.guide_id = s.guide_id, t.shop_id = 
s.shop_id, t.plat_account = s.plat_account, t.total_fee = s.total_fee, 
t.item_discount_fee = s.item_discount_fee, t.trade_discount_fee = 
s.trade_discount_fee, t.adjust_fee = s.adjust_fee, t.post_fee = s.post_fee, 
t.discount_rate = s.discount_rate, t.payment_no_postfee = s.payment_no_postfee, 
t.payment = s.payment, t.pay_time = s.pay_time, t.product_num = s.product_num, 
t.order_status = s.order_status, t.is_refund = s.is_refund, t.refund_fee = 
s.refund_fee, t.insert_t
 ime = s.insert_time, t.created = s.created, t.endtime = s.endtime, t.modified 
= s.modified, t.trade_type = s.trade_type, t.receiver_name = s.receiver_name, 
t.receiver_country = s.receiver_country, t.receiver_state = s.receiver_state, 
t.receiver_city = s.receiver_city, t.receiver_district = s.receiver_district, 
t.receiver_town = s.receiver_town, t.receiver_address = s.receiver_address, 
t.receiver_mobile = s.receiver_mobile, t.trade_source = s.trade_source, 
t.delivery_type = s.delivery_type, t.consign_time = s.consign_time, 
t.orders_num = s.orders_num, t.is_presale = s.is_presale, t.presale_status = 
s.presale_status, t.first_fee_paytime = s.first_fee_paytime, t.last_fee_paytime 
= s.last_fee_paytime, t.first_paid_fee = s.first_paid_fee, t.tenant = s.tenant, 
t.tidb_modified = s.tidb_modified, t.step_paid_fee = s.step_paid_fee, 
t.seller_flag = s.seller_flag, t.is_used_store_card = s.is_used_store_card, 
t.store_card_used = s.store_card_used, t.store_card_basic_used = 
s.store_card_basic_us
 ed, t.store_card_expand_used = s.store_card_expand_used, t.order_promotion_num 
= s.order_promotion_num, t.item_promotion_num = s.item_promotion_num, 
t.buyer_remark = s.buyer_remark, t.seller_remark = s.seller_remark, 
t.trade_business_type = s.trade_business_type  WHEN NOT MATCHED THEN INSERT * ;
   --9 minutes to execute
   
   
   
   CREATE TABLE IF NOT EXISTS local.test.b_std_trade_4_iceberg_orc_zstd_mor ( 
`uni_order_id` string, `data_from` bigint, `partner` string, `plat_code` 
string, `order_id` string, `uni_shop_id` string, `uni_id` string, `guide_id` 
string, `shop_id` string, `plat_account` string, `total_fee` double, 
`item_discount_fee` double, `trade_discount_fee` double, `adjust_fee` double, 
`post_fee` double, `discount_rate` double, `payment_no_postfee` double, 
`payment` double, `pay_time` string, `product_num` bigint, `order_status` 
string, `is_refund` string, `refund_fee` double, `insert_time` string, 
`created` string, `endtime` string, `modified` string, `trade_type` string, 
`receiver_name` string, `receiver_country` string, `receiver_state` string, 
`receiver_city` string, `receiver_district` string, `receiver_town` string, 
`receiver_address` string, `receiver_mobile` string, `trade_source` string, 
`delivery_type` string, `consign_time` string, `orders_num` bigint, 
`is_presale` bigint, `presale_stat
 us` string, `first_fee_paytime` string, `last_fee_paytime` string, 
`first_paid_fee` double, `tenant` string, `tidb_modified` string, 
`step_paid_fee` double, `seller_flag` string, `is_used_store_card` BIGINT, 
`store_card_used` DOUBLE, `store_card_basic_used` DOUBLE, 
`store_card_expand_used` DOUBLE, `order_promotion_num` BIGINT, 
`item_promotion_num` BIGINT, `buyer_remark` string, `seller_remark` string, 
trade_business_type string )TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','kyuubi.zorder.enabled'='true','kyuubi.zorder.cols'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','write.orc.compression-codec'='zstd','write.delete.mode'='merge-on-read','write.update.mode'='merge-on-read','write.merge.mode'='merge-on-read')
 STORED AS iceberg;
   
   merge into local.test.b_std_trade_4_iceberg_orc_zstd_mor t using( select 
`uni_order_id`, `data_from`, `partner`, `plat_code`, `order_id`, `uni_shop_id`, 
`uni_id`, `guide_id`, `shop_id`, `plat_account`, `total_fee`, 
`item_discount_fee`, `trade_discount_fee`, `adjust_fee`, `post_fee`, 
`discount_rate`, `payment_no_postfee`, `payment`, `pay_time`, `product_num`, 
`order_status`, `is_refund`, `refund_fee`, `insert_time`, `created`, `endtime`, 
`modified`, `trade_type`, `receiver_name`, `receiver_country`, 
`receiver_state`, `receiver_city`, `receiver_district`, `receiver_town`, 
`receiver_address`, `receiver_mobile`, `trade_source`, `delivery_type`, 
`consign_time`, `orders_num`, `is_presale`, `presale_status`, 
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`, 
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`, 
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`, 
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remar
 k`, trade_business_type from ( select `uni_order_id`, `data_from`, `partner`, 
`plat_code`, `order_id`, `uni_shop_id`, `uni_id`, `guide_id`, `shop_id`, 
`plat_account`, `total_fee`, `item_discount_fee`, `trade_discount_fee`, 
`adjust_fee`, `post_fee`, `discount_rate`, `payment_no_postfee`, `payment`, 
`pay_time`, `product_num`, `order_status`, `is_refund`, `refund_fee`, 
`insert_time`, `created`, `endtime`, `modified`, `trade_type`, `receiver_name`, 
`receiver_country`, `receiver_state`, `receiver_city`, `receiver_district`, 
`receiver_town`, `receiver_address`, `receiver_mobile`, `trade_source`, 
`delivery_type`, `consign_time`, `orders_num`, `is_presale`, `presale_status`, 
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`, 
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`, 
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`, 
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remark`, 
trade_business_type,row_number() 
 over(partition by uni_order_id order by modified desc,dt desc) as rank from 
spark_catalog.dw_source.s_std_trade_tidb where dt>='2023-04-20' ) small where 
rank=1) s ON t.uni_order_id = s.uni_order_id and t.tenant = s.tenant and 
t.partner = s.partner  WHEN MATCHED AND s.modified>=t.modified then UPDATE SET 
t.uni_order_id = s.uni_order_id, t.data_from = s.data_from, t.partner = 
s.partner, t.plat_code = s.plat_code, t.order_id = s.order_id, t.uni_shop_id = 
s.uni_shop_id, t.uni_id = s.uni_id, t.guide_id = s.guide_id, t.shop_id = 
s.shop_id, t.plat_account = s.plat_account, t.total_fee = s.total_fee, 
t.item_discount_fee = s.item_discount_fee, t.trade_discount_fee = 
s.trade_discount_fee, t.adjust_fee = s.adjust_fee, t.post_fee = s.post_fee, 
t.discount_rate = s.discount_rate, t.payment_no_postfee = s.payment_no_postfee, 
t.payment = s.payment, t.pay_time = s.pay_time, t.product_num = s.product_num, 
t.order_status = s.order_status, t.is_refund = s.is_refund, t.refund_fee = 
s.refund_fee, t.inse
 rt_time = s.insert_time, t.created = s.created, t.endtime = s.endtime, 
t.modified = s.modified, t.trade_type = s.trade_type, t.receiver_name = 
s.receiver_name, t.receiver_country = s.receiver_country, t.receiver_state = 
s.receiver_state, t.receiver_city = s.receiver_city, t.receiver_district = 
s.receiver_district, t.receiver_town = s.receiver_town, t.receiver_address = 
s.receiver_address, t.receiver_mobile = s.receiver_mobile, t.trade_source = 
s.trade_source, t.delivery_type = s.delivery_type, t.consign_time = 
s.consign_time, t.orders_num = s.orders_num, t.is_presale = s.is_presale, 
t.presale_status = s.presale_status, t.first_fee_paytime = s.first_fee_paytime, 
t.last_fee_paytime = s.last_fee_paytime, t.first_paid_fee = s.first_paid_fee, 
t.tenant = s.tenant, t.tidb_modified = s.tidb_modified, t.step_paid_fee = 
s.step_paid_fee, t.seller_flag = s.seller_flag, t.is_used_store_card = 
s.is_used_store_card, t.store_card_used = s.store_card_used, 
t.store_card_basic_used = s.store_card_basi
 c_used, t.store_card_expand_used = s.store_card_expand_used, 
t.order_promotion_num = s.order_promotion_num, t.item_promotion_num = 
s.item_promotion_num, t.buyer_remark = s.buyer_remark, t.seller_remark = 
s.seller_remark, t.trade_business_type = s.trade_business_type  WHEN NOT 
MATCHED THEN INSERT * ;
   --30 minutes and still not executed
   ```
   
   This is a screenshot of the current execution status of the MOR table:
   
![image](https://user-images.githubusercontent.com/30249475/234352878-13ed5257-5146-42de-880e-1cc5f6a5b819.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to