BsoBird opened a new issue, #7431:
URL: https://github.com/apache/iceberg/issues/7431
### Query engine
iceberg 1.2.1
spark 3.3.2
### Question
I created two tables with the same amount of data, one MOR table and one COW
table, both with nearly 400 million data and a data size of 200G.
I MERGE INTO 100 million data to each table.
COW table took 9 minutes to complete the execution.
MOR table is currently executing close to 30 minutes has not yet finished.
Why is the MOR table so slow to execute MERGE?
Here is my execution SQL:
```
CREATE TABLE IF NOT EXISTS local.test.b_std_trade_4_iceberg_orc_zstd (
`uni_order_id` string, `data_from` bigint, `partner` string, `plat_code`
string, `order_id` string, `uni_shop_id` string, `uni_id` string, `guide_id`
string, `shop_id` string, `plat_account` string, `total_fee` double,
`item_discount_fee` double, `trade_discount_fee` double, `adjust_fee` double,
`post_fee` double, `discount_rate` double, `payment_no_postfee` double,
`payment` double, `pay_time` string, `product_num` bigint, `order_status`
string, `is_refund` string, `refund_fee` double, `insert_time` string,
`created` string, `endtime` string, `modified` string, `trade_type` string,
`receiver_name` string, `receiver_country` string, `receiver_state` string,
`receiver_city` string, `receiver_district` string, `receiver_town` string,
`receiver_address` string, `receiver_mobile` string, `trade_source` string,
`delivery_type` string, `consign_time` string, `orders_num` bigint,
`is_presale` bigint, `presale_status`
string, `first_fee_paytime` string, `last_fee_paytime` string,
`first_paid_fee` double, `tenant` string, `tidb_modified` string,
`step_paid_fee` double, `seller_flag` string, `is_used_store_card` BIGINT,
`store_card_used` DOUBLE, `store_card_basic_used` DOUBLE,
`store_card_expand_used` DOUBLE, `order_promotion_num` BIGINT,
`item_promotion_num` BIGINT, `buyer_remark` string, `seller_remark` string,
trade_business_type string )TBLPROPERTIES
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','kyuubi.zorder.enabled'='true','kyuubi.zorder.cols'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','write.orc.compression-codec'='zstd')
STORED AS iceberg;
merge into local.test.b_std_trade_4_iceberg_orc_zstd t using( select
`uni_order_id`, `data_from`, `partner`, `plat_code`, `order_id`, `uni_shop_id`,
`uni_id`, `guide_id`, `shop_id`, `plat_account`, `total_fee`,
`item_discount_fee`, `trade_discount_fee`, `adjust_fee`, `post_fee`,
`discount_rate`, `payment_no_postfee`, `payment`, `pay_time`, `product_num`,
`order_status`, `is_refund`, `refund_fee`, `insert_time`, `created`, `endtime`,
`modified`, `trade_type`, `receiver_name`, `receiver_country`,
`receiver_state`, `receiver_city`, `receiver_district`, `receiver_town`,
`receiver_address`, `receiver_mobile`, `trade_source`, `delivery_type`,
`consign_time`, `orders_num`, `is_presale`, `presale_status`,
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`,
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`,
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`,
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remark`,
trade_business_type from ( select `uni_order_id`, `data_from`, `partner`,
`plat_code`, `order_id`, `uni_shop_id`, `uni_id`, `guide_id`, `shop_id`,
`plat_account`, `total_fee`, `item_discount_fee`, `trade_discount_fee`,
`adjust_fee`, `post_fee`, `discount_rate`, `payment_no_postfee`, `payment`,
`pay_time`, `product_num`, `order_status`, `is_refund`, `refund_fee`,
`insert_time`, `created`, `endtime`, `modified`, `trade_type`, `receiver_name`,
`receiver_country`, `receiver_state`, `receiver_city`, `receiver_district`,
`receiver_town`, `receiver_address`, `receiver_mobile`, `trade_source`,
`delivery_type`, `consign_time`, `orders_num`, `is_presale`, `presale_status`,
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`,
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`,
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`,
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remark`,
trade_business_type,row_number() over
(partition by uni_order_id order by modified desc,dt desc) as rank from
spark_catalog.dw_source.s_std_trade_tidb where dt>='2023-04-20' ) small where
rank=1) s ON t.uni_order_id = s.uni_order_id and t.tenant = s.tenant and
t.partner = s.partner WHEN MATCHED AND s.modified>=t.modified then UPDATE SET
t.uni_order_id = s.uni_order_id, t.data_from = s.data_from, t.partner =
s.partner, t.plat_code = s.plat_code, t.order_id = s.order_id, t.uni_shop_id =
s.uni_shop_id, t.uni_id = s.uni_id, t.guide_id = s.guide_id, t.shop_id =
s.shop_id, t.plat_account = s.plat_account, t.total_fee = s.total_fee,
t.item_discount_fee = s.item_discount_fee, t.trade_discount_fee =
s.trade_discount_fee, t.adjust_fee = s.adjust_fee, t.post_fee = s.post_fee,
t.discount_rate = s.discount_rate, t.payment_no_postfee = s.payment_no_postfee,
t.payment = s.payment, t.pay_time = s.pay_time, t.product_num = s.product_num,
t.order_status = s.order_status, t.is_refund = s.is_refund, t.refund_fee =
s.refund_fee, t.insert_t
ime = s.insert_time, t.created = s.created, t.endtime = s.endtime, t.modified
= s.modified, t.trade_type = s.trade_type, t.receiver_name = s.receiver_name,
t.receiver_country = s.receiver_country, t.receiver_state = s.receiver_state,
t.receiver_city = s.receiver_city, t.receiver_district = s.receiver_district,
t.receiver_town = s.receiver_town, t.receiver_address = s.receiver_address,
t.receiver_mobile = s.receiver_mobile, t.trade_source = s.trade_source,
t.delivery_type = s.delivery_type, t.consign_time = s.consign_time,
t.orders_num = s.orders_num, t.is_presale = s.is_presale, t.presale_status =
s.presale_status, t.first_fee_paytime = s.first_fee_paytime, t.last_fee_paytime
= s.last_fee_paytime, t.first_paid_fee = s.first_paid_fee, t.tenant = s.tenant,
t.tidb_modified = s.tidb_modified, t.step_paid_fee = s.step_paid_fee,
t.seller_flag = s.seller_flag, t.is_used_store_card = s.is_used_store_card,
t.store_card_used = s.store_card_used, t.store_card_basic_used =
s.store_card_basic_us
ed, t.store_card_expand_used = s.store_card_expand_used, t.order_promotion_num
= s.order_promotion_num, t.item_promotion_num = s.item_promotion_num,
t.buyer_remark = s.buyer_remark, t.seller_remark = s.seller_remark,
t.trade_business_type = s.trade_business_type WHEN NOT MATCHED THEN INSERT * ;
--9 minutes to execute
CREATE TABLE IF NOT EXISTS local.test.b_std_trade_4_iceberg_orc_zstd_mor (
`uni_order_id` string, `data_from` bigint, `partner` string, `plat_code`
string, `order_id` string, `uni_shop_id` string, `uni_id` string, `guide_id`
string, `shop_id` string, `plat_account` string, `total_fee` double,
`item_discount_fee` double, `trade_discount_fee` double, `adjust_fee` double,
`post_fee` double, `discount_rate` double, `payment_no_postfee` double,
`payment` double, `pay_time` string, `product_num` bigint, `order_status`
string, `is_refund` string, `refund_fee` double, `insert_time` string,
`created` string, `endtime` string, `modified` string, `trade_type` string,
`receiver_name` string, `receiver_country` string, `receiver_state` string,
`receiver_city` string, `receiver_district` string, `receiver_town` string,
`receiver_address` string, `receiver_mobile` string, `trade_source` string,
`delivery_type` string, `consign_time` string, `orders_num` bigint,
`is_presale` bigint, `presale_stat
us` string, `first_fee_paytime` string, `last_fee_paytime` string,
`first_paid_fee` double, `tenant` string, `tidb_modified` string,
`step_paid_fee` double, `seller_flag` string, `is_used_store_card` BIGINT,
`store_card_used` DOUBLE, `store_card_basic_used` DOUBLE,
`store_card_expand_used` DOUBLE, `order_promotion_num` BIGINT,
`item_promotion_num` BIGINT, `buyer_remark` string, `seller_remark` string,
trade_business_type string )TBLPROPERTIES
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','kyuubi.zorder.enabled'='true','kyuubi.zorder.cols'='tenant,shop_id,plat_code,order_status,payment,pay_time,created','write.orc.compression-codec'='zstd','write.delete.mode'='merge-on-read','write.update.mode'='merge-on-read','write.merge.mode'='merge-on-read')
STORED AS iceberg;
merge into local.test.b_std_trade_4_iceberg_orc_zstd_mor t using( select
`uni_order_id`, `data_from`, `partner`, `plat_code`, `order_id`, `uni_shop_id`,
`uni_id`, `guide_id`, `shop_id`, `plat_account`, `total_fee`,
`item_discount_fee`, `trade_discount_fee`, `adjust_fee`, `post_fee`,
`discount_rate`, `payment_no_postfee`, `payment`, `pay_time`, `product_num`,
`order_status`, `is_refund`, `refund_fee`, `insert_time`, `created`, `endtime`,
`modified`, `trade_type`, `receiver_name`, `receiver_country`,
`receiver_state`, `receiver_city`, `receiver_district`, `receiver_town`,
`receiver_address`, `receiver_mobile`, `trade_source`, `delivery_type`,
`consign_time`, `orders_num`, `is_presale`, `presale_status`,
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`,
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`,
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`,
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remar
k`, trade_business_type from ( select `uni_order_id`, `data_from`, `partner`,
`plat_code`, `order_id`, `uni_shop_id`, `uni_id`, `guide_id`, `shop_id`,
`plat_account`, `total_fee`, `item_discount_fee`, `trade_discount_fee`,
`adjust_fee`, `post_fee`, `discount_rate`, `payment_no_postfee`, `payment`,
`pay_time`, `product_num`, `order_status`, `is_refund`, `refund_fee`,
`insert_time`, `created`, `endtime`, `modified`, `trade_type`, `receiver_name`,
`receiver_country`, `receiver_state`, `receiver_city`, `receiver_district`,
`receiver_town`, `receiver_address`, `receiver_mobile`, `trade_source`,
`delivery_type`, `consign_time`, `orders_num`, `is_presale`, `presale_status`,
`first_fee_paytime`, `last_fee_paytime`, `first_paid_fee`, `tenant`,
`tidb_modified`, `step_paid_fee`, `seller_flag`, `is_used_store_card`,
`store_card_used`, `store_card_basic_used`, `store_card_expand_used`,
`order_promotion_num`, `item_promotion_num`, `buyer_remark`, `seller_remark`,
trade_business_type,row_number()
over(partition by uni_order_id order by modified desc,dt desc) as rank from
spark_catalog.dw_source.s_std_trade_tidb where dt>='2023-04-20' ) small where
rank=1) s ON t.uni_order_id = s.uni_order_id and t.tenant = s.tenant and
t.partner = s.partner WHEN MATCHED AND s.modified>=t.modified then UPDATE SET
t.uni_order_id = s.uni_order_id, t.data_from = s.data_from, t.partner =
s.partner, t.plat_code = s.plat_code, t.order_id = s.order_id, t.uni_shop_id =
s.uni_shop_id, t.uni_id = s.uni_id, t.guide_id = s.guide_id, t.shop_id =
s.shop_id, t.plat_account = s.plat_account, t.total_fee = s.total_fee,
t.item_discount_fee = s.item_discount_fee, t.trade_discount_fee =
s.trade_discount_fee, t.adjust_fee = s.adjust_fee, t.post_fee = s.post_fee,
t.discount_rate = s.discount_rate, t.payment_no_postfee = s.payment_no_postfee,
t.payment = s.payment, t.pay_time = s.pay_time, t.product_num = s.product_num,
t.order_status = s.order_status, t.is_refund = s.is_refund, t.refund_fee =
s.refund_fee, t.inse
rt_time = s.insert_time, t.created = s.created, t.endtime = s.endtime,
t.modified = s.modified, t.trade_type = s.trade_type, t.receiver_name =
s.receiver_name, t.receiver_country = s.receiver_country, t.receiver_state =
s.receiver_state, t.receiver_city = s.receiver_city, t.receiver_district =
s.receiver_district, t.receiver_town = s.receiver_town, t.receiver_address =
s.receiver_address, t.receiver_mobile = s.receiver_mobile, t.trade_source =
s.trade_source, t.delivery_type = s.delivery_type, t.consign_time =
s.consign_time, t.orders_num = s.orders_num, t.is_presale = s.is_presale,
t.presale_status = s.presale_status, t.first_fee_paytime = s.first_fee_paytime,
t.last_fee_paytime = s.last_fee_paytime, t.first_paid_fee = s.first_paid_fee,
t.tenant = s.tenant, t.tidb_modified = s.tidb_modified, t.step_paid_fee =
s.step_paid_fee, t.seller_flag = s.seller_flag, t.is_used_store_card =
s.is_used_store_card, t.store_card_used = s.store_card_used,
t.store_card_basic_used = s.store_card_basi
c_used, t.store_card_expand_used = s.store_card_expand_used,
t.order_promotion_num = s.order_promotion_num, t.item_promotion_num =
s.item_promotion_num, t.buyer_remark = s.buyer_remark, t.seller_remark =
s.seller_remark, t.trade_business_type = s.trade_business_type WHEN NOT
MATCHED THEN INSERT * ;
--30 minutes and still not executed
```
This is a screenshot of the current execution status of the MOR table:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]