bk-mz commented on issue #9833:
URL: https://github.com/apache/iceberg/issues/9833#issuecomment-1989424746

   >do the files have incorrect partition tuples (the values actually used for 
filtering)? 
   how do I check this?
   
   Something is odd, because I see bloated small-file partitions, i.e. each 
batch I run 3 consecutive compactions:
   
   ```
   CALL glue.system.rewrite_position_delete_files(table => 'table', where => 
"data_load_ts between TIMESTAMP '2024-02-10 20:39:47.669' and TIMESTAMP 
'2024-03-11 19:39:47.669'", options => map('partial-progress.enabled', 'true', 
'min-file-size-bytes', '26843545', 'max-file-size-bytes', '134217728', 
'min-input-files', '10', 'max-concurrent-file-group-rewrites', '500'))
   
+----------------------------+------------------------+---------------------+-----------------+
   
|rewritten_delete_files_count|added_delete_files_count|rewritten_bytes_count|added_bytes_count|
   
+----------------------------+------------------------+---------------------+-----------------+
   |693                         |689                     |10353631             
|10322951         |
   
+----------------------------+------------------------+---------------------+-----------------+
   
   CALL glue.system.rewrite_data_files(table => 'table', where => "data_load_ts 
between TIMESTAMP '2024-03-04 20:41:16.346' and TIMESTAMP '2024-03-11 
19:41:16.346'", options => map('partial-progress.enabled', 'true', 
'min-file-size-bytes', '53687091', 'max-file-size-bytes', '268435456', 
'min-input-files', '20', 'max-concurrent-file-group-rewrites', '500'))
   
+--------------------------+----------------------+---------------------+-----------------------+
   
|rewritten_data_files_count|added_data_files_count|rewritten_bytes_count|failed_data_files_count|
   
+--------------------------+----------------------+---------------------+-----------------------+
   |60                        |6                     |131249114            |0   
                   |
   
+--------------------------+----------------------+---------------------+-----------------------+
   
   CALL glue.system.rewrite_data_files(table => 'table', where => "data_load_ts 
<= TIMESTAMP '2024-03-04 20:41:42.204'", options => 
map('partial-progress.enabled', 'true', 'min-file-size-bytes', '53687091', 
'max-file-size-bytes', '268435456', 'min-input-files', '20', 
'max-concurrent-file-group-rewrites', '1000'))
   
+--------------------------+----------------------+---------------------+-----------------------+
   
|rewritten_data_files_count|added_data_files_count|rewritten_bytes_count|failed_data_files_count|
   
+--------------------------+----------------------+---------------------+-----------------------+
   |20                        |2                     |655906               |0   
                   |
   
+--------------------------+----------------------+---------------------+-----------------------+
   ```
   
   (where clause changes per each batch). 
   
   Then, this is what I see in the compactions logs:
   
   ```sql
   with data as (
   select
       committed_at,
       snapshot_id,
       summary.`changed-partition-count` as changed_partition_count,
       if(summary.`added-data-files` is null, "compact_delete_files", 
"compact_base_files") as op,
       if(summary.`added-data-files` is null, 
summary.`added-position-delete-files`, summary.`added-data-files`) as 
added_files,
       if(summary.`added-data-files` is null, 
summary.`removed-position-delete-files`, summary.`deleted-data-files`) as 
removed_files
   from 
messaging_data_platform_pii_sandbox.mdp_temp_inflight_pffm_iceberg2.snapshots
   where operation = "replace")
   select
       committed_at,
       snapshot_id,
       changed_partition_count,
       op,
       concat(removed_files, "->", added_files) as change,
       removed_files / added_files as compact_ratio
   from data
   limit 100;
   ```
   
   Results:
   
   ```
   2024-03-10 20:36:07.15       8797169961871736418     23      
compact_delete_files    165->157        1.0509554140127388
   2024-03-10 20:36:08.41       7845087824966754379     23      
compact_delete_files    263->249        1.0562248995983936
   2024-03-10 20:36:09.532      7903647840988554268     23      
compact_delete_files    634->616        1.0292207792207793
   2024-03-10 20:36:10.764      483626480370270807      23      
compact_delete_files    835->814        1.0257985257985258
   2024-03-10 20:36:12.728      5477859694032525431     23      
compact_delete_files    441->426        1.0352112676056338
   2024-03-10 20:36:16.581      1255334267473732600     23      
compact_delete_files    803->784        1.024234693877551
   2024-03-10 20:36:18.947      7227297553321373728     23      
compact_delete_files    505->478        1.0564853556485356
   2024-03-10 20:36:21.98       1281338329182940375     23      
compact_delete_files    582->566        1.028268551236749
   2024-03-10 20:36:23.08       6410347697455606449     23      
compact_delete_files    353->317        1.113564668769716
   2024-03-10 20:36:24.808      7588875997900067709     20      
compact_delete_files    599->588        1.0187074829931972
   2024-03-10 20:40:05.974      198476720206951232      2       
compact_base_files      40->3   13.333333333333334
   2024-03-10 20:40:08.515      7210798375839628837     2       
compact_base_files      42->4   10.5
   2024-03-10 20:40:10.876      5692616137794405663     2       
compact_base_files      41->4   10.25
   2024-03-10 20:40:13.437      6290596725370482099     2       
compact_base_files      40->4   10.0
   2024-03-10 20:40:16.507      217541509135133596      2       
compact_base_files      42->4   10.5
   2024-03-10 20:40:20.011      4802230468835188293     1       
compact_base_files      20->2   10.0
   2024-03-10 20:41:04.273      6246580026217410227     23      
compact_delete_files    166->155        1.070967741935484
   2024-03-10 20:41:05.719      3297695269624318303     23      
compact_delete_files    264->239        1.104602510460251
   2024-03-10 20:41:06.785      5847626578154331062     23      
compact_delete_files    666->654        1.018348623853211
   2024-03-10 20:41:08.07       4267888321478088912     23      
compact_delete_files    804->778        1.0334190231362468
   2024-03-10 20:41:10.013      8135762954708433780     23      
compact_delete_files    437->420        1.0404761904761906
   2024-03-10 20:41:14.315      2664190367967793873     23      
compact_delete_files    845->828        1.0205314009661837
   2024-03-10 20:41:16.688      3946211637382110477     23      
compact_delete_files    365->343        1.064139941690962
   2024-03-10 20:41:18.796      4932807097664769635     23      
compact_delete_files    546->511        1.0684931506849316
   2024-03-10 20:41:19.909      6618458420791632675     23      
compact_delete_files    448->439        1.020501138952164
   2024-03-10 20:41:21.741      6703590499156726364     19      
compact_delete_files    609->585        1.041025641025641
   2024-03-10 20:45:05.725      9121859527157436009     2       
compact_base_files      40->3   13.333333333333334
   2024-03-10 20:45:08.159      4157493376091722749     2       
compact_base_files      40->4   10.0
   2024-03-10 20:45:10.364      7307518553901315194     2       
compact_base_files      40->4   10.0
   2024-03-10 20:45:12.53       7391382820317167039     2       
compact_base_files      40->4   10.0
   2024-03-10 20:45:15.294      1634346428437060541     2       
compact_base_files      40->4   10.0
   2024-03-10 20:45:18.664      6471891107683048438     1       
compact_base_files      20->2   10.0
   2024-03-10 20:45:23.872      2705895529971003001     1       
compact_base_files      20->2   10.0
   2024-03-10 20:46:09.038      3459810320855697041     23      
compact_delete_files    182->170        1.0705882352941176
   2024-03-10 20:46:10.458      2661568600570547866     23      
compact_delete_files    454->430        1.0558139534883721
   2024-03-10 20:46:11.621      1341779779088511826     23      
compact_delete_files    577->564        1.0230496453900708
   2024-03-10 20:46:12.949      1453266135319995705     23      
compact_delete_files    859->835        1.02874251497006
   2024-03-10 20:46:14.829      4995825204442784024     23      
compact_delete_files    277->259        1.0694980694980696
   2024-03-10 20:46:19.456      6932435756418096621     23      
compact_delete_files    815->786        1.0368956743002544
   2024-03-10 20:46:21.472      2430383133564740551     23      
compact_delete_files    337->325        1.0369230769230768
   2024-03-10 20:46:23.548      3455842334691225549     23      
compact_delete_files    545->528        1.0321969696969697
   2024-03-10 20:46:24.884      6197136702470865431     23      
compact_delete_files    417->405        1.0296296296296297
   2024-03-10 20:46:26.57       3018837951760552054     19      
compact_delete_files    658->639        1.029733959311424
   2024-03-10 20:50:05.474      5737241757182222621     2       
compact_base_files      40->4   10.0
   ```
   
   `compact_delete_files` actually do nothing, i.e. rewrite data into itself, 
or even produce more files than needed.
   
   Are there any obvious places to check for inconsistency in setup or setting?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to