bk-mz commented on issue #9833:
URL: https://github.com/apache/iceberg/issues/9833#issuecomment-1973061741

   I investigated a little.
   
   It seems that Iceberg maps each partition to some form of internal integer 
id, e.g. the `2024-02-29-06` partition is translated to `474425`. Apparently, 
running both rewrite_data_files and rewrite_position_delete_files caused Iceberg 
to leak those internal partition ids into the filesystem.
   
   ```
   spark-sql ()> SELECT * FROM database.table.partitions;
   {"data_load_ts_hour":474111} 0       31581863        67      5518171238      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474110} 0       27528941        59      4744718083      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474113} 0       35247584        75      6106815135      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474112} 0       35767820        76      6203474378      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474115} 0       33848781        73      5714870794      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474114} 0       33251894        72      5706434958      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474117} 0       26825760        56      4575503869      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474116} 0       29780249        64      5100337983      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474109} 0       19755026        43      3250584769      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474108} 0       11820983        24      1801821967      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474127} 0       3751415 8       546119138       0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474126} 0       4094247 8       583096432       0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474129} 0       4341823 8       647139274       0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474128} 0       4645898 8       661700686       0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474131} 0       7696352 16      1157927863      0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   {"data_load_ts_hour":474130} 0       5359994 11      782552958       0       0       0       0       2024-03-01 11:39:04.284 2980406515838442447
   ```
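
   One possible reading of these integers: the Iceberg spec defines the `hour` 
transform as hours from 1970-01-01 00:00:00, so a `data_load_ts_hour` value may 
simply be an hour ordinal rather than an opaque id. The sketch below assumes 
that interpretation and UTC; the helper names are made up for illustration. 
Under the same assumption, `474425` would decode to `2024-02-14-17` rather than 
`2024-02-29-06`, so the pairing quoted earlier may be only illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: partition values are hours since the Unix epoch (UTC),
# as the Iceberg spec describes for the `hour` transform.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal_to_utc(ordinal: int) -> datetime:
    """Turn an hour ordinal (hypothetical helper) back into a UTC timestamp."""
    return EPOCH + timedelta(hours=ordinal)


def utc_to_hour_ordinal(ts: datetime) -> int:
    """Turn a timezone-aware timestamp into whole hours since the epoch."""
    return (ts - EPOCH) // timedelta(hours=1)


# Smallest and largest values from the partitions output above.
for ordinal in (474108, 474131):
    print(ordinal, hour_ordinal_to_utc(ordinal).strftime("%Y-%m-%d-%H"))
# 474108 2024-02-01-12
# 474131 2024-02-02-11
```

   If this reading is right, the partitions listed above cover 2024-02-01 12:00 
through 2024-02-02 11:00 UTC.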
   
   I wasn't able to reproduce the issue. 
   
   To merge position delete files, I switched to a multi-stage 
rewrite_data_files run, varying the `where` clause and `delete-file-threshold`.
   
   For fresh partitions, which are more likely to be updated, I run 
`rewrite_data_files {delete-file-threshold: 10}`. For older partitions I run 
`rewrite_data_files {delete-file-threshold: 1}`.
   
   The latter merges all delete files into the base files, while the former 
only rewrites base files that have at least 10 delete files associated with 
them.
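
   The two stages described above could be sketched roughly as follows. This 
only builds the `CALL` statements as strings; it assumes the Spark procedure 
syntax from the Iceberg docs, where the option is spelled 
`delete-file-threshold`. The catalog name `my_catalog`, the source column 
`data_load_ts` and the cutoff timestamp are placeholders, not values from my 
actual setup.

```python
def rewrite_data_files_call(catalog: str, table: str, where: str,
                            delete_file_threshold: int) -> str:
    """Build a Spark SQL CALL for Iceberg's rewrite_data_files procedure.

    Hypothetical helper: it only formats the statement, it does not run it.
    """
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"where => '{where}', "
        f"options => map('delete-file-threshold', '{delete_file_threshold}'))"
    )


# Placeholder cutoff separating "fresh" from "older" partitions.
CUTOFF = '"2024-02-29 00:00:00"'

# Stage 1: fresh partitions, rewrite only data files with >= 10 delete files.
fresh = rewrite_data_files_call(
    "my_catalog", "database.table", f"data_load_ts >= {CUTOFF}", 10)

# Stage 2: older partitions, rewrite any data file with at least 1 delete file.
older = rewrite_data_files_call(
    "my_catalog", "database.table", f"data_load_ts < {CUTOFF}", 1)

print(fresh)
print(older)
```

   Each string could then be passed to `spark.sql(...)` or pasted into 
spark-sql; whether the `where` filter accepts a given predicate depends on the 
Iceberg version.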
   
   Can anybody clarify this odd mapping of Iceberg partitions, e.g. 
`{"data_load_ts_hour":474117}`? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

