bk-mz commented on issue #9833: URL: https://github.com/apache/iceberg/issues/9833#issuecomment-1973061741
I investigated a little. It seems that Iceberg maps partitions to some form of ID, e.g. the `2024-02-29-06` partition is translated to `474425`. Apparently, running both rewrite_data_files and rewrite_position_delete_files forced Iceberg to leak those internal partition values to the filesystem.

```
spark-sql ()> SELECT * FROM database.table.partitions;
{"data_load_ts_hour":474111} 0 31581863 67 5518171238 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474110} 0 27528941 59 4744718083 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474113} 0 35247584 75 6106815135 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474112} 0 35767820 76 6203474378 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474115} 0 33848781 73 5714870794 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474114} 0 33251894 72 5706434958 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474117} 0 26825760 56 4575503869 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474116} 0 29780249 64 5100337983 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474109} 0 19755026 43 3250584769 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474108} 0 11820983 24 1801821967 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474127} 0 3751415 8 546119138 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474126} 0 4094247 8 583096432 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474129} 0 4341823 8 647139274 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474128} 0 4645898 8 661700686 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474131} 0 7696352 16 1157927863 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
{"data_load_ts_hour":474130} 0 5359994 11 782552958 0 0 0 0 2024-03-01 11:39:04.284 2980406515838442447
```

I wasn't able to reproduce the issue.

To merge position delete files, I switched to a multi-stage rewrite_data_files, varying the `where` clause and `delete-file-threshold`. For fresh partitions, which are more likely to be updated, I run `rewrite_data_files {delete-file-threshold: 10}`. For older partitions, I run `rewrite_data_files {delete-file-threshold: 1}`. The latter merges all delete files into the base files, while the former only rewrites base files that have at least 10 delete files associated with them.

Can anybody clarify this weird mapping of Iceberg partitions, e.g. `{"data_load_ts_hour":474117}`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
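
On the mapping question in the comment above: the Iceberg table spec defines the `hour` partition transform as the number of hours since 1970-01-01 00:00:00 UTC. Assuming `data_load_ts_hour` is an `hour(data_load_ts)` partition field (the column name suggests this, but the table definition is not shown in the comment), the integers in the `partitions` output can be decoded with a small sketch like the one below. The helper names are made up for illustration and are not part of any Iceberg API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the partition value is an Iceberg `hour` transform result,
# i.e. whole hours elapsed since the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal_to_utc(ordinal: int) -> datetime:
    """Decode an hours-since-epoch ordinal into a UTC timestamp (hypothetical helper)."""
    return EPOCH + timedelta(hours=ordinal)


def utc_to_hour_ordinal(ts: datetime) -> int:
    """Encode a timezone-aware timestamp as whole hours since the epoch (hypothetical helper)."""
    return (ts - EPOCH) // timedelta(hours=1)


if __name__ == "__main__":
    # Smallest and largest ordinals from the `partitions` listing in the comment.
    for ordinal in (474108, 474131):
        print(ordinal, hour_ordinal_to_utc(ordinal).strftime("%Y-%m-%d-%H"))
    # 474108 2024-02-01-12
    # 474131 2024-02-02-11
```

Under that assumption, the listed partitions would cover the 24 hours starting at 2024-02-01 12:00 UTC. A human-readable path such as `2024-02-29-06` and a bare integer would then be two renderings of the same partition value, not a separate internal ID.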