b4sus commented on issue #11169: URL: https://github.com/apache/iceberg/issues/11169#issuecomment-2747878001
I also wonder whether metadata files are cleared correctly. This is my test scenario:

1. Create the table:

```sql
create table op_test.test_entity (
    created_at timestamp(6) with time zone,
    id bigint,
    data bigint
) with (
    format = 'PARQUET',
    format_version = 2,
    location = 's3://warehouse/op_test/test_entity',
    partitioning = ARRAY['day(created_at)'],
    sorted_by = ARRAY['created_at']
);

select committed_at, snapshot_id, parent_id, operation, manifest_list
from op_test."test_entity$snapshots";
```

| committed\_at | snapshot\_id | parent\_id | operation | manifest\_list |
| :--- | :--- | :--- | :--- | :--- |
| 2025-03-24 09:41:52.653 +00:00 | 1552900420629541478 | null | append | s3://warehouse/op\_test/test\_entity/metadata/snap-1552900420629541478-1-aa0615f9-5b78-457c-af13-1111de39f8ff.avro |

2. Insert data:

```sql
insert into op_test.test_entity values (current_timestamp, 1, 1);

select committed_at, snapshot_id, parent_id, operation, manifest_list
from op_test."test_entity$snapshots";
```

| committed\_at | snapshot\_id | parent\_id | operation | manifest\_list |
| :--- | :--- | :--- | :--- | :--- |
| 2025-03-24 09:41:52.653 +00:00 | 1552900420629541478 | null | append | s3://warehouse/op\_test/test\_entity/metadata/snap-1552900420629541478-1-aa0615f9-5b78-457c-af13-1111de39f8ff.avro |
| 2025-03-24 11:16:54.788 +00:00 | 7610639897136105147 | 1552900420629541478 | append | s3://warehouse/op\_test/test\_entity/metadata/snap-7610639897136105147-1-1402cc0a-723b-4fa6-893e-276204d4792c.avro |
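The `parent_id` chaining visible in the `$snapshots` output above can be sketched with a small in-memory model. This is a hedged Python illustration, not the Iceberg or Trino API: each commit records the previous snapshot's id as its `parent_id`, which is why the second row's `parent_id` equals the first row's `snapshot_id`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    operation: str

class SnapshotLog:
    """Toy model of a linear Iceberg snapshot log (illustration only)."""

    def __init__(self) -> None:
        self.snapshots: list[Snapshot] = []

    def commit(self, snapshot_id: int, operation: str) -> Snapshot:
        # A new commit's parent is whatever snapshot was current before it.
        parent = self.snapshots[-1].snapshot_id if self.snapshots else None
        snap = Snapshot(snapshot_id, parent, operation)
        self.snapshots.append(snap)
        return snap

# Replay the two commits from the $snapshots output above.
log = SnapshotLog()
log.commit(1552900420629541478, "append")  # snapshot after step 1
log.commit(7610639897136105147, "append")  # snapshot after the step 2 insert
```

Running this reproduces the lineage shown in the table: the first snapshot has a null parent, and the second points back at the first.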
3. Update the data:

```sql
update op_test.test_entity set data = 2 where id = 1;

select committed_at, snapshot_id, parent_id, operation, manifest_list
from op_test."test_entity$snapshots";
```

| committed\_at | snapshot\_id | parent\_id | operation | manifest\_list |
| :--- | :--- | :--- | :--- | :--- |
| 2025-03-24 09:41:52.653 +00:00 | 1552900420629541478 | null | append | s3://warehouse/op\_test/test\_entity/metadata/snap-1552900420629541478-1-aa0615f9-5b78-457c-af13-1111de39f8ff.avro |
| 2025-03-24 11:16:54.788 +00:00 | 7610639897136105147 | 1552900420629541478 | append | s3://warehouse/op\_test/test\_entity/metadata/snap-7610639897136105147-1-1402cc0a-723b-4fa6-893e-276204d4792c.avro |
| 2025-03-24 11:18:20.124 +00:00 | 1067091930589353459 | 7610639897136105147 | overwrite | s3://warehouse/op\_test/test\_entity/metadata/snap-1067091930589353459-1-5b3f30dc-37a0-4288-af2d-c579c1aa2878.avro |

The metadata files in s3:

<img width="920" alt="Image" src="https://github.com/user-attachments/assets/7b5aec78-47b7-4220-a1d3-dd6466efebb4" />

4. Run manifest/snapshot optimizations:

```sql
SET SESSION rest_backend.expire_snapshots_min_retention = '1s';
SET SESSION rest_backend.remove_orphan_files_min_retention = '1s';
ALTER TABLE op_test.test_entity EXECUTE optimize_manifests;
ALTER TABLE op_test.test_entity EXECUTE expire_snapshots(retention_threshold => '1s');
ALTER TABLE op_test.test_entity EXECUTE remove_orphan_files(retention_threshold => '1s');

select committed_at, snapshot_id, parent_id, operation, manifest_list
from op_test."test_entity$snapshots";
```

| committed\_at | snapshot\_id | parent\_id | operation | manifest\_list |
| :--- | :--- | :--- | :--- | :--- |
| 2025-03-24 11:23:28.763 +00:00 | 1944591329871905239 | 1067091930589353459 | replace | s3://warehouse/op\_test/test\_entity/metadata/snap-1944591329871905239-1-af49462f-ede6-4ce8-9c0e-9f4e2d3ad88f.avro |

A new snapshot was created and all the others were removed, so far so good.
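The effect of `expire_snapshots` with a 1s retention can be sketched as follows. This is a hedged simplification in Python, not Trino/Iceberg internals: every snapshot older than the retention cutoff is dropped, except the current snapshot, which is always kept. The timestamps and ids are taken from the tables above.

```python
from datetime import datetime, timedelta, timezone

def expire_snapshots(snapshots, current_id, now, retention):
    """Keep the current snapshot plus anything newer than the cutoff."""
    cutoff = now - retention
    return [
        s for s in snapshots
        if s["snapshot_id"] == current_id or s["committed_at"] >= cutoff
    ]

utc = timezone.utc
snapshots = [
    {"snapshot_id": 1552900420629541478,
     "committed_at": datetime(2025, 3, 24, 9, 41, 52, tzinfo=utc)},
    {"snapshot_id": 7610639897136105147,
     "committed_at": datetime(2025, 3, 24, 11, 16, 54, tzinfo=utc)},
    {"snapshot_id": 1067091930589353459,
     "committed_at": datetime(2025, 3, 24, 11, 18, 20, tzinfo=utc)},
    # "replace" snapshot created by optimize_manifests
    {"snapshot_id": 1944591329871905239,
     "committed_at": datetime(2025, 3, 24, 11, 23, 28, tzinfo=utc)},
]

# With a 1s retention evaluated well after the last commit, only the
# current "replace" snapshot survives, matching the $snapshots output.
remaining = expire_snapshots(
    snapshots,
    current_id=1944591329871905239,
    now=datetime(2025, 3, 24, 11, 30, 0, tzinfo=utc),
    retention=timedelta(seconds=1),
)
```

Note that this sketch only covers snapshot (avro) files; it says nothing about the `metadata.json` version files, which is exactly the gap observed below.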
But looking at the files in s3:

<img width="920" alt="Image" src="https://github.com/user-attachments/assets/93325711-8aba-4428-9f38-6f6b44743114" />

we can see that the snapshot files were removed (e.g. snap-155....avro), but the metadata.json files (e.g. 00000-210fd671-e520-42e7-90eb-bf26664b2fda.metadata.json) are still there. When I open that json, it contains only one snapshot, 1552900420629541478, which is the one that was removed. Shouldn't this metadata.json also be removed?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org