b4sus commented on issue #11169:
URL: https://github.com/apache/iceberg/issues/11169#issuecomment-2747878001

   I also wonder whether metadata files are cleaned up correctly. Here is my test scenario:
   1. Create the table
   ```
   create table op_test.test_entity (
       created_at timestamp(6) with time zone,
       id bigint,
       data bigint
   ) with (
       format = 'PARQUET',
       format_version = 2,
       location = 's3://warehouse/op_test/test_entity',
       partitioning = ARRAY['day(created_at)'],
       sorted_by = ARRAY['created_at']
   );
   
   select committed_at, snapshot_id, parent_id, operation, manifest_list from op_test."test_entity$snapshots";
   ```
   | committed_at | snapshot_id | parent_id | operation | manifest_list |
   | :--- | :--- | :--- | :--- | :--- |
   | 2025-03-24 09:41:52.653 +00:00 | 1552900420629541478 | null | append | s3://warehouse/op_test/test_entity/metadata/snap-1552900420629541478-1-aa0615f9-5b78-457c-af13-1111de39f8ff.avro |
   
   2. Insert data
   ```
   insert into op_test.test_entity values (current_timestamp, 1, 1);
   
   select committed_at, snapshot_id, parent_id, operation, manifest_list from op_test."test_entity$snapshots";
   ```
   | committed_at | snapshot_id | parent_id | operation | manifest_list |
   | :--- | :--- | :--- | :--- | :--- |
   | 2025-03-24 09:41:52.653 +00:00 | 1552900420629541478 | null | append | s3://warehouse/op_test/test_entity/metadata/snap-1552900420629541478-1-aa0615f9-5b78-457c-af13-1111de39f8ff.avro |
   | 2025-03-24 11:16:54.788 +00:00 | 7610639897136105147 | 1552900420629541478 | append | s3://warehouse/op_test/test_entity/metadata/snap-7610639897136105147-1-1402cc0a-723b-4fa6-893e-276204d4792c.avro |
   
   3. Update the data
   ```
   update op_test.test_entity set data = 2 where id = 1;
   
   select committed_at, snapshot_id, parent_id, operation, manifest_list from op_test."test_entity$snapshots";
   ```
   | committed_at | snapshot_id | parent_id | operation | manifest_list |
   | :--- | :--- | :--- | :--- | :--- |
   | 2025-03-24 09:41:52.653 +00:00 | 1552900420629541478 | null | append | s3://warehouse/op_test/test_entity/metadata/snap-1552900420629541478-1-aa0615f9-5b78-457c-af13-1111de39f8ff.avro |
   | 2025-03-24 11:16:54.788 +00:00 | 7610639897136105147 | 1552900420629541478 | append | s3://warehouse/op_test/test_entity/metadata/snap-7610639897136105147-1-1402cc0a-723b-4fa6-893e-276204d4792c.avro |
   | 2025-03-24 11:18:20.124 +00:00 | 1067091930589353459 | 7610639897136105147 | overwrite | s3://warehouse/op_test/test_entity/metadata/snap-1067091930589353459-1-5b3f30dc-37a0-4288-af2d-c579c1aa2878.avro |
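
   Side note: since the table is `format_version = 2`, the update presumably wrote a positional delete file next to the new data file. Assuming the Trino version at hand exposes the `$files` metadata table, the resulting data and delete files can be listed directly (`content` is 0 for data files, 1 for position deletes):
   ```
   select content, file_path, record_count
   from op_test."test_entity$files";
   ```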
   
   The metadata files in S3:
   
   <img width="920" alt="Image" src="https://github.com/user-attachments/assets/7b5aec78-47b7-4220-a1d3-dd6466efebb4" />
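
   For reference, the metadata files tracked by the table can also be listed without browsing S3, assuming the Trino version in use exposes the `$metadata_log_entries` metadata table:
   ```
   select timestamp, file, latest_snapshot_id
   from op_test."test_entity$metadata_log_entries";
   ```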
   
   4. Run manifest/snapshot optimizations
   ```
   SET SESSION rest_backend.expire_snapshots_min_retention = '1s';
   SET SESSION rest_backend.remove_orphan_files_min_retention = '1s';
   ALTER TABLE op_test.test_entity EXECUTE optimize_manifests;
   ALTER TABLE op_test.test_entity EXECUTE expire_snapshots(retention_threshold => '1s');
   ALTER TABLE op_test.test_entity EXECUTE remove_orphan_files(retention_threshold => '1s');
   
   select committed_at, snapshot_id, parent_id, operation, manifest_list from op_test."test_entity$snapshots";
   ```
   | committed_at | snapshot_id | parent_id | operation | manifest_list |
   | :--- | :--- | :--- | :--- | :--- |
   | 2025-03-24 11:23:28.763 +00:00 | 1944591329871905239 | 1067091930589353459 | replace | s3://warehouse/op_test/test_entity/metadata/snap-1944591329871905239-1-af49462f-ede6-4ce8-9c0e-9f4e2d3ad88f.avro |
   
   A new snapshot was created and all the others were removed - so far so good. But looking at the files in S3:
   
   <img width="920" alt="Image" src="https://github.com/user-attachments/assets/93325711-8aba-4428-9f38-6f6b44743114" />
   
   we can see that the snapshot files were removed (e.g. snap-155....avro), but the metadata.json files (e.g. 00000-210fd671-e520-42e7-90eb-bf26664b2fda.metadata.json) are still there. Opening that JSON shows it references only one snapshot - 1552900420629541478 - the very snapshot that was just expired. Shouldn't this metadata.json be removed as well?
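
   From what I understand, old metadata.json files are only deleted at commit time when `write.metadata.delete-after-commit.enabled` is set (together with `write.metadata.previous-versions-max`), and `remove_orphan_files` apparently does not touch them. A possible workaround, sketched in Spark SQL since I am not sure these properties can be set through Trino:
   ```
   -- hypothetical workaround, run via a Spark session against the same catalog:
   -- on each commit, keep at most 5 previous metadata.json files and delete older ones
   ALTER TABLE op_test.test_entity SET TBLPROPERTIES (
       'write.metadata.delete-after-commit.enabled' = 'true',
       'write.metadata.previous-versions-max' = '5'
   );
   ```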
   