SanjayKhoros commented on issue #10907:
URL: https://github.com/apache/iceberg/issues/10907#issuecomment-2351566866

   I'm facing the exact same issue myself! The **metadata/** folder keeps piling up, 
and the files under **data/** and **metadata/** are not getting deleted.
   
   Sharing my **table properties** just to make sure I didn't make any errors 
there:
   `Table Properties {history.expire.max-snapshot-age-ms=300000, 
write.metadata.previous-versions-max=10, write.parquet.compression-codec=zstd, 
write.manifest-target-size-bytes=33554432, 
read.split.metadata-target-size=67108864, 
write.metadata.delete-after-commit.enabled=true, 
write.target-file-size-bytes=134217728, read.split.target-size=134217728, 
history.expire.min-snapshots-to-keep=3}`
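
   For readability, here is a minimal sketch in plain Java (no Iceberg dependency) that converts the raw values above into human units. The class and method names are hypothetical and exist only for this illustration; the property names and numbers are copied from the list above.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Iceberg: it only converts the raw
// property values quoted above into more readable units.
public class TablePropertyUnits {

    // Bytes to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Milliseconds to whole minutes.
    public static long toMinutes(long millis) {
        return Duration.ofMillis(millis).toMinutes();
    }

    public static void main(String[] args) {
        // Size-valued properties, copied from the table properties above.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("write.manifest-target-size-bytes", 33554432L);
        sizes.put("read.split.metadata-target-size", 67108864L);
        sizes.put("write.target-file-size-bytes", 134217728L);
        sizes.put("read.split.target-size", 134217728L);

        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            System.out.println(e.getKey() + " = " + toMiB(e.getValue()) + " MiB");
        }

        // Age-valued property, copied from the table properties above.
        System.out.println("history.expire.max-snapshot-age-ms = "
                + toMinutes(300000L) + " min");
    }
}
```

   So the manifest target is 32 MiB, the split sizes are 64 MiB and 128 MiB, the data file target is 128 MiB, and the maximum snapshot age is 5 minutes.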
   
   The **newDelete()** code soft-deletes the data, and the deleted rows no longer appear in 
Athena queries:
   ```
   Expression olderThanCutoff = Expressions.lessThan("created_epoch", cutoffDateMillis);
   icebergTable.newDelete()
         .deleteFromRowFilter(olderThanCutoff)
         .commit();
   icebergTable.refresh();
   ```
                
   I'm also running **rewriteManifests** and **expireSnapshots**. Both 
succeed without errors, but none of the files are getting deleted. The 
metadata/ folder only keeps growing!
   
   ```
               FileIO fileIO = icebergTable.io();
               icebergTable.rewriteManifests()
                       .clusterBy(file -> file.partition().get(0, String.class))
                       .rewriteIf(file -> file.length() < 10 * 1024 * 1024)
                       .deleteWith(fileIO::deleteFile)
                       .commit();
   
               icebergTable.expireSnapshots()
                       .expireOlderThan(cutoffDateMillis)
                       .commit();
               icebergTable.refresh();
   ```
   
   Can someone please point out the issue in my code and why metadata/ 
keeps growing? I've tried every option I can think of and I'm out of ideas!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

