SanjayKhoros commented on issue #10907: URL: https://github.com/apache/iceberg/issues/10907#issuecomment-2353234819
Thanks for the quick reply, @RussellSpitzer. Sharing a few more details:

Flink version - 1.20.0
Iceberg version - 1.6.1

`long cutoffDateMillis = LocalDateTime.now().minusDays(Long.parseLong(flinkConfig.dataCleanup.retentionPeriod)).toInstant(ZoneOffset.UTC).toEpochMilli();`

I printed `cutoffDateMillis`: **1726313530137**. I'm currently testing this in my dev environment, so I changed the retention period to 2 days.

As I mentioned earlier, the soft delete works without any issues. When I query the records by day, I see only 2 days of data, and older records no longer appear.

The major issue is that the data is not getting cleaned up from S3. My **data/** folder is only around **600MB**, while **metadata/** is around **1TB**! I also get no errors when executing the rewriteManifests() and expireSnapshots() calls above.

Based on your comments above, I thought I should run **deleteOrphanFiles**, so I added the dependency below:

```
<dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-spark-runtime-3.4_2.12</artifactId>
    <version>${iceberg.version}</version>
</dependency>
```

I then added the call after expireSnapshots, inside a try-catch block:

```
icebergTable.expireSnapshots()
    .expireOlderThan(cutoffDateMillis)
    .commit();
icebergTable.refresh();
logger.info("executing deleteOrphanFiles " + System.currentTimeMillis());
SparkActions.get().deleteOrphanFiles(icebergTable)
    .olderThan(cutoffDateMillis)
    .execute();
logger.info("deleteOrphanFiles completed successfully");
```

The service has been stuck after the "executing deleteOrphanFiles" log line for the past 4 hours. I'm hoping it does something or at least throws an error.

If you have any suggestions, please do share. I'm out of options and references at this point. Thank you!
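A side note on the cutoff computation quoted above: `LocalDateTime.now()` reads the wall clock in the JVM's default time zone, and `toInstant(ZoneOffset.UTC)` then treats that local time as if it were UTC, so the result is shifted by the zone offset unless the JVM itself runs in UTC. This is probably not the cause of the S3 cleanup problem, but it can move the cutoff by several hours. A minimal sketch of a zone-independent alternative, assuming the retention period is a whole number of days (the class name `CutoffExample` and the helper `cutoffMillis` are hypothetical, not part of Iceberg or Flink):

```java
import java.time.Clock;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class CutoffExample {

    // Hypothetical helper: epoch milliseconds for "now minus retentionDays".
    // Instant is always on the UTC timeline, so the JVM default zone is irrelevant.
    // The Clock parameter exists only so the calculation can be tested with a fixed time.
    public static long cutoffMillis(Clock clock, long retentionDays) {
        return Instant.now(clock)
                .minus(retentionDays, ChronoUnit.DAYS)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // Assumed retention of 2 days, matching the dev setup described above.
        long cutoffDateMillis = cutoffMillis(Clock.systemUTC(), 2);
        System.out.println("cutoffDateMillis = " + cutoffDateMillis);
    }
}
```

The resulting value can be passed to `expireOlderThan(...)` and `olderThan(...)` in the same way as the original `cutoffDateMillis`.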