pvary commented on PR #12629: URL: https://github.com/apache/iceberg/pull/12629#issuecomment-2782842039
> Let's wait for @ajantha-bhat to come back from the Summit and see what he thinks.
>
> Based on the discussion above we could just provide these 2 methods on the API:
>
> ```
> public static Collection<PartitionStats> computeStats(Table table) {
> ```
>
> and
>
> ```
> public static Collection<PartitionStats> reComputeStats(Table table) {
> ```

Based on our offline discussion with @gaborkaszab, the incremental stats calculation doesn't need to traverse multiple files, so we only need to force a recompute of the stats in case of some stat corruption. I think that could be a different design/PR (either drop the corrupt stats, or force a recompute).

Then here we just need an API like:

```
/**
 * Updates the partition statistics for the table.
 * <ul>
 *   <li>If there are existing stats for the table, then finds the latest one
 *       and does incremental stats calculation from there.
 *   <li>If there are no current stats, calculates them from scratch.
 * </ul>
 */
public static Collection<PartitionStats> computeStats(Table table) {
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org