pvary commented on PR #12629:
URL: https://github.com/apache/iceberg/pull/12629#issuecomment-2782842039

   > Let's wait for @ajantha-bhat to come back from the Summit and see what he 
thinks.
   > 
   > Based on the discussion above we could just provide these 2 methods on the 
API:
   > 
   > ```
   > public static Collection<PartitionStats> computeStats(Table table) {
   > ```
   > 
   > and
   > 
   > ```
   > public static Collection<PartitionStats> reComputeStats(Table table) {
   > ```
   
   Based on our offline discussion with @gaborkaszab, the incremental stats 
calculation doesn't need to traverse multiple files, so we only need to force 
a recompute of the stats in case of some stat corruption. I think that could be 
a different design/PR (either drop the corrupt stats, or force recompute).
   
   Then here we just need an API like:
   ```
   /**
    * Updates the partition statistics for the table.
    * <ul>
    *   <li>If the table already has stats, finds the latest one and
    *       calculates the stats incrementally from there.
    *   <li>If the table has no stats, calculates them from scratch.
    * </ul>
    */
   public static Collection<PartitionStats> computeStats(Table table) {
   ```
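
To make the two branches in the Javadoc above concrete, here is a minimal, self-contained sketch of the decision logic. It does not use the Iceberg API: `Snapshot`, `StatsFile`, the `sequenceNumber` ordering, and the per-partition record-count map are all hypothetical stand-ins chosen for illustration, and the real `PartitionStats` type tracks more than a record count.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Illustrative sketch only; none of these types come from Iceberg.
class PartitionStatsSketch {

  // Hypothetical snapshot: an ordering number plus records added per partition.
  static final class Snapshot {
    final long sequenceNumber;
    final Map<String, Long> addedRecords;

    Snapshot(long sequenceNumber, Map<String, Long> addedRecords) {
      this.sequenceNumber = sequenceNumber;
      this.addedRecords = addedRecords;
    }
  }

  // Hypothetical stats file: record counts covering history up to sequenceNumber.
  static final class StatsFile {
    final long sequenceNumber;
    final Map<String, Long> recordCounts;

    StatsFile(long sequenceNumber, Map<String, Long> recordCounts) {
      this.sequenceNumber = sequenceNumber;
      this.recordCounts = recordCounts;
    }
  }

  // If stats exist, start from the latest one and apply only newer snapshots;
  // otherwise start from an empty map and apply the whole history.
  static Map<String, Long> computeStats(List<Snapshot> history, List<StatsFile> existing) {
    Optional<StatsFile> latest =
        existing.stream().max(Comparator.comparingLong((StatsFile f) -> f.sequenceNumber));

    Map<String, Long> result = new TreeMap<>();
    latest.ifPresent(f -> result.putAll(f.recordCounts));
    long coveredUpTo = latest.map(f -> f.sequenceNumber).orElse(Long.MIN_VALUE);

    for (Snapshot snapshot : history) {
      if (snapshot.sequenceNumber > coveredUpTo) {
        snapshot.addedRecords.forEach((partition, count) -> result.merge(partition, count, Long::sum));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Snapshot> history = List.of(
        new Snapshot(1, Map.of("p=1", 10L)),
        new Snapshot(2, Map.of("p=1", 5L, "p=2", 7L)));

    // No existing stats: computed from scratch.
    System.out.println(computeStats(history, List.of()));

    // Stats exist up to sequence number 1: only snapshot 2 is applied.
    System.out.println(computeStats(history, List.of(new StatsFile(1, Map.of("p=1", 10L)))));
  }
}
```

In this sketch both calls yield the same totals, which is the property that lets a single `computeStats` entry point cover both cases. A forced recompute for corrupt stats would amount to calling it with an empty `existing` list, which matches the suggestion to handle that in a separate design/PR.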


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

