ajantha-bhat commented on PR #12629:
URL: https://github.com/apache/iceberg/pull/12629#issuecomment-2867156263

   @deniskuzZ: 
   
   > We can work around it with removePartitionStatistics and then compute, but 
that would create 2 snapshots, not sure if that is a good approach. 
   
   Iceberg doesn't create 2 snapshots for this. Updating table metadata with 
the partition stats is a `PendingUpdate`, not a `SnapshotUpdate`. So it just 
creates a new table metadata file, not a new snapshot, which makes it a 
lightweight operation. 
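   
   To illustrate the distinction, here is a minimal toy model. It does not use the 
Iceberg classes; `ToyTableMetadata` and its methods are hypothetical names invented 
for this sketch. It only shows that a metadata-only commit writes a new metadata 
file version without adding a snapshot:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT the Iceberg classes, just an illustration.
final class ToyTableMetadata {
  private int metadataFileVersion = 1; // bumps on every commit
  private final List<Long> snapshotIds = new ArrayList<>();
  private final List<String> partitionStatsFiles = new ArrayList<>();

  // Models a snapshot-producing commit (for example an append):
  // a new snapshot AND a new metadata file.
  void commitSnapshot(long snapshotId) {
    snapshotIds.add(snapshotId);
    metadataFileVersion++;
  }

  // Models a metadata-only commit that registers a stats file:
  // a new metadata file, but no new snapshot.
  void commitPartitionStats(String statsFilePath) {
    partitionStatsFiles.add(statsFilePath);
    metadataFileVersion++;
  }

  // Models a metadata-only commit that unregisters a stats file.
  void removePartitionStats(String statsFilePath) {
    partitionStatsFiles.remove(statsFilePath);
    metadataFileVersion++;
  }

  int metadataFileVersion() {
    return metadataFileVersion;
  }

  int snapshotCount() {
    return snapshotIds.size();
  }

  int statsFileCount() {
    return partitionStatsFiles.size();
  }
}
```

   In this model, a remove followed by a re-register produces two new metadata 
file versions and zero new snapshots.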
   
   @pvary: Thanks for discussing this in depth. I agree that as of now one 
interface is enough:
   `PartitionStatsHandler.computeAndWriteStats()` -- incrementally computes 
stats if previous stats are available; otherwise it does a full compute.
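   
   A rough sketch of that fallback, under heavy simplification: the class and 
method names below are hypothetical and are not the real handler's signature, and 
the only "stat" tracked is a data file count per partition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the real PartitionStatsHandler, just the fallback logic.
final class ToyPartitionStats {
  // Each list element is the partition value of one data file.
  // previousStats is null when no stats have been registered yet.
  static Map<String, Long> computeStats(
      Map<String, Long> previousStats,
      List<String> allFilePartitions,
      List<String> newFilePartitions) {
    Map<String, Long> result;
    List<String> toScan;
    if (previousStats != null) {
      // Incremental: start from the previous stats, scan only the new files.
      result = new HashMap<>(previousStats);
      toScan = newFilePartitions;
    } else {
      // Full compute: start empty, scan every file.
      result = new HashMap<>();
      toScan = allFilePartitions;
    }
    for (String partition : toScan) {
      result.merge(partition, 1L, Long::sum);
    }
    return result;
  }
}
```

   The caller sees a single entry point either way; whether the work is 
incremental or full depends only on whether previous stats exist.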
   
   In the future we can add 
`UpdatePartitionStatistics.removePartitionStatistics()`, as @gaborkaszab 
suggested, if users want to bulk-remove stats when needed. If stats are 
corrupted, as of now we do have a way to recompute by unregistering the existing 
stats. So I am fine with keeping one interface for now. 
   
   > Do we know someone from Trino / Spark who can chime in with their 
requirements?
   
   I didn't see much direct participation from these communities. Maybe in 
the future we can discuss again adding a new interface to force a refresh, if it 
is needed. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

