jeesou commented on code in PR #11040:
URL: https://github.com/apache/iceberg/pull/11040#discussion_r1861546270


##########
api/src/main/java/org/apache/iceberg/Table.java:
##########
@@ -373,4 +374,14 @@ default Snapshot snapshot(String name) {
 
     return null;
   }
+
+  /**
+   * Returns the statistics file for the given snapshot id, if available.
+   *
+   * @return the {@link StatisticsFile} for the given snapshot id, if available.
+   */
+  default Optional<StatisticsFile> statistics(long snapshotId) {

Review Comment:
   Yes @amogh-jahagirdar, your suggestion fits well if we want a generic 
solution that supports multiple blob types. The current implementation assumes 
that we only support "apache-datasketches-theta-v1".
   We ran into this recently with Presto: both engines were using a common 
catalog, and the Puffin file created by Presto was not usable because it had a 
different blob type, "presto-sum-data-size-bytes-v1". Supporting multiple blob 
types would be a forward-looking change that we may take up later.
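
   To make the blob type concern concrete, here is a minimal sketch of a lookup 
that only returns a statistics file when it contains a blob type the engine can 
read. `StatsFile`, `usableStatistics`, and `supportedBlobTypes` are hypothetical 
names used for illustration; this is not the Iceberg API and not the code in 
this PR.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class BlobTypeFilter {
  // Hypothetical stand-in for a statistics file: the snapshot it belongs to
  // and the blob types stored in it.
  public record StatsFile(long snapshotId, List<String> blobTypes) {}

  // Returns the statistics file for the snapshot only if it holds at least
  // one blob type that the calling engine supports.
  public static Optional<StatsFile> usableStatistics(
      List<StatsFile> files, long snapshotId, Set<String> supportedBlobTypes) {
    return files.stream()
        .filter(f -> f.snapshotId() == snapshotId)
        .filter(f -> f.blobTypes().stream().anyMatch(supportedBlobTypes::contains))
        .findFirst();
  }
}
```

   With this shape, a file that only holds "presto-sum-data-size-bytes-v1" would 
be skipped by an engine that supports only "apache-datasketches-theta-v1".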
   
   Regarding the best-effort search for stats, @amogh-jahagirdar, I think we 
need to reconsider whether we always want to return some statistics, because 
how useful they are depends on the amount of data added or deleted since the 
last time we ran Analyze. Stale statistics could lead to wrong query plans. 
What if we let the user configure how much deviation or change they are willing 
to accept while still using the older statistics? I have made some changes 
along those lines so that the user can decide the acceptable amount of change: 
https://github.com/karuppayya/iceberg/compare/fix_snapshot...jeesou:fix_snapshot_modifications?expand=1.
   
   Kindly have a look, @amogh-jahagirdar and @karuppayya, and share your 
suggestions.
   I have not handled the delete scenario: if I find any deletion, I do not 
use the old stats. That is open to discussion, as deletes are tricky in this 
case.
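
   As a rough sketch of the policy described above (not the code in the linked 
branch), the lookup could walk back from the current snapshot, sum the records 
added since the last snapshot that has statistics, and reuse those statistics 
only when the change ratio is within a user-configured limit. `SnapshotInfo`, 
`statsSnapshotToUse`, and `maxChangeRatio` are hypothetical names, and measuring 
change by record count is an assumption.

```java
import java.util.List;
import java.util.Optional;

public class StaleStatsPolicy {
  // Hypothetical per-snapshot summary. totalRecords is the table's record
  // count as of this snapshot.
  public record SnapshotInfo(
      long id, long addedRecords, long deletedRecords, boolean hasStats, long totalRecords) {}

  // newestFirst starts at the current snapshot and walks back through its ancestors.
  // Returns the id of the snapshot whose statistics may be reused, if any.
  public static Optional<Long> statsSnapshotToUse(
      List<SnapshotInfo> newestFirst, double maxChangeRatio) {
    long addedSinceStats = 0;
    for (SnapshotInfo s : newestFirst) {
      if (s.hasStats()) {
        if (s.totalRecords() == 0) {
          // Nothing to compare against: reuse only if nothing was added since.
          return addedSinceStats == 0 ? Optional.of(s.id()) : Optional.empty();
        }
        double ratio = (double) addedSinceStats / s.totalRecords();
        return ratio <= maxChangeRatio ? Optional.of(s.id()) : Optional.empty();
      }
      if (s.deletedRecords() > 0) {
        // Any delete after the statistics were computed: do not reuse old stats.
        return Optional.empty();
      }
      addedSinceStats += s.addedRecords();
    }
    // No ancestor has statistics.
    return Optional.empty();
  }
}
```

   For example, with 100 records added on top of a 1000-record snapshot that has 
statistics, a limit of 0.2 would reuse the old statistics and a limit of 0.05 
would not.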



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

