findepi commented on issue #10930: URL: https://github.com/apache/iceberg/issues/10930#issuecomment-2295721068
> Do you mean min/max and count from the Iceberg metadata are already used for query planning, though none of them are used for the agg pushdown?

@osscm Correct. The basic idea of data pruning for aggregation queries: during planning of a `max(v)` query, do not create splits for files whose _upper bound_ on `v` is lower than some other split's _lower bound_ on `v`. Obviously this is less effective when splits have heavily overlapping `v` values (e.g. a hash value), because the pruning condition would rarely trigger. It's unclear to me how often such "random" values are used for dashboarding, though. So this strategy may be about as efficient as we can get for the workloads we care about, and it doesn't require spec changes.
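A minimal Python sketch of the pruning rule for `max(v)`, plus its mirror for `min(v)`. It assumes each file exposes a single column's lower/upper bound (loosely modeled on Iceberg's per-file `lower_bounds`/`upper_bounds`). `FileStats`, `prune_for_max` and `prune_for_min` are hypothetical names, not Iceberg or Trino APIs. It also assumes the query has no filter and the table has no row-level deletes, since either could remove the rows that back another file's lower bound.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file stats for one column `v`.

    None means the bound is unknown (e.g. metrics were not collected),
    so the file must be kept conservatively.
    """
    path: str
    lower: Optional[int]
    upper: Optional[int]


def prune_for_max(files: Sequence[FileStats]) -> List[FileStats]:
    """Keep only files that may contain max(v).

    Assumption: no WHERE filter and no row-level deletes, so every known
    lower bound is backed by a value that will actually be read.
    """
    known_lowers = [f.lower for f in files if f.lower is not None]
    if not known_lowers:
        return list(files)  # nothing to compare against; keep everything
    # The max(v) is at least the largest lower bound seen in any file.
    threshold = max(known_lowers)
    # A file whose upper bound is below that threshold cannot hold max(v).
    return [f for f in files if f.upper is None or f.upper >= threshold]


def prune_for_min(files: Sequence[FileStats]) -> List[FileStats]:
    """Mirror image of prune_for_max for min(v), under the same assumptions."""
    known_uppers = [f.upper for f in files if f.upper is not None]
    if not known_uppers:
        return list(files)
    # The min(v) is at most the smallest upper bound seen in any file.
    threshold = min(known_uppers)
    return [f for f in files if f.lower is None or f.lower <= threshold]


files = [
    FileStats("a.parquet", lower=0, upper=10),
    FileStats("b.parquet", lower=5, upper=50),
    FileStats("c.parquet", lower=40, upper=60),
    FileStats("d.parquet", lower=None, upper=None),  # no stats: always kept
]

print([f.path for f in prune_for_max(files)])  # ['b.parquet', 'c.parquet', 'd.parquet']
print([f.path for f in prune_for_min(files)])  # ['a.parquet', 'b.parquet', 'd.parquet']
```

With heavily overlapping ranges, such as every file spanning most of a hash's value space, the threshold falls below every file's upper bound and nothing is pruned, which is the weak case mentioned above.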