Re: [I] Specify in lower/upper bounds in data_file struct are exact [iceberg]

via GitHub Sun, 18 Aug 2024 22:52:06 -0700


findepi commented on issue #10930:
URL: https://github.com/apache/iceberg/issues/10930#issuecomment-2295721068


   > Do you mean that min/max and count from the Iceberg metadata are already 
used for query planning, though none of them are used for the agg pushdown?
   
   @osscm Correct. 
   Basic idea of data pruning for aggregation queries:
   during planning of a `max(v)` query, do not create splits for files whose 
`v` _upper bound_ is lower than some other split's `v` _lower bound_, since 
such files cannot contain the maximum.
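
A minimal sketch of that pruning rule, to make the condition concrete. `FileStats` and `prune_for_max` are hypothetical names for illustration, not Iceberg APIs. The sketch assumes that any file reporting a lower bound holds at least one non-null `v`, and that bounds may be loose but are never wrong (lower <= actual min, upper >= actual max):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileStats:
    # Hypothetical stand-in for per-file column stats; not an Iceberg API.
    path: str
    lower: Optional[int]  # lower bound of v, None if unknown
    upper: Optional[int]  # upper bound of v, None if unknown


def prune_for_max(files: List[FileStats]) -> List[FileStats]:
    """Return the files that could still contain max(v)."""
    lowers = [f.lower for f in files if f.lower is not None]
    if not lowers:
        # No usable bounds at all: nothing can be pruned.
        return list(files)
    # Some file is known to hold a value >= threshold.
    threshold = max(lowers)
    # Keep files with unknown bounds, or whose upper bound reaches the threshold.
    return [f for f in files if f.upper is None or f.upper >= threshold]


files = [
    FileStats("a", 0, 10),
    FileStats("b", 5, 20),
    FileStats("c", 15, 30),
    FileStats("d", None, None),
]
kept = [f.path for f in prune_for_max(files)]
```

Here file `a` is skipped because its upper bound (10) is below file `c`'s lower bound (15). File `d` is kept because it has no bounds to reason about.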
   
   Obviously this wouldn't be as efficient when splits have greatly overlapping 
`v` values (e.g. a hash value), as the pruning condition would rarely trigger. 
It's unclear to me how often such "random" values are used for dashboarding 
though.
   
   So it is possible that this strategy is as efficient as we can get for the 
workloads we care about, yet doesn't require spec changes.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

