RussellSpitzer commented on issue #14864: URL: https://github.com/apache/iceberg/issues/14864#issuecomment-3666558824
There really isn't a way to calculate aggregates from metadata alone if delete files of any kind are present (at least at the moment). You can see the relevant code in our Spark plugin implementation, https://github.com/apache/iceberg/blob/36bb82675ff68ac0ed059d4db62550d30aa35760/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java#L240-L245, where we simply bail out. Note that this code path doesn't read the data files at all, since all of the required information is stored in the Iceberg manifest files. If we cannot determine the answer because of delete files, Spark sees that the aggregates were not pushed down and computes them on the engine side from the actual rows in the data files.
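To illustrate the behavior described above, here is a simplified, hypothetical Java sketch. The class, record, and method names are mine, not Iceberg's, and the real `SparkScanBuilder` logic is considerably more involved; this only shows the shape of the bail-out: returning no result when any delete file is present, so the engine falls back to scanning rows.

```java
import java.util.List;

// Hypothetical sketch of metadata-only aggregate pushdown with a bail-out.
// Deleted rows are invisible to manifest-level stats, so any delete file
// makes metadata-derived answers (min/max/count) potentially wrong.
class AggregatePushdownSketch {

    // Minimal stand-in for a data file entry as described by a manifest:
    // per-file record count, column bounds, and whether deletes apply to it.
    record FileEntry(long recordCount, long minValue, long maxValue, boolean hasDeletes) {}

    // Returns the max from metadata, or null to signal "not pushed down",
    // in which case the engine computes the aggregate from actual rows.
    static Long pushDownMax(List<FileEntry> entries) {
        long max = Long.MIN_VALUE;
        for (FileEntry e : entries) {
            if (e.hasDeletes()) {
                return null; // bail out: cannot answer from metadata alone
            }
            max = Math.max(max, e.maxValue());
        }
        return max;
    }

    public static void main(String[] args) {
        List<FileEntry> clean = List.of(
            new FileEntry(100, 1, 50, false),
            new FileEntry(200, 10, 99, false));
        System.out.println(pushDownMax(clean));

        List<FileEntry> withDeletes = List.of(
            new FileEntry(100, 1, 50, false),
            new FileEntry(200, 10, 99, true));
        System.out.println(pushDownMax(withDeletes));
    }
}
```

Running this prints `99` for the delete-free table (answered purely from the per-file bounds) and `null` once a delete file appears, which is the signal that the aggregate was not pushed down.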
