RussellSpitzer commented on issue #14864: URL: https://github.com/apache/iceberg/issues/14864#issuecomment-3666558824
There really isn't a way to calculate aggregates from metadata alone if delete files of any kind are present (at least at the moment). You can see the relevant code in our Spark plugin implementation, https://github.com/apache/iceberg/blob/36bb82675ff68ac0ed059d4db62550d30aa35760/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java#L240-L245, where we simply bail out. Note that this code path doesn't read the data files at all, since all of the required information is stored in the Iceberg manifest files. If we cannot determine the answer because of delete files, Spark sees that the aggregates were not pushed down and computes them on the engine side from the actual rows in the data files.
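To illustrate the behavior described above, here is a simplified, hypothetical Java sketch. The class, record, and method names are mine, not Iceberg's, and the real `SparkScanBuilder` logic is considerably more involved; this only shows the shape of the bail-out: returning no result when any delete file is present, so the engine falls back to scanning rows.

```java
import java.util.List;

// Hypothetical sketch of metadata-only aggregate pushdown with a bail-out.
// Deleted rows are invisible to manifest-level stats, so any delete file
// makes metadata-derived answers (min/max/count) potentially wrong.
class AggregatePushdownSketch {

    // Minimal stand-in for a data file entry as described by a manifest:
    // per-file record count, column bounds, and whether deletes apply to it.
    record FileEntry(long recordCount, long minValue, long maxValue, boolean hasDeletes) {}

    // Returns the max from metadata, or null to signal "not pushed down",
    // in which case the engine computes the aggregate from actual rows.
    static Long pushDownMax(List<FileEntry> entries) {
        long max = Long.MIN_VALUE;
        for (FileEntry e : entries) {
            if (e.hasDeletes()) {
                return null; // bail out: cannot answer from metadata alone
            }
            max = Math.max(max, e.maxValue());
        }
        return max;
    }

    public static void main(String[] args) {
        List<FileEntry> clean = List.of(
            new FileEntry(100, 1, 50, false),
            new FileEntry(200, 10, 99, false));
        System.out.println(pushDownMax(clean));

        List<FileEntry> withDeletes = List.of(
            new FileEntry(100, 1, 50, false),
            new FileEntry(200, 10, 99, true));
        System.out.println(pushDownMax(withDeletes));
    }
}
```

Running this prints `99` for the delete-free table (answered purely from the per-file bounds) and `null` once a delete file appears, which is the signal that the aggregate was not pushed down.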
