ChristinaTech opened a new issue, #7623: URL: https://github.com/apache/iceberg/issues/7623
### Apache Iceberg version

1.2.1 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

When performing an incremental read in Iceberg 1.2.1, calling `count` on the incremental read DataFrame immediately after `load`/`table` returns the count as though it were not an incremental read. The issue disappears if the DataFrame has any post-load operation on it, such as `orderBy`. In addition, collecting the contents of the unmodified DataFrame to a list, or calling `show` on it, yields the correct number of rows, which makes this a well-hidden bug.

We found this bug when one of our use case's unit tests failed while attempting to upgrade from Iceberg 1.1.0 to Iceberg 1.2.1, meaning this is a regression between those versions. We were able to replicate it in Iceberg's own unit tests, where it impacts Spark 3.3/3.4 but not Spark 3.1/3.2.

Since the problem only appears after upgrading Iceberg, it seems more likely that the issue is in Iceberg than in Spark. Spark 3.1/3.2 are probably unaffected because the change that introduced the bug was an improvement that was not backported to them, but I have not yet tracked down the specific change.

I have provided Draft PR #7616, which contains the minimal change to the unit tests that replicates the issue, and I will attempt to narrow down the commit that causes it in the coming days as time permits.
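
For anyone who wants to check their own setup, the comparison described above can be sketched roughly as follows in PySpark. This is only an illustration, not the test from the draft PR: the helper names `incremental_options` and `compare_counts` are hypothetical, the table name and snapshot IDs are placeholders you would supply, and sorting by the first column is just an arbitrary post-load operation. The `start-snapshot-id` and `end-snapshot-id` read options are the ones Iceberg documents for incremental reads.

```python
def incremental_options(start_snapshot_id, end_snapshot_id):
    """Build Iceberg read options for an incremental read (hypothetical helper)."""
    return {
        "start-snapshot-id": str(start_snapshot_id),  # exclusive lower bound
        "end-snapshot-id": str(end_snapshot_id),      # inclusive upper bound
    }


def compare_counts(spark, table_name, start_snapshot_id, end_snapshot_id):
    """Return the row counts obtained three different ways from one incremental read.

    `spark` is an existing SparkSession with an Iceberg catalog configured;
    `table_name` and the snapshot IDs are supplied by the caller.
    """
    df = (
        spark.read.format("iceberg")
        .options(**incremental_options(start_snapshot_id, end_snapshot_id))
        .load(table_name)
    )
    return {
        # count directly on the freshly loaded DataFrame (the case reported as wrong)
        "count": df.count(),
        # materialize the rows and count them on the driver (reported as correct)
        "collect": len(df.collect()),
        # count after an arbitrary post-load operation (reported as correct)
        "orderBy_count": df.orderBy(df.columns[0]).count(),
    }
```

If the three values differ on Spark 3.3/3.4 with Iceberg 1.2.1 but agree on 1.1.0, that would match the behavior reported here.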
