RussellSpitzer commented on issue #13737: URL: https://github.com/apache/iceberg/issues/13737#issuecomment-3164330236
The Spark SQL CALL just calls the Java API :). It was most likely just some OOM sort of issue. The broadcast requires keeping a copy of the shipped data on the driver and executor during the broadcast, so if it needed a lot of memory that could have been it. While @amogh-jahagirdar is correct that the amount of memory should be small, I think you are on an older version (1.4.2) of Iceberg where we didn't have the "Ignore duplicates from intermediary snapshots" code, so I think the complexity in that version may have been more or less equivalent to the number of snapshots * the number of files. I would definitely recommend upgrading your Iceberg library.
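
To make the "snapshots * files" point concrete, here is a toy model in Python. It is only a sketch and not Iceberg's actual code: the function `entries_scanned`, the file paths, and the counts are all hypothetical. It assumes every snapshot can reach the same set of files, and compares how many file entries get processed with and without de-duplicating across snapshots.

```python
def entries_scanned(snapshots, dedupe):
    """Count the file entries a scan would have to carry around.

    snapshots: list of sets, each holding the file paths reachable
               from one snapshot (hypothetical toy data, not Iceberg metadata).
    dedupe:    if True, a file seen in an earlier snapshot is ignored.
    """
    if dedupe:
        seen = set()
        for files in snapshots:
            seen |= files  # each distinct file is kept only once
        return len(seen)
    # Without de-duplication every snapshot contributes all of its files again.
    return sum(len(files) for files in snapshots)


# Assumed workload: 50 snapshots that all reference the same 1000 files.
files = {f"data/file-{i}.parquet" for i in range(1000)}
snapshots = [set(files) for _ in range(50)]

without_dedupe = entries_scanned(snapshots, dedupe=False)
with_dedupe = entries_scanned(snapshots, dedupe=True)
print(without_dedupe, with_dedupe)
```

In this toy case the count without de-duplication is snapshots times files, while the de-duplicated count is just the number of distinct files, which is why the memory needed can differ so much between versions.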
