RussellSpitzer commented on issue #13737:
URL: https://github.com/apache/iceberg/issues/13737#issuecomment-3164330236

   The Spark SQL `CALL` procedure just calls the Java API :). It was most 
likely some sort of OOM (out-of-memory) issue. A broadcast requires keeping a 
copy of the shipped data on both the driver and the executors during the 
broadcast, so if it needed a lot of memory, that could have been the cause. 
   
   While @amogh-jahagirdar is correct that the amount of memory should be 
small, I think you are on an older version of Iceberg (1.4.2) that didn't 
have the "Ignore duplicates from intermediary snapshots" code, so the 
complexity in that version may have been roughly equivalent to the number of 
snapshots multiplied by the number of files. 
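   To illustrate why missing deduplication can blow up memory, here is a toy 
model (not Iceberg's actual code; `build_history`, `paths_without_dedup` and 
`paths_with_dedup` are hypothetical names). It assumes a table where every 
snapshot still references the same base files plus one new file, and compares 
collecting file references per snapshot against keeping each distinct file 
once.

```python
from typing import Dict, List, Set


def build_history(num_snapshots: int, files_per_snapshot: int) -> Dict[int, List[str]]:
    """Hypothetical table history: each snapshot references the same base
    files plus one file added in that snapshot."""
    base = [f"data/file-{i}.parquet" for i in range(files_per_snapshot)]
    return {s: base + [f"data/new-{s}.parquet"] for s in range(num_snapshots)}


def paths_without_dedup(snapshots: Dict[int, List[str]]) -> List[str]:
    """Toy model of the older behavior: gather every file reference from
    every snapshot, so a file shared by k snapshots appears k times."""
    collected: List[str] = []
    for files in snapshots.values():
        collected.extend(files)
    return collected


def paths_with_dedup(snapshots: Dict[int, List[str]]) -> Set[str]:
    """Toy model of ignoring duplicates: each distinct file is kept once."""
    unique: Set[str] = set()
    for files in snapshots.values():
        unique.update(files)
    return unique


if __name__ == "__main__":
    history = build_history(num_snapshots=100, files_per_snapshot=1000)
    print(len(paths_without_dedup(history)))  # 100 * 1001 = 100100
    print(len(paths_with_dedup(history)))     # 1000 + 100 = 1100
```

   In this model the un-deduplicated list grows with snapshots times files, 
while the deduplicated set grows only with the number of distinct files, 
which is the difference that matters if that data is later broadcast.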
   
   I would definitely recommend upgrading your Iceberg library.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
