SandeepSinghGahir commented on issue #10340:
URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2356794643

   > @SandeepSinghGahir I'm really surprised that you're hitting this issue so 
   > frequently. Is there something specific about this workload that you think 
   > might be triggering this issue?
   > 
   > I asked @bryanck to see how frequently he sees this happening, but I 
   > wouldn't expect it to be a common occurrence.
   
   In our workloads, we process data for 20 marketplaces/countries in 
**separate runs**. One observation is that larger data sizes increase the 
likelihood of encountering this exception. We never see this issue with 
marketplaces that have fewer records, and we encounter it less frequently with 
those that have a medium number of records.
   
   Our workloads utilize Glue-Spark, and the transformation process involves 
joining 4-5 tables, with the driving table containing 25 billion rows. After 
applying proper filtering for the targeted marketplace, we process output data 
ranging from a few million to 8 billion records (depending on the marketplace).
   
   Even after increasing the number of workers, we continue to face the same 
issue. If a job takes 2 hours to complete, the exception may be thrown at 30 
minutes, or sometimes around an hour. In contrast, when processing data using 
Hive tables, we do not encounter this issue, although the runtime is longer.
   
   We are transitioning our workloads to use open table formats like Iceberg to 
reduce processing costs. However, because of the repeated retries, we are 
incurring costs that exceed the savings we initially anticipated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

