SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2356794643
> @SandeepSinghGahir I'm really surprised that you're hitting this issue so frequently. Is there something specific about this workload that you think might be triggering this issue?
>
> I asked @bryanck to see how frequently he sees this happening, but I wouldn't expect it to be a common occurrence.

In our workloads, we process data for 20 marketplaces/countries in **separate runs**. One observation is that larger data sizes increase the likelihood of encountering this exception. We never see this issue with marketplaces that have fewer records, and we encounter it less frequently with those that have a medium number of records.

Our workloads run on Glue-Spark, and the transformation process involves joining 4-5 tables, with the driving table containing 25 billion rows. After applying proper filtering for the targeted marketplace, we process output data ranging from a few million to 8 billion records (depending on the marketplace). Even after increasing the number of workers, we continue to face the same issue. For a job that takes 2 hours to complete, the exception may be thrown 30 minutes in, or sometimes around the one-hour mark.

In contrast, when processing the same data using Hive tables, we do not encounter this issue, although the runtime is longer. We are transitioning our workloads to open table formats like Iceberg to reduce processing costs. However, because of the multiple retries, the added cost currently exceeds the savings we had anticipated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org