parasj commented on issue #6456: URL: https://github.com/apache/iceberg/issues/6456#issuecomment-1359913918
Thanks for looking into this, @singhpk234. The benchmark is Section 5 of the [TPC-DS spec](https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3.2.0.pdf). You most likely don't need to review it, since I can share the specific query that causes the issue (MERGE INTO, aka MergeIntoIcebergTable).

If I use the default `fs.s3.maxConnections` value, I receive the `Timeout waiting for connection from pool` error. Following the [EMR documentation](https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/), I increased that value to at least 400, which resolves the error. However, task runtime increases substantially on the 9th or 10th MERGE INTO iteration.

This is the query plan for the slow MERGE operation.

Looking at the relevant job, we can see that a single worker is causing the problem. However, this occurs consistently across many different EMR clusters, so it is not caused by a bad worker.
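For anyone trying to reproduce the `fs.s3.maxConnections` change, here is a minimal sketch of one way the setting could be expressed as an EMR configuration object. It assumes the value is applied through the `emrfs-site` classification when the cluster is created; the helper name `emrfs_max_connections_config` is hypothetical and not part of any library, and 400 is simply the value mentioned above.

```python
import json


def emrfs_max_connections_config(max_connections):
    """Build an EMR configuration list that sets fs.s3.maxConnections.

    Hypothetical helper for illustration only. Assumes the setting is
    applied via the `emrfs-site` classification at cluster creation.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return [
        {
            "Classification": "emrfs-site",
            # EMR expects property values as strings.
            "Properties": {"fs.s3.maxConnections": str(max_connections)},
        }
    ]


if __name__ == "__main__":
    # Print the JSON that could be supplied as the cluster's configurations.
    print(json.dumps(emrfs_max_connections_config(400), indent=2))
```

The resulting JSON could be supplied as the configurations for a new cluster. Raising the pool size only addresses the connection timeout; it does not explain the slowdown on later MERGE INTO iterations.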