cristian-fatu commented on issue #14951: URL: https://github.com/apache/iceberg/issues/14951#issuecomment-3935267663
We hit the same issue on Spark 3.5.8 with Iceberg 1.10.1. It only seems to happen with heavily fragmented tables (many data files, many metadata files, and many position delete files; it is unclear which of these triggers the problem). Initially, about 40k ephemeral ports are available on the Spark executor; when the error occurs later on, we see roughly 40k connections to S3 in TIME_WAIT, so all ephemeral ports are used up. If I first compact the table with Trino and only then run compaction with Spark (so there are only a few input files), everything works fine.
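For anyone trying to confirm this kind of ephemeral-port exhaustion on an executor host, a minimal diagnostic sketch (assuming a Linux host with iproute2's `ss` available; the column handling below assumes `ss`'s default output layout):

```shell
# Count sockets stuck in TIME_WAIT, grouped by remote (peer) address.
# With a state filter, `ss -tan` prints: Recv-Q Send-Q Local:Port Peer:Port,
# so the peer address is field 4; strip the trailing :port to group by host.
ss -tan state time-wait \
  | awk 'NR > 1 { sub(/:[0-9]+$/, "", $4); print $4 }' \
  | sort | uniq -c | sort -rn | head

# Compare the count against the ephemeral port range the kernel allocates from.
cat /proc/sys/net/ipv4/ip_local_port_range
```

If the TIME_WAIT count toward the S3 endpoint approaches the size of that port range, new connections will start failing until sockets age out.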
