RussellSpitzer commented on issue #8130: URL: https://github.com/apache/iceberg/issues/8130#issuecomment-1671481858
If it is the planning phase, there isn't much to do since most of the cost is in reading the manifests. With 4000 files there are most likely many, many manifests. You can try increasing the size of the manifest-reading thread pool to increase parallelism there, but it's best to just optimize manifests more regularly and accept the cost of one long optimize to start with. I would also highly recommend running optimize data files more frequently if you have 4000 files that only take up 5 GB.

If it's in the delete phase, you just need to enable bulk deletes; this is the default in newer versions of Iceberg. Older versions of Iceberg had an explicit delete parallelism parameter for expire snapshots and delete orphan files. If you are on an older version, set these parameters to a high number like 50 or 100. If deletes are taking a long time, it's probably the latency of waiting on each delete response that is the bottleneck.
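A sketch of the maintenance calls described above, assuming Spark SQL with an Iceberg catalog named `my_catalog` and a table `db.tbl` (both names are placeholders, and the parallelism values are illustrative):

```sql
-- Compact small data files (the "optimize data files" step)
CALL my_catalog.system.rewrite_data_files(table => 'db.tbl');

-- Compact manifests so planning has fewer of them to read
CALL my_catalog.system.rewrite_manifests('db.tbl');

-- On older Iceberg versions, raise the explicit delete parallelism
-- for expire snapshots and remove orphan files
CALL my_catalog.system.expire_snapshots(
  table => 'db.tbl',
  max_concurrent_deletes => 50
);
CALL my_catalog.system.remove_orphan_files(
  table => 'db.tbl',
  max_concurrent_deletes => 50
);
```

For the manifest-reading side, the shared worker pool used during planning is sized by the `iceberg.worker.num-threads` JVM system property, which can be raised to increase planning parallelism.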
