RussellSpitzer commented on issue #6326: URL: https://github.com/apache/iceberg/issues/6326#issuecomment-1333946715
While I don't have a problem with disabling statistics reporting, I am pretty dubious that it takes that long. What I believe you are actually seeing is the task list being created for the first time and stored in a list. We use a lazy iterator which needs to be turned into a list before the job begins (even if statistics are not reported). This means that even if we don't spend the time iterating the list when estimating stats, we will spend that same amount of time later when planning tasks. The only difference is that in the current case the second access to "tasks()" is cached, so it's very fast. In this case the speed could probably be improved by increasing the parallelism of the manifest reads.
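To illustrate the point about the lazy iterator and the cached second access, here is a minimal sketch in Python. It is not Iceberg's code: the `Scan` class, `estimate_statistics`, the `manifest_reads` counter, and the simulated read cost are all hypothetical stand-ins. It only shows that whoever calls `tasks()` first pays the full cost of materializing the list, and that later calls are nearly free.

```python
import time


class Scan:
    """Hypothetical stand-in for a scan whose tasks are planned lazily."""

    def __init__(self, num_manifests, read_cost_s=0.01):
        self.num_manifests = num_manifests
        self.read_cost_s = read_cost_s  # assumed cost of one manifest read
        self.manifest_reads = 0         # counts simulated manifest reads
        self._tasks = None              # cache, filled on first tasks() call

    def _plan_lazily(self):
        # Generator: no manifest is read until something iterates it.
        for i in range(self.num_manifests):
            time.sleep(self.read_cost_s)  # simulated manifest read
            self.manifest_reads += 1
            yield f"task-{i}"

    def tasks(self):
        # The first call turns the lazy iterator into a list and caches it.
        if self._tasks is None:
            self._tasks = list(self._plan_lazily())
        return self._tasks

    def estimate_statistics(self):
        # Statistics need the task list, so this may be the first caller.
        return len(self.tasks())


def timed(fn):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Case 1: statistics are estimated first, so they appear to be the slow step.
scan = Scan(num_manifests=20)
_, stats_time = timed(scan.estimate_statistics)
_, plan_time = timed(scan.tasks)
print(f"with stats:    stats={stats_time:.3f}s  planning={plan_time:.3f}s")

# Case 2: statistics are skipped, and the same cost moves to planning.
scan_no_stats = Scan(num_manifests=20)
_, plan_only_time = timed(scan_no_stats.tasks)
print(f"without stats: planning={plan_only_time:.3f}s")
```

In both cases the manifests are read exactly once. Skipping the statistics step only changes which call absorbs that cost.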