RussellSpitzer commented on PR #11906: URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2598728800
> @ismailsimsek [my issue](https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049) with this PR is the same as with the previous PR. This isn't a scalable solution. The file system approach was able to parallelize the work through directory traversal, but this one does not.
>
> I think we need a way to break up the prefixes appropriately so that we can distribute the listing.

Do we have any technical docs on the performance of the listPrefix approach? I tried to look this up but couldn't find anything other than some old Stack Overflow posts saying it worked on 80 million entries in someone's workflow. I just want to make sure we aren't parallelizing something on the client that isn't already parallelized on the server. I think @danielcweeks is right that the naive approach here could be very dangerous if the server implementation of list prefix was not internally distributed.
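One possible reading of "break up the prefixes so that we can distribute the listing" is to fan a single prefix out into disjoint sub-prefixes and list each of them concurrently. The Python sketch that follows is illustrative only. `list_fn` is a hypothetical stand-in for an object store's list-by-prefix call and is not an Iceberg or cloud SDK API. The sketch also assumes that every key under the prefix continues with a character from a known alphabet, such as hex-named files. Keys that do not satisfy this assumption are silently missed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical stand-in for an object store's list-by-prefix call;
# not a real Iceberg or cloud SDK API.
ListFn = Callable[[str], Iterable[str]]


def split_prefix(prefix: str, alphabet: str) -> List[str]:
    """Fan one prefix out into disjoint sub-prefixes, one per alphabet character."""
    return [prefix + ch for ch in alphabet]


def parallel_list(prefix: str, list_fn: ListFn, alphabet: str,
                  max_workers: int = 8) -> List[str]:
    """List keys under `prefix` by listing each sub-prefix concurrently.

    ASSUMPTION: every key under `prefix` continues with a character from
    `alphabet`. Keys equal to `prefix`, or continuing with any other
    character, are missed by this sketch.
    """
    def list_one(sub_prefix: str) -> List[str]:
        return list(list_fn(sub_prefix))

    # max_workers caps how many list calls are in flight against the server.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(list_one, split_prefix(prefix, alphabet)))
    return sorted(key for batch in batches for key in batch)


def make_fake_lister(keys: Iterable[str]) -> ListFn:
    """In-memory fake store, used only to exercise the sketch."""
    stored = sorted(keys)
    return lambda prefix: [k for k in stored if k.startswith(prefix)]


if __name__ == "__main__":
    store = make_fake_lister(
        ["data/0a.parquet", "data/1b.parquet", "data/f0.parquet", "meta/v1.json"])
    print(parallel_list("data/", store, "0123456789abcdef"))
    # ['data/0a.parquet', 'data/1b.parquet', 'data/f0.parquet']
```

Client-side fan-out like this only helps if the server can serve the concurrent listings efficiently. If list-by-prefix is not internally distributed, the fan-out mostly multiplies load on the server, which is the concern in the comment. Bounding `max_workers` limits that risk but does not remove it.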