Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

via GitHub Fri, 17 Jan 2025 08:27:16 -0800


RussellSpitzer commented on PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2598728800


   > @ismailsimsek [my 
issue](https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049) 
with this PR is the same as the previous PR. This isn't a scalable solution. 
The file system approach was able to parallelize the work through directory 
traversal, but this does not.
   > 
   > I think we need a way to break up the prefixes appropriately so that we 
can distribute the listing.
   
   Do we have any technical docs on the performance of the listPrefix 
approach? I tried to look this up but couldn't find anything other than some 
old Stack Overflow posts saying it worked on 80 million entries in someone's 
workflow. I just want to make sure we aren't parallelizing something on the 
client that isn't already parallelized on the server.
   
   I think @danielcweeks is right that the naive approach here could be very 
dangerous if the server implementation of listPrefix is not internally 
distributed.
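
   For discussion only, here is a rough sketch of what "breaking up the 
prefixes" on the client could look like. It is written in Python rather than 
against the Iceberg API, and every name in it (`split_prefix`, 
`list_parallel`, `fake_list_prefix`, `FAKE_STORE`) is hypothetical. It also 
assumes keys under the base prefix start with a hex character, which may not 
hold for a real table layout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

HEX = "0123456789abcdef"


def split_prefix(base: str, alphabet: str = HEX) -> List[str]:
    """Fan one base prefix out into one sub-prefix per leading character."""
    return [base + ch for ch in alphabet]


def list_parallel(
    base: str,
    list_prefix: Callable[[str], Iterable[str]],
    alphabet: str = HEX,
    max_workers: int = 8,
) -> List[str]:
    """List every sub-prefix concurrently and concatenate the results.

    Assumption: keys whose first character after `base` is not in
    `alphabet` are silently missed, so the split has to match the layout.
    """
    sub_prefixes = split_prefix(base, alphabet)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of sub_prefixes in its results.
        results = pool.map(lambda p: list(list_prefix(p)), sub_prefixes)
        return [key for keys in results for key in keys]


# In-memory stand-in for an object store, used only to exercise the sketch.
FAKE_STORE = [
    "data/0a.parquet",
    "data/0b.parquet",
    "data/f1.parquet",
    "metadata/v1.json",
]


def fake_list_prefix(prefix: str) -> Iterable[str]:
    """Hypothetical lister: returns keys that start with the given prefix."""
    return (key for key in FAKE_STORE if key.startswith(prefix))


if __name__ == "__main__":
    print(sorted(list_parallel("data/", fake_list_prefix)))
```

   Whether fanning out like this actually helps depends on the question 
above: if the server already distributes a single prefix listing internally, 
the extra client-side requests may add load without adding throughput.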
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


