singhpk234 commented on PR #13400: URL: https://github.com/apache/iceberg/pull/13400#issuecomment-3408871500
Thank you so much for hands on POC Amogh (i am reading some of these interface first time) , It clean and nicely abstracted ! Here is my understanding, please let me know what your thoughts are : 1. [RecursiveTask](https://github.com/amogh-jahagirdar/iceberg/blob/separate-pool-with-queues/core/src/main/java/org/apache/iceberg/RESTTableScan.java#L261) is a really nice abstraction to incorporate this nested tree traversal, specially in the BFS manner - we call the `invokeAll()` immediately ([here](https://github.com/amogh-jahagirdar/iceberg/blob/separate-pool-with-queues/core/src/main/java/org/apache/iceberg/RESTTableScan.java#L307)) since this is a blocking call this would mean ForkJoinPool will hit plan endpoint for all the queries concurrently, this might lead to unnecessary call when things like LIMIT is posed ? 2 [FlatteningTaskIterable](https://github.com/amogh-jahagirdar/iceberg/blob/separate-pool-with-queues/core/src/main/java/org/apache/iceberg/RESTTableScan.java#L325) when we are offering to the queue, if the queue has already reached to the capacity, essentially `queue.offer()` returning false, we would have to wait and hence essentially invokeAll waits ? Presently i am doing a DFS and the ScanIterable i am using is not great, it tracks a List<FileScanTask> in itself and ParallelIterable contains at one point both the Leaf and non-Leaf nodes which are hard to reason about :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
