Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-08-06 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2272749648 Closed by #373 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-07-06 Thread via GitHub
liurenjie1024 closed issue #124: Add runtime module to enable concurrent load of manifest files. URL: https://github.com/apache/iceberg-rust/issues/124

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-07-06 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2211701240 Closed by #233

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-14 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2110454632 > try_for_each_concurrent Do you mean [this method](https://docs.rs/futures/latest/futures/prelude/stream/trait.TryStreamExt.html#method.try_for_each_concurrent)? I t

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-13 Thread via GitHub
sdd commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2108489909 Using `try_for_each_concurrent` here rather than just spawning in a for loop will allow us to tune the concurrency, as it accepts a max concurrent tasks argument. I'd advocate for a dat
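
The point about a tunable concurrency limit can be illustrated with a rough, standard-library-only sketch. This is not the `futures` API and not what iceberg-rust does: `try_for_each_bounded` is a hypothetical helper that uses a fixed pool of OS threads as a stand-in for async tasks, and its `limit` parameter plays the role of the max-concurrent-tasks argument mentioned above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: applies `work` to every item using at most `limit`
/// worker threads and returns the first error seen, if any.
pub fn try_for_each_bounded<T, E, F>(items: Vec<T>, limit: usize, work: F) -> Result<(), E>
where
    T: Send + 'static,
    E: Send + 'static,
    F: Fn(T) -> Result<(), E> + Send + Sync + 'static,
{
    // Shared queue of pending items; each worker pulls the next one.
    let queue = Arc::new(Mutex::new(items.into_iter()));
    let work = Arc::new(work);

    let workers: Vec<_> = (0..limit.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let work = Arc::clone(&work);
            thread::spawn(move || -> Result<(), E> {
                loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the lock is not held while `work` runs.
                    let next = queue.lock().unwrap().next();
                    match next {
                        Some(item) => work(item)?,
                        None => return Ok(()),
                    }
                }
            })
        })
        .collect();

    // Join every worker and keep the first error encountered.
    let mut result = Ok(());
    for worker in workers {
        let outcome = worker.join().expect("worker thread panicked");
        if result.is_ok() {
            result = outcome;
        }
    }
    result
}
```

Unlike `try_for_each_concurrent`, this sketch does not cancel the remaining workers when one of them fails; only the failing worker stops early. Raising or lowering `limit` changes how many items are processed at the same time.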

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-06 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095481603 > > How do you feel starting with one task for one manifest file > > you mean: > > * spawn a new task for each manifest, load the manifest (entry.load_manifest(

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-06 Thread via GitHub
Fokko commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095455322 > so if we have a manifest_list with e.g. 5 entries, 1 is pruned (ManifestEvaluator) we'd effectively spawn 4 tasks, to load the manifest and handle all the data files; is this corr

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-06 Thread via GitHub
marvinlanhenke commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095442660 > How do you feel starting with one task for one manifest file you mean: - spawn a new task for each manifest, load the manifest (entry.load_manifest(...).await?)

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-06 Thread via GitHub
Fokko commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095389830 With Iceberg, the manifests are written to a target size (8 megabytes) by default. Each manifest is bound to the same schema and partition, so you can re-use the evaluators here. I w

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-05 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095266262 Hi, @marvinlanhenke After #233 got merged, we will have a basic runtime framework. > Have you already made up your mind; Not yet. I think your solution g

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-02 Thread via GitHub
marvinlanhenke commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2092098359 ... so as a first step - simply wrap tokio::spawn (for example) like [here](https://github.com/launchbadge/sqlx/blob/main/sqlx-core/src/rt/mod.rs#L61-L78) - and not even us

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-02 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2090490939 Maybe currently we don't need a `Runtime` trait? From what we have learned, we currently need two methods: 1. spawn 2. block_on I think the method [here](https:
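
A minimal sketch of what a trait-free module with just these two free functions could look like. This is an assumption for illustration, not the code from the linked source or from iceberg-rust: it uses only the standard library, with one OS thread per spawned future standing in for a real async runtime such as tokio or async-std, and the names `ThreadWaker`, `block_on`, and `spawn` are chosen here.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, JoinHandle, Thread};

/// Waker that unparks the thread currently blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives a future to completion on the calling thread.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Runs a future in the background. In this sketch each future gets its own
/// OS thread; a real backend would delegate to the chosen runtime's spawn.
pub fn spawn<F>(fut: F) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(fut))
}
```

Callers would only ever see `spawn` and `block_on`, so the backend behind them could later be swapped (for example behind Cargo features) without touching call sites. A call such as `spawn(async { 1 + 2 }).join()` returns `Ok(3)` in this sketch.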

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-01 Thread via GitHub
marvinlanhenke commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2089542654 In order to verify my understanding and possibly kick off a design discussion, we could follow the approach of `sqlx`: - have a `runtime.rs` - to define a `Runti

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-05-01 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2088084228 It's already tracked here: https://github.com/apache/iceberg-rust/issues/123

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-04-30 Thread via GitHub
marvinlanhenke commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2085361745 @odysa Just to follow up on this, any progress regarding some design ideas? @liurenjie1024 Do we have any reference implementation where we can get "inspired"

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1925022721 > Do you want users to choose their own runtime like [sqlx](https://github.com/launchbadge/sqlx/tree/main#install)? Yes, exactly. I don't think we should bind to some

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924672646 > Do you want users to choose their own runtime like [sqlx](https://github.com/launchbadge/sqlx/tree/main?rgh-link-date=2024-02-02T17%3A02%3A32Z#install)? They are building an abstr

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924284690 > I mean we may need an extra layer for task scheduling, so that we can be adopted to any async runtime such as tokio, async-std. Do you want users to choose their own runtime

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-01 Thread via GitHub
liurenjie1024 commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1922889772 > Hi, is this what you refer to? Yes, exactly. > Can you plz explain more about "careful to runtime agnostic"? Is there anything we need to be careful when impl

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-01 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1922745247 Hi, is this what you refer to? Can you plz explain more about "careful to runtime agnostic"? Is there anything we need to be careful when implementing concurrent scanning? https

[I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2023-12-18 Thread via GitHub
liurenjie1024 opened a new issue, #124: URL: https://github.com/apache/iceberg-rust/issues/124 Currently we implement manifest loading in a sequential approach, i.e. we load them one by one. We should load them concurrently. This requires submitting tasks to a Rust async runtime, and we shou