liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2272749648
Closed by #373
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific
liurenjie1024 closed issue #124: Add runtime module to enable concurrent load
of manifest files.
URL: https://github.com/apache/iceberg-rust/issues/124
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2211701240
Closed by #233
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2110454632
> try_for_each_concurrent

Do you mean [this method](https://docs.rs/futures/latest/futures/prelude/stream/trait.TryStreamExt.html#method.try_for_each_concurrent)?
I t
sdd commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2108489909
Using `try_for_each_concurrent` here rather than just spawning in a for loop
will allow us to tune the concurrency, since it accepts an argument that caps
the number of concurrent tasks. I'd advocate for a dat
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095481603
> > How do you feel starting with one task for one manifest file
>
> you mean:
>
> * spawn a new task for each manifest, load the manifest
(entry.load_manifest(
Fokko commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095455322
> so if we have a manifest_list with e.g. 5 entries, 1 is pruned
(ManifestEvaluator) we'd effectively spawn 4 tasks, to load the manifest and
handle all the data files; is this corr
marvinlanhenke commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095442660
> How do you feel starting with one task for one manifest file

you mean:
- spawn a new task for each manifest, load the manifest (`entry.load_manifest(...).await?`)
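A minimal sketch of "one task per manifest that survives pruning". To stay dependency-free it uses `std::thread::scope` as a stand-in for spawning async tasks on a runtime, which is a swapped-in technique rather than what the thread proposes. `ManifestFile`, its `may_match` flag (standing in for the `ManifestEvaluator` verdict), `load_manifest` and `plan_files` are all hypothetical. With the five-entry example from the thread, one pruned manifest leaves four tasks.

```rust
use std::thread;

/// Hypothetical manifest-list entry with just the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct ManifestFile {
    pub path: String,
    /// Hypothetical stand-in for the ManifestEvaluator's verdict.
    pub may_match: bool,
}

/// Hypothetical loader: returns the data-file paths listed in one manifest.
fn load_manifest(manifest: &ManifestFile) -> Vec<String> {
    (0..2).map(|i| format!("{}/data-{i}.parquet", manifest.path)).collect()
}

/// Prunes the manifest list, then loads each surviving manifest in its own
/// task (an OS thread here). Returns the number of tasks spawned and the data
/// files found, in manifest-list order.
pub fn plan_files(manifests: &[ManifestFile]) -> (usize, Vec<String>) {
    let survivors: Vec<&ManifestFile> = manifests.iter().filter(|m| m.may_match).collect();
    let files = thread::scope(|s| {
        // One task per surviving manifest.
        let handles: Vec<_> = survivors
            .iter()
            .map(|&m| s.spawn(move || load_manifest(m)))
            .collect();
        // Joining in spawn order keeps the output deterministic.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("manifest task panicked"))
            .collect::<Vec<String>>()
    });
    (survivors.len(), files)
}
```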
Fokko commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095389830
With Iceberg, manifests are written to a target size (8 MB by default). Each
manifest is bound to a single schema and partition spec, so you can reuse the
evaluators here. I w
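A minimal sketch of that reuse, assuming evaluators are cached by partition spec id so that a manifest list with many manifests written under the same spec builds its evaluator only once. All types and names are hypothetical, and the real evaluator (binding and projecting the scan filter) is replaced by a trivial upper-bound check.

```rust
use std::collections::HashMap;

/// Hypothetical manifest-list entry: the partition spec it was written with
/// and a summary of its partition values (here, just an upper bound).
pub struct ManifestFile {
    pub partition_spec_id: i32,
    pub max_partition_value: i64,
}

/// Hypothetical evaluator for the filter `partition_value >= min_value`,
/// bound to one partition spec. Constructing it is the step worth caching.
struct ManifestEvaluator {
    min_value: i64,
}

impl ManifestEvaluator {
    fn new(_spec_id: i32, min_value: i64) -> Self {
        ManifestEvaluator { min_value }
    }

    /// Keeps the manifest unless its partition summary rules out any match.
    fn may_match(&self, manifest: &ManifestFile) -> bool {
        manifest.max_partition_value >= self.min_value
    }
}

/// Returns the manifests that survive pruning and how many evaluators were built.
pub fn prune(manifests: &[ManifestFile], min_value: i64) -> (Vec<&ManifestFile>, usize) {
    let mut cache: HashMap<i32, ManifestEvaluator> = HashMap::new();
    let mut built = 0usize;
    let kept: Vec<&ManifestFile> = manifests
        .iter()
        .filter(|m| {
            // Build an evaluator only the first time a spec id is seen.
            let evaluator = cache.entry(m.partition_spec_id).or_insert_with(|| {
                built += 1;
                ManifestEvaluator::new(m.partition_spec_id, min_value)
            });
            evaluator.may_match(m)
        })
        .collect();
    (kept, built)
}
```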
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2095266262
Hi @marvinlanhenke, once #233 is merged we will have a basic runtime
framework.

> Have you already made up your mind;

Not yet.
I think your solution g
marvinlanhenke commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2092098359
... so as a first step, simply wrap `tokio::spawn` (for example), like
[here](https://github.com/launchbadge/sqlx/blob/main/sqlx-core/src/rt/mod.rs#L61-L78),
and not even us
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2090490939
Maybe we don't need a `Runtime` trait for now? From what we have learned,
we currently need two methods:
1. `spawn`
2. `block_on`

I think the method
[here](https:
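A minimal sketch of that two-function shape, assuming a hypothetical `runtime` module whose callers use only `spawn` and `block_on`, with no `Runtime` trait. To keep it dependency-free, the backend is a toy (one OS thread per spawned future, plus a park/unpark `block_on`); a real module would instead forward to something like `tokio::spawn` or `async_std::task::spawn` behind Cargo features, similar to the sqlx module linked in the thread. In this toy, a task that panics leaves its handle pending forever.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread blocked in `block_on` by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `fut` to completion on the current thread, parking between polls.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Spurious unparks are harmless: we simply poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// State shared between a spawned task and its handle.
struct Slot<T> {
    output: Option<T>,
    waker: Option<Waker>,
}

/// Awaitable handle to a spawned task's output (hypothetical type).
pub struct JoinHandle<T> {
    slot: Arc<Mutex<Slot<T>>>,
}

impl<T> Future for JoinHandle<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut slot = self.slot.lock().unwrap();
        match slot.output.take() {
            Some(out) => Poll::Ready(out),
            None => {
                // Register interest; the task wakes us after storing its output.
                slot.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Runs `fut` in the background (toy backend: a dedicated OS thread).
pub fn spawn<F>(fut: F) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    let slot = Arc::new(Mutex::new(Slot { output: None, waker: None }));
    let task_slot = Arc::clone(&slot);
    thread::spawn(move || {
        let out = block_on(fut);
        let mut slot = task_slot.lock().unwrap();
        slot.output = Some(out);
        if let Some(waker) = slot.waker.take() {
            waker.wake();
        }
    });
    JoinHandle { slot }
}
```

Because the rest of the crate would only ever call these two functions, swapping the toy backend for tokio or async-std later would not touch the scan code.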
marvinlanhenke commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2089542654
In order to verify my understanding and possibly kick off a design
discussion, we could follow the approach of `sqlx`:
- have a `runtime.rs`
- to define a `Runti
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2088084228
It's already tracked here: https://github.com/apache/iceberg-rust/issues/123
marvinlanhenke commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-2085361745
@odysa
Just to follow up on this, any progress regarding some design ideas?
@liurenjie1024
Do we have any reference implementation where we can get "inspired"
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1925022721
> Do you want users to choose their own runtime like
> [sqlx](https://github.com/launchbadge/sqlx/tree/main#install)?

Yes, exactly. I don't think we should bind to some
odysa commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924672646
> Do you want users to choose their own runtime like
> [sqlx](https://github.com/launchbadge/sqlx/tree/main#install)?

They are building an abstr
odysa commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924284690
> I mean we may need an extra layer for task scheduling, so that we can be
> adapted to any async runtime such as tokio, async-std.

Do you want users to choose their own runtime
liurenjie1024 commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1922889772
> Hi, is this what you refer to?

Yes, exactly.

> Can you plz explain more about "careful to runtime agnostic"? Is there
> anything we need to be careful about when impl
odysa commented on issue #124:
URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1922745247
Hi, is this what you refer to? Can you plz explain more about "careful to
runtime agnostic"? Is there anything we need to be careful about when
implementing concurrent scanning?
https
liurenjie1024 opened a new issue, #124:
URL: https://github.com/apache/iceberg-rust/issues/124
Currently we implement manifest loading sequentially, i.e. we load the
manifests one by one. We should load them concurrently instead. This requires
submitting tasks to a Rust async runtime, and we shou